WIP: [platforms/x11] Cleanup GLX backend, revise compositing
Needs ReviewPublic

Authored by romangg on Aug 12 2019, 1:05 AM.

Details

Reviewers
fredrik
Group Reviewers
KWin
Maniphest Tasks
T11071: Rework compositing pipeline
Summary

This is an early WIP of the GLX backend. I go through that with a chainsaw
at the moment and hope for some feedback by people knowing all the history
about it and stop me before accidentally cutting off the branch we are
sitting on. ;)

Overall goals:

  • Make the code less convoluted.
  • Sync paint with vblanks if extensions required for that are available
  • Reduce delay by pushing paint to end of vblank cycle, see [1].

Currently doing:

  • Remove triple buffering. A compositor does not need this when it doesn't screw up in some other way.
  • Remove vsync option. Always sync with vblanks.
  • Replace GLX_MESA_, GLX_EXT_. and GLX_SGI_swap_control with GLX_OML_sync_control only, since this is the standard nowadays and we don't need to optimize for every legacy X device in the world.
  • Remove blocksForRetrace, instead use the Intel swap event or glFinish to sync with vblank. glFinish should block till vblank, see [2][3]. (doesn't seem to be true per se, maybe driver dependent)
  • In endRenderingFrame don't present a second time. Why was it there in the first place? -> Thanks to @fredrik's comments this is now clear: in beginRenderingFrame it's called if we don't block on retrace and if do block in endRenderingFrame and according to comment because damage gets reset in the later case it would be noop in beginRenderingFrame. Needs to get replaced though to make the code less complex.

Added 19-08-13:

  • When swap events are available don't use the fallback timer.
  • Delay timer reduces latency between paint and vsync

Added 19-08-20:

  • Move present to endRenderingFrame to have less than one frame of latency.

Added 19-08-21:

  • For now perform compositing always from event loop. Without it Wayland session crashes. Probably otherwise hidden issue in Effects.
  • Cleanup: Remove more plumbing for vsync and retrace blocking and swap-profiler

Open questions:

  • Is glFinish really blocking till vblank? X Present extension provides events to get notified about vblank, but since we channel through GLX this is not available directly to my knowledge. So it's kind of weird that there is only the Intel swap event to wrap that behavior explicitly.
  • Compositor has a compositing timer. It seems we always wait for it, but if we sync with vblank there is no reason to do it this way. Instead only use is as a fallback in case sync with vblank is not supported?

[1] https://ppaalanen.blogspot.com/2015/02/weston-repaint-scheduling.html
[2] https://www.khronos.org/opengl/wiki/Swap_Interval
[3] https://github.com/tildearrow/kwin-lowlatency#background

Test Plan

kwin_x11 --replace works with and without setting KWIN_USE_INTEL_SWAP_EVENT
to 1.

I haven't yet tested it on AMD and analyzed it with GPUvis what I want to do
next.

Diff Detail

Repository
R108 KWin
Branch
glxCleanup
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 15024
Build 15042: arc lint + arc unit
romangg created this revision.Aug 12 2019, 1:05 AM
Restricted Application added a project: KWin. · View Herald TranscriptAug 12 2019, 1:05 AM
Restricted Application added a subscriber: kwin. · View Herald Transcript
romangg requested review of this revision.Aug 12 2019, 1:05 AM
romangg edited the summary of this revision. (Show Details)Aug 12 2019, 1:06 AM
romangg updated this revision to Diff 63592.Aug 12 2019, 1:09 AM
  • Remove unused variables
romangg updated this revision to Diff 63593.Aug 12 2019, 1:13 AM
  • Restore some functions old positions to limit the diff size

I haven't yet tested it on AMD

I could test how it works on amdgpu (RX560), if you need help with that, when it's ready to test

I haven't yet tested it on AMD

I could test how it works on amdgpu (RX560), if you need help with that, when it's ready to test

Thank you, I have an AMD setup here as well. But of course more testing on other setups is always better! If you have a git checkout of KWin you can test now already (with arc do arc patch D23105 to easily apply the diff to your local checkout). It should work just fine.

What I really need additionally is testing it on an Nvidia setup.

romangg updated this revision to Diff 63608.Aug 12 2019, 2:51 PM

Rebase on master

romangg edited the summary of this revision. (Show Details)Aug 12 2019, 6:08 PM
romangg updated this revision to Diff 63666.Aug 13 2019, 3:40 PM
  • Compositor swap events without timer
  • Paint delay timer
  • For analysis add ftrace marker from D23114

Some observations:

  1. Calling glFinish or not does not seem to influence the repaint rate. I believe instead this is done implicitly by OpenGl if there is no explicit swap event
  2. The delay timer works fine when there are swap events. Without it we are too late for some reason. This must be connected with (1) somehow.
  3. Call to currentRefreshRate does not succeed in retrieving any refresh rate but returns the default rate of 60. This issue should be independent of this work.
romangg retitled this revision from WIP: [platforms/x11] Cleanup GLX backend to WIP: [platforms/x11] Cleanup GLX backend, revise compositing.Aug 13 2019, 3:51 PM
romangg edited the summary of this revision. (Show Details)

What I really need additionally is testing it on an Nvidia setup.

I've briefly tested it on my NVidia GTX 550Ti with nvidia driver. Nothing appears to be horribly broken. Anything in particular I should look out for?

What I really need additionally is testing it on an Nvidia setup.

I've briefly tested it on my NVidia GTX 550Ti with nvidia driver. Nothing appears to be horribly broken. Anything in particular I should look out for?

That's awesome! Thank you very much. There is nothing particular to look out for. Just the standard stuff: no tearing, no lag. I have sometimes a freeze directly after startup on an Intel-device on X11 (Wayland session works fine though). So needs some more work for sure, but we'll get there.

NVIDIA doesn't support the OML extensions. They can't be implemented efficiently on their hardware IIRC.

NVIDIA supports:

  • GLX_EXT_swap_control
  • GLX_EXT_swap_control_tear
  • GLX_SGI_swap_control
  • GLX_SGI_video_sync
  • GLX_NV_delay_before_swap
  • GLX_NV_swap_group

Catalyst supports:

  • GLX_EXT_swap_control
  • GLX_SGI_swap_control
  • GLX_SGI_video_sync
  • GLX_NV_swap_group
  • GLX_MESA_swap_control
  • GLX_OML_swap_method

Mesa supports:

  • GLX_SGI_swap_control
  • GLX_SGI_video_sync
  • GLX_MESA_swap_control
  • GLX_OML_swap_method
  • GLX_OML_sync_control
  • GLX_INTEL_swap_event
plugins/platforms/x11/standalone/glxbackend.cpp
718

Because the contents of the backbuffer are undefined after a buffer swap without GLX_EXT_buffer_age.

NVIDIA doesn't support the OML extensions. They can't be implemented efficiently on their hardware IIRC.
[...]

That's good to know. Thanks! I believe we can let in some of these extensions again without increasing the complexity too much as long as the SGI ones is ignored and we have no manual control of vsync. The complexity in the old code came mostly from that.

plugins/platforms/x11/standalone/glxbackend.cpp
718

Ok and in other case the back buffer also has what's currently on the front buffer and we can paint it partly over and then swap?

Related to that do you know why we present() in prepareRenderingFrame (in DRM backend as well) and not after the actual paint in endRenderingFrame?

/home/lexx/dev/kde/kwin/plugins/platforms/x11/standalone/glxbackend.cpp: In member function ‘void KWin::GlxBackend::setVsync(bool)’:
/home/lexx/dev/kde/kwin/plugins/platforms/x11/standalone/glxbackend.cpp:252:32: warning: unused parameter ‘enable’ [-Wunused-parameter]
 void GlxBackend::setVsync(bool enable)
                           ~~~~~^~~~~~

If you have a git checkout of KWin you can test now already (with arc do arc patch D23105 to easily apply the diff to your local checkout). It should work just fine.

Im running latsest released versions of KF5/Plasma, so it turned out that I also need KWayland master because of D10747 and kscreenlocker master because AboutToQuit in D20804 .

lexx ~/dev/kde/kwin/build [arcpatch-D23105] $ KWIN_USE_INTEL_SWAP_EVENT=1 kwin_x11 --replace
OpenGL vendor string:                   X.Org
OpenGL renderer string:                 Radeon RX 560 Series (POLARIS11, DRM 3.32.0, 5.2.8-gentoo, LLVM 8.0.1)
OpenGL version string:                  4.5 (Compatibility Profile) Mesa 19.1.4
OpenGL shading language version string: 4.50
Driver:                                 RadeonSI
GPU class:                              Arctic Islands
OpenGL version:                         4.5
GLSL version:                           4.50
Mesa version:                           19.1.4
X server version:                       1.20.5
Linux kernel version:                   5.2.8
Requires strict binding:                yes
GLSL shaders:                           yes
Texture NPOT support:                   yes
Virtual Machine:                        no

GlxBackend::init true true true

With KWIN_USE_INTEL_SWAP_EVENT=1 all seems to work fine.

But with KWIN_USE_INTEL_SWAP_EVENT=0 or when unset, I see serious "lags" when dragging the window, it becomes very hard to drag window on screen, looks like some mouse movements are dropped(?) Resizing for example is still fine:

NVIDIA doesn't support the OML extensions. They can't be implemented efficiently on their hardware IIRC.
[...]

That's good to know. Thanks! I believe we can let in some of these extensions again without increasing the complexity too much as long as the SGI ones is ignored and we have no manual control of vsync. The complexity in the old code came mostly from that.

That would leave the user without a way to override the driver's default vsync setting.

plugins/platforms/x11/standalone/glxbackend.cpp
718

Ok and in other case the back buffer also has what's currently on the front buffer and we can paint it partly over and then swap?

No, but the buffer age extension makes it possible to query the number of frames that have elapsed since the current back buffer was the front buffer. Based on that we can calculate how much we need to repaint to bring it up-to-date.

See void OpenGLBackend::addToDamageHistory(const QRegion &region) and QRegion OpenGLBackend::accumulatedDamageHistory(int bufferAge) const

Related to that do you know why we present() in prepareRenderingFrame (in DRM backend as well) and not after the actual paint in endRenderingFrame?

This code was written based on the idea that SwapBuffers() blocks until the swap is complete, which also means that it blocks until the next vblank.

The logic can be illustrated with the following pseudo code:

while (true) {
    SwapBuffers(); // For the previous frame
    renderTimer.start();
    paint();
    glFlush();

    // Sleep until just before the next vblank
    int timeSinceLastVBlank = renderTimer.elapsed();
    sleep(vblankInterval - timeSinceLastVBlank);
}

In practice we don't call sleep of course; instead we start the composite timer with the timeout set to just before the next vblank, and return to the event loop.

But the only driver where SwapBuffers() behaves like this is the NVIDIA driver, and its not the default behavior. It's enabled by setting __GL_MaxFramesAllowed to 1 before initializing OpenGL.

The reason the next frame is rendered immediately after the buffer swap instead of doing it as late as possible is to avoid missing the vblank when there is a sudden increase in render time. Also if we miss the vblank by one millisecond, the main thread will be blocked in SwapBuffers for 15 milliseconds, which is less than ideal.

But there's obviously a trade-off here between smoothness and latency.

By the way, you are partially duplicating work already done in this branch:
https://cgit.kde.org/kwin.git/log/?h=fredrik/swap-event-wip2

NVIDIA doesn't support the OML extensions. They can't be implemented efficiently on their hardware IIRC.
[...]

That's good to know. Thanks! I believe we can let in some of these extensions again without increasing the complexity too much as long as the SGI ones is ignored and we have no manual control of vsync. The complexity in the old code came mostly from that.

That would leave the user without a way to override the driver's default vsync setting.

I want KWin to run v-synced always. I don't think it makes sense to make this configurable by the user. It's the default in our Wayland session and on X11 the compositor should always run v-synced as well. Or is there a valid use case for running it async in your opinion?

By the way, you are partially duplicating work already done in this branch:
https://cgit.kde.org/kwin.git/log/?h=fredrik/swap-event-wip2

That's good to know.

romangg added inline comments.Aug 19 2019, 8:34 PM
plugins/platforms/x11/standalone/glxbackend.cpp
718

The logic can be illustrated with the following pseudo code: [...]

Thank you very much for the comprehensive explanation. I see the logic behind it now when the SwapBuffers call is assumed to be blocking. When it's not it seems just unnecessary complicated to me. Just do the SwapBuffers directly after paint() for the next frame at some point before the upcoming vblank. Then on buffer swap event or vblank interval timer timeout repeat for the next frame.

Also if we miss the vblank by one millisecond, the main thread will be blocked in SwapBuffers for 15 milliseconds, which is less than ideal.

But only if the driver blocks on SwapBuffers and as you said no current driver behaves like this per default, right? Otherwise if we miss we "just" have a delay of one frame for the next paint result on screen (what we have with the current code on master all the time).

With KWIN_USE_INTEL_SWAP_EVENT=1 all seems to work fine.

But with KWIN_USE_INTEL_SWAP_EVENT=0 or when unset, I see serious "lags" when dragging the window, it becomes very hard to drag window on screen, looks like some mouse movements are dropped(?) Resizing for example is still fine:

Thanks for testing! Yea, I currently trying to move the present call to endRenderingFrame and I also notice lags and sometimes freezes on opening a window while using the timer. There are no problems with the swap event.

romangg added inline comments.Aug 19 2019, 9:28 PM
plugins/platforms/x11/standalone/glxbackend.cpp
718

Related to that do you know why we present() in prepareRenderingFrame (in DRM backend as well)

Correction: in DRM backend we actually present on endRenderingFrame!

romangg updated this revision to Diff 64101.Aug 20 2019, 10:58 AM
  • Rebase on master
  • Use fallback timer without swap event, move present

@alexeymin Can you test again with the update? I just accidentally had the fallback timer disabled before. Now works fine for me without swap event.

romangg updated this revision to Diff 64134.EditedAug 20 2019, 4:20 PM
  • Simplify fallback timer

This just means on damage we paint always with one frame later this damage. We could improve this by checking if the Compositor was idle for longer than one frame when compositing or not and only have the timer run when below this time.

romangg updated this revision to Diff 64138.Aug 20 2019, 5:14 PM

Rebase on master.

alexde added a subscriber: alexde.Aug 20 2019, 5:43 PM
romangg updated this revision to Diff 64147.Aug 20 2019, 6:08 PM
  • No need for setting __GL_MaxFramesAllowed
romangg added a comment.EditedAug 20 2019, 6:15 PM

I tested it now with Nvidia proprietary driver (v430, Neon unstable, GTX 950) and as expected the fallback timer is used since OML sync control and intel swap event are not available. This means there is no synchronization / latency minimization through delay timer in relation to the vblanks. There is of course the fallback timer providing a gap of ~16ms and the result looks still ok for me. Did not notice any tearing or freezes.

romangg edited the summary of this revision. (Show Details)Aug 20 2019, 6:55 PM

I always had tearing when watching fullscreen videos on my Nvidia setup, but with this patch I haven't noticed any tearing so far!

romangg updated this revision to Diff 64185.Aug 21 2019, 9:06 AM
  • Perform compositing always from event loop
  • Remove unneeded sync, retrace block plumbing
  • Remove swap profiler code
romangg edited the summary of this revision. (Show Details)Aug 21 2019, 9:09 AM

I always had tearing when watching fullscreen videos on my Nvidia setup, but with this patch I haven't noticed any tearing so far!

That's good to hear. :) Seems to go into the right direction.

I would like to get some more test data in on different setups before continuing by splitting this diff up into mergable patches.

@alexeymin Can you test again with the update? I just accidentally had the fallback timer disabled before. Now works fine for me without swap event.

Now it works fine without KWIN_USE_INTEL_SWAP_EVENT or with KWIN_USE_INTEL_SWAP_EVENT=0.

But because test plan still mentions testing with KWIN_USE_INTEL_SWAP_EVENT=1, I tested it too. Window movement is somewhat intermittent, it moves like with some small hops. I guess I should not use intel swap event on amdgpu, right? 😉

NVIDIA doesn't support the OML extensions. They can't be implemented efficiently on their hardware IIRC.
[...]

That's good to know. Thanks! I believe we can let in some of these extensions again without increasing the complexity too much as long as the SGI ones is ignored and we have no manual control of vsync. The complexity in the old code came mostly from that.

That would leave the user without a way to override the driver's default vsync setting.

I want KWin to run v-synced always. I don't think it makes sense to make this configurable by the user. It's the default in our Wayland session and on X11 the compositor should always run v-synced as well. Or is there a valid use case for running it async in your opinion?

That's not my point though. My point is that if you don't set the swap interval, you leave it up to the driver, and not all drivers enable vsync by default.

@alexeymin Can you test again with the update? I just accidentally had the fallback timer disabled before. Now works fine for me without swap event.

Now it works fine without KWIN_USE_INTEL_SWAP_EVENT or with KWIN_USE_INTEL_SWAP_EVENT=0.

But because test plan still mentions testing with KWIN_USE_INTEL_SWAP_EVENT=1, I tested it too. Window movement is somewhat intermittent, it moves like with some small hops. I guess I should not use intel swap event on amdgpu, right? 😉

It should work with all Mesa drivers. The extension is just called GLX_INTEL_swap_event because Intel developed it.

plugins/platforms/x11/standalone/glxbackend.cpp
718

That's probably because the code in the GLX backend (and the timer code) was originally written for the NVIDIA driver, while the code in the DRM backend was not.

But the GLX backend also presents in endRenderingFrame() when blocksForRetrace() returns false.

That's not my point though. My point is that if you don't set the swap interval, you leave it up to the driver, and not all drivers enable vsync by default.

Ok, I understood what you said differently. But good to know, then I'll not remove the call to the extensions. Although I will thin it out: GLX_OML_sync_control for mesa and GLX_EXT_swap_control for Nvidia should be enough. Also it will be a simple call to activate vsync and no interface to switch it off again.

I will upload smaller patches now with changes factored out from this playground. First one will be removal of triple buffering functionality.

For split-out patches see here: D23504

romangg edited the summary of this revision. (Show Details)Aug 27 2019, 9:03 PM

Regarding the fallback to the composite timer when using the proprietary NVIDIA driver, while it doesn't really support anything equivalent to GLX_INTEL_swap_event, there is https://www.khronos.org/registry/OpenGL/extensions/NV/GLX_NV_delay_before_swap.txt which has the similar goal of reduced input latency. See the extension description for details, but essentially it allows the application to block until a specified time before the next swap. I'm not sure if that's exactly fits what KWin needs here, but maybe worth considering?.

Also, I'd be able to test this patch on a PRIME (hybrid graphics) system if nobody has done so yet. Given that there have been synchronization-related issues there before, it's probably worth checking.

However, the patch doesn't seem to apply cleanly to HEAD any longer. Would you be able to post a rebased version?

Regarding the fallback to the composite timer when using the proprietary NVIDIA driver, while it doesn't really support anything equivalent to GLX_INTEL_swap_event, there is https://www.khronos.org/registry/OpenGL/extensions/NV/GLX_NV_delay_before_swap.txt which has the similar goal of reduced input latency. See the extension description for details, but essentially it allows the application to block until a specified time before the next swap. I'm not sure if that's exactly fits what KWin needs here, but maybe worth considering?.

Thanks, the extension looks interesting. Although a delay timer could work as well and if we want reduce latency we need the timer for mesa drivers anyway. On the other other side this extension gives us somewhat a swap synchronized event to align the paint with. But this could be done as well with __GL_MaxFramesAllowed = 1 and D23881. Also this extension would block, so we need to do something similar as in D23881 with a separate thread. So in both cases we would need a thread, remaining question now is if there are other advantages with using this extension and not force __GL_MaxFramesAllowed = 1.

Also, I'd be able to test this patch on a PRIME (hybrid graphics) system if nobody has done so yet. Given that there have been synchronization-related issues there before, it's probably worth checking.

However, the patch doesn't seem to apply cleanly to HEAD any longer. Would you be able to post a rebased version?

All WIP ideas here have been transferred to a commit series also up for review at the moment. I just rebased it on master. You should be able to do a checkout of it from this diff: D23514
To do that directly issue arc patch D23514 in your local KWin repository.

Sorry for the slow follow-up, but for whatever it's worth everything seems to work fine on a PRIME setup, even without D23881. No tearing, no hangs when rendering QtQuick pop-ups, and subjectively, latency when dragging windows, ect., feels about the same as before. So maybe the fall-back timer on NVIDIA hardware is good enough.

Hey, no worries, I wouldn't have had enough time to commit the patchset anyway in the last two weeks since I was at XDC and more. Thanks for testing! Sounds promising. I will rebase next few days and then commit. Let's then continue work on it on master branch.