Possible solution for scheduling repaints on NVIDIA
Needs ReviewPublic

Authored by fredrik on Sep 11 2019, 5:06 PM.

Details

Reviewers
romangg
Group Reviewers
KWin
Maniphest Tasks
T11071: Rework compositing pipeline
Summary

We keep setting __GL_MaxFramesAllowed to 1, but move the blocking glXSwapBuffers() call to a separate thread and synthesize a swap event by emitting a signal when the call returns.

This way we can call glXSwapBuffers() at any point within the swap interval without blocking the main thread, and we can use the same scheduling code with all drivers.

Test Plan

Compile tested only.

Diff Detail

Repository
R108 KWin
Lint
Lint Skipped
Unit
Unit Tests Skipped
fredrik created this revision.Sep 11 2019, 5:06 PM
Restricted Application added a project: KWin. · View Herald TranscriptSep 11 2019, 5:06 PM
Restricted Application added a subscriber: kwin. · View Herald Transcript
fredrik requested review of this revision.Sep 11 2019, 5:06 PM
fredrik edited reviewers, added: romangg; removed: ngraham.Sep 11 2019, 5:27 PM

Also, apart from the above two comments, any thoughts to how this relates to Roman's pending re-work of a lot of the GLX code https://phabricator.kde.org/D23105?

plugins/platforms/x11/standalone/glxbackend.cpp
343

Is this necessary? If the swap thread isn't rendering anything I don't think it actually needs a current context, right?

761

I'm not 100% sure this is sufficient to ensure all rendering is complete before the swap. It only guarantees that any commands have been submitted to the GPU, but not necessarily that they've all finished executing. Furthermore, glXSwapBuffers should cause an implicit glFlush anyway.

For instance, the glXSwapBuffers spec mentions glFinish for this purpose:

All GLX rendering contexts share the same notion of which are front buffers and which are back buffers. One consequence is that when multiple clients are rendering to the same double-buffered window, all of them should finish rendering before one of them issues the command to swap buffers. The clients are responsible for implementing this synchronization. Typically this is accomplished by executing glFinish and then using a semaphore in shared memory to rendezvous before swapping.

romangg added a comment.EditedSep 20 2019, 11:04 AM

Also, apart from the above two comments, any thoughts to how this relates to Roman's pending re-work of a lot of the GLX code https://phabricator.kde.org/D23105?

@fredrik: I would be interested in this as well. This change allows to schedule buffer swaps like we do with swap events at some point before the vblank and then get an event through second thread when the thread is unblocked again i.e. when the swap has been completed, right? This should also work with my rework patches only providing a single path with an event after swap (or a fallback timer if such an event is not available on the hardware).

And what do you think of using https://www.khronos.org/registry/OpenGL/extensions/NV/GLX_NV_delay_before_swap.txt as suggested by Erik in D23105#525696 instead of blocking through the __GL_MaxFramesAllowed equals 1 setting? Your approach with the thread maps to the model "swap -> wait for event -> (delay for smaller latency ->) swap -> wait for event -> ..." used by the mesa drivers pretty well though.

Also, apart from the above two comments, any thoughts to how this relates to Roman's pending re-work of a lot of the GLX code https://phabricator.kde.org/D23105?

@fredrik: I would be interested in this as well. This change allows to schedule buffer swaps like we do with swap events at some point before the vblank and then get an event through second thread when the thread is unblocked again i.e. when the swap has been completed, right? This should also work with my rework patches only providing a single path with an event after swap (or a fallback timer if such an event is not available on the hardware).

Yeah, that's the idea. This patch is meant to be integrated with those patches; it doesn't make any sense on its own.

And what do you think of using https://www.khronos.org/registry/OpenGL/extensions/NV/GLX_NV_delay_before_swap.txt as suggested by Erik in D23105#525696 instead of blocking through the __GL_MaxFramesAllowed equals 1 setting? Your approach with the thread maps to the model "swap -> wait for event -> (delay for smaller latency ->) swap -> wait for event -> ..." used by the mesa drivers pretty well though.

I actually started on a DelayBeforeSwapTimer that calls glXDelayBeforeSwapNV() in a separate thread. The idea being that Compositor would use that timer instead of QBasicTimer when the driver supports the extension. But as I was writing that code I realized that that approach would end up being more complex than simply doing the buffer swap in the other thread.

plugins/platforms/x11/standalone/glxbackend.cpp
343

Yeah, I actually just assumed that glXSwapBuffers() needs a current context, but indeed the specification doesn't say that anywhere.

761

Yeah, I was actually thinking that we may need to pass a fence to the other thread to ensure that. glXSwapBuffers() only flushes the context current to the thread where it is called, so that's why I added the explicit flush.

fredrik updated this revision to Diff 66636.Sep 22 2019, 11:47 PM

Use a fence to ensure that all rendering is complete before swapping buffers.

Concept wise, I really like it. It seems simple and well-encapsulated.

plugins/platforms/x11/standalone/glxbackend.cpp
121

One thing I found with nvidia drivers is that if we create an original context with robustness attributes enabled, and then use that as a share context for a new context without robustness, it fails to create anything.

It looks like this code would do that.
We probably want to re-use the kwin code for creating contexts so we can ensure matching attributes

I just pushed the rework patches. Can you rebase your patch on D25299? It allows direct repaints on swap events in the compositing pipeline.

fredrik updated this revision to Diff 70325.Nov 26 2019, 12:42 AM

Rebase on master.

This also removes some code that depended on another uncommitted patch, and fixes an issue I noticed while rebasing; we need to wait until after the swap before we query the buffer age. Note that this update does not address the issue with robustness and share contexts that David pointed out.

I merged the patches regarding swap-events to master. Please rebase one more time. You need to make sure hasSwapEvent() returns true in your case.

I will test the patch with my Nvidia-box after rebase quickly as well.

plugins/platforms/x11/standalone/glxbackend.cpp
121

Ok, let's remember this information. We can first test on master without it though.

327

Can there be a case where we don't have intel-swap-event but also don't want the swap-handler to be activated? Some legacy systems for example.

I remember you provided the table with which hardware uses what extension so you know I'm sure.

plugins/platforms/x11/standalone/glxbackend.h
72

Line length at max 100 chars. Other lines below as well.

Please rebase one more time.

If you've already rebased locally, without issue, there's no point making someone else do it. We're just creating duplicate work and blockages for no reason.

You can comandeer a phab revision under the "add action" combo then update someone else's diff

Please rebase one more time.

If you've already rebased locally, without issue, there's no point making someone else do it. We're just creating duplicate work and blockages for no reason.

I have not rebased his patch locally. Why do you assume I had? And to minimize duplicate work I'm exactly asking Fredrik to do it.

You can comandeer a phab revision under the "add action" combo then update someone else's diff

I won't just update someone else's Diff in Phabricator when it seems the person still actively works on it and there is no other external factor like time pressure for an upcoming release.

fredrik updated this revision to Diff 71402.Dec 12 2019, 11:10 PM
  • Rebase on master
  • Limit the line length to 100 characters.
fredrik marked an inline comment as done.Dec 12 2019, 11:42 PM

For the record, I wouldn't mind if someone who is in a better position to test it commandeered this revision. I think I mentioned that on irc.

plugins/platforms/x11/standalone/glxbackend.cpp
121

It's not going to work properly until that has been fixed, so we certainly can't push it to master before then.

I think the easiest way to fix it is to save the attribute builder or the attribute list that was used to create the main context, and reuse it.

327

The swap-handler should work with other implementations (famous last words?), but we don't know if SwapBuffers() is going to block. If it doesn't, we can't rely on the swap event for scheduling. So if we want to play it safe, we should only enable it on NVIDIA.

Do you know if this will work well on PRIME systems?

Without wanting to derail this thread too much I've spent some time debugging a separate bug in QtQuick qsgthreadedrenderloop - we swap the buffers, then render immediately expecting the next call (glClear) to block. And on prime laptops it simply doesn't.
__GL_MaxFramesAllowed=1 didn't seem to make an impact there. That doesn't have this fence sync stuff, but I doubt that would make a difference?