Possible solution for scheduling repaints on NVIDIA
Needs Review · Public

Authored by fredrik on Sep 11 2019, 5:06 PM.

Details

Reviewers
romangg
Group Reviewers
KWin
Summary

We keep setting __GL_MaxFramesAllowed to 1, but move the blocking glXSwapBuffers() call to a separate thread and synthesize a swap event by emitting a signal when the call returns.

This way we can call glXSwapBuffers() at any point within the swap interval without blocking the main thread, and we can use the same scheduling code with all drivers.
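A minimal sketch of the pattern described above, using std::thread and a plain callback where KWin would use QThread and a Qt signal, and a short sleep as a stand-in for the blocking glXSwapBuffers() call (all names here are illustrative, not the actual patch code):

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Stand-in for the blocking glXSwapBuffers() call: with
// __GL_MaxFramesAllowed=1 the real call blocks until the swap completes.
static void blockingSwapBuffers()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
}

// Runs buffer swaps on a dedicated thread and invokes a completion
// callback (KWin would emit a Qt signal instead) when the call returns.
class SwapThread
{
public:
    explicit SwapThread(std::function<void()> onSwapCompleted)
        : m_onSwapCompleted(std::move(onSwapCompleted))
        , m_thread([this] { run(); })
    {
    }

    ~SwapThread()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_quit = true;
        }
        m_condition.notify_one();
        m_thread.join();
    }

    // Called from the main thread; returns immediately without blocking.
    void scheduleSwap()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_swapPending = true;
        }
        m_condition.notify_one();
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;) {
            m_condition.wait(lock, [this] { return m_swapPending || m_quit; });
            if (m_quit)
                return;
            m_swapPending = false;
            lock.unlock();
            blockingSwapBuffers(); // blocks this thread, not the main one
            m_onSwapCompleted();   // the synthesized "swap event"
            lock.lock();
        }
    }

    std::function<void()> m_onSwapCompleted;
    std::mutex m_mutex;
    std::condition_variable m_condition;
    bool m_swapPending = false;
    bool m_quit = false;
    std::thread m_thread; // declared last so the other members exist first
};
```

The point of the pattern is that scheduleSwap() returns immediately; only the swap thread blocks for the duration of the swap, and the completion callback plays the role of the synthesized swap event.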

Test Plan

Compile tested only.

Diff Detail

Repository
R108 KWin
Lint
Lint Skipped
Unit
Unit Tests Skipped
fredrik created this revision. Sep 11 2019, 5:06 PM
Restricted Application added a project: KWin. Sep 11 2019, 5:06 PM
Restricted Application added a subscriber: kwin.
fredrik requested review of this revision. Sep 11 2019, 5:06 PM
fredrik edited reviewers, added: romangg; removed: ngraham. Sep 11 2019, 5:27 PM

Also, apart from the above two comments, any thoughts on how this relates to Roman's pending rework of a lot of the GLX code (https://phabricator.kde.org/D23105)?

plugins/platforms/x11/standalone/glxbackend.cpp
346

Is this necessary? If the swap thread isn't rendering anything I don't think it actually needs a current context, right?

793

I'm not 100% sure this is sufficient to ensure all rendering is complete before the swap. It only guarantees that all commands have been submitted to the GPU, not that they have finished executing. Furthermore, glXSwapBuffers performs an implicit glFlush anyway.

For instance, the glXSwapBuffers spec mentions glFinish for this purpose:

All GLX rendering contexts share the same notion of which are front buffers and which are back buffers. One consequence is that when multiple clients are rendering to the same double-buffered window, all of them should finish rendering before one of them issues the command to swap buffers. The clients are responsible for implementing this synchronization. Typically this is accomplished by executing glFinish and then using a semaphore in shared memory to rendezvous before swapping.

romangg added a comment (edited). Sep 20 2019, 11:04 AM

> Also, apart from the above two comments, any thoughts on how this relates to Roman's pending rework of a lot of the GLX code (https://phabricator.kde.org/D23105)?

@fredrik: I would be interested in this as well. This change allows us to schedule buffer swaps, like we do with swap events, at some point before the vblank, and then get an event through a second thread when that thread is unblocked again, i.e. when the swap has completed, right? This should also work with my rework patches, which only provide a single path with an event after the swap (or a fallback timer if such an event is not available on the hardware).

And what do you think of using https://www.khronos.org/registry/OpenGL/extensions/NV/GLX_NV_delay_before_swap.txt, as suggested by Erik in D23105#525696, instead of blocking via the __GL_MaxFramesAllowed=1 setting? Your approach with the thread maps pretty well to the model "swap -> wait for event -> (delay for smaller latency ->) swap -> wait for event -> ..." used by the Mesa drivers, though.

>> Also, apart from the above two comments, any thoughts on how this relates to Roman's pending rework of a lot of the GLX code (https://phabricator.kde.org/D23105)?

> @fredrik: I would be interested in this as well. This change allows us to schedule buffer swaps, like we do with swap events, at some point before the vblank, and then get an event through a second thread when that thread is unblocked again, i.e. when the swap has completed, right? This should also work with my rework patches, which only provide a single path with an event after the swap (or a fallback timer if such an event is not available on the hardware).

Yeah, that's the idea. This patch is meant to be integrated with those patches; it doesn't make any sense on its own.

> And what do you think of using https://www.khronos.org/registry/OpenGL/extensions/NV/GLX_NV_delay_before_swap.txt, as suggested by Erik in D23105#525696, instead of blocking via the __GL_MaxFramesAllowed=1 setting? Your approach with the thread maps pretty well to the model "swap -> wait for event -> (delay for smaller latency ->) swap -> wait for event -> ..." used by the Mesa drivers, though.

I actually started on a DelayBeforeSwapTimer that calls glXDelayBeforeSwapNV() in a separate thread. The idea was that the Compositor would use that timer instead of QBasicTimer when the driver supports the extension. But as I was writing that code I realized that approach would end up being more complex than simply doing the buffer swap in the other thread.
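For context, the blocking model of GLX_NV_delay_before_swap can be sketched with a stub in place of the real glXDelayBeforeSwapNV() (which takes a Display and GLXDrawable and waits relative to an actual vblank). Per the spec the call blocks until the given number of seconds before the next swap point and returns False if that interval exceeds the frame period; the fixed 60 Hz period here is an assumption for illustration only:

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Stand-in for glXDelayBeforeSwapNV(dpy, drawable, seconds): blocks the
// calling thread until `seconds` before the next (simulated) vblank, and
// returns false if `seconds` is longer than the frame period. A real
// compositor would call this on a helper thread, then paint and swap.
bool delayBeforeSwap(double seconds)
{
    constexpr double framePeriod = 1.0 / 60.0; // assumed 60 Hz display
    if (seconds > framePeriod)
        return false;
    std::this_thread::sleep_for(
        std::chrono::duration<double>(framePeriod - seconds));
    return true;
}
```

This shows why wrapping the call in a timer-like object is awkward: the delay itself must happen on a thread that can block, and the main thread still needs a wake-up afterwards, at which point doing the whole swap on that thread is simpler.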

plugins/platforms/x11/standalone/glxbackend.cpp
346

Yeah, I actually just assumed that glXSwapBuffers() needs a current context, but indeed the specification doesn't say that anywhere.

793

Yeah, I was actually thinking that we may need to pass a fence to the other thread to ensure that. glXSwapBuffers() only flushes the context that is current on the thread where it is called, which is why I added the explicit flush.

fredrik updated this revision to Diff 66636. Sep 22 2019, 11:47 PM

Use a fence to ensure that all rendering is complete before swapping buffers.
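The fence handoff can be sketched as follows, with std::future-based stubs standing in for glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0) and glClientWaitSync(), so the ordering guarantee can be shown without a live GL context (the function names and the simulated GPU delay are illustrative):

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Simulated GPU state: set once all submitted rendering has executed.
std::atomic<bool> g_renderingComplete{false};

// Stand-in for glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0): in the real
// code the compositing thread inserts the fence after its last draw call
// and hands it to the swap thread. Here the "GPU" is a short async task.
std::future<void> insertFence()
{
    return std::async(std::launch::async, [] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        g_renderingComplete = true;
    });
}

// Stand-in for the swap thread's glClientWaitSync(fence,
// GL_SYNC_FLUSH_COMMANDS_BIT, timeout) followed by glXSwapBuffers():
// waiting on the fence before swapping guarantees the swap never
// presents an incomplete frame.
bool waitFenceAndSwap(std::future<void> fence)
{
    fence.wait();               // glClientWaitSync()
    return g_renderingComplete; // glXSwapBuffers() would run here
}
```

Unlike glFinish(), which stalls the submitting thread until everything has executed, the fence lets the compositing thread continue immediately; only the swap thread waits.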

Concept-wise, I really like it. It seems simple and well encapsulated.

plugins/platforms/x11/standalone/glxbackend.cpp
119

One thing I found with the NVIDIA drivers is that if we create an original context with robustness attributes enabled, and then use it as the share context for a new context without robustness, context creation fails.

It looks like this code would do that. We probably want to re-use the KWin code for creating contexts so we can ensure matching attributes.
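One way to keep the attributes matching is to build the attribute list for both the primary and the swap-thread context with the same helper. A sketch, with the GLX tokens defined locally so it builds without the GLX headers (the helper name and the GL 2.1 version request are illustrative, not KWin's actual context-creation code):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Tokens from GLX_ARB_create_context / GLX_ARB_create_context_robustness,
// defined locally so the sketch compiles without the GLX headers.
constexpr int GLX_CONTEXT_MAJOR_VERSION_ARB = 0x2091;
constexpr int GLX_CONTEXT_MINOR_VERSION_ARB = 0x2092;
constexpr int GLX_CONTEXT_FLAGS_ARB = 0x2094;
constexpr int GLX_CONTEXT_ROBUST_ACCESS_BIT_ARB = 0x0004;
constexpr int GLX_CONTEXT_RESET_NOTIFICATION_STRATEGY_ARB = 0x8256;
constexpr int GLX_LOSE_CONTEXT_ON_RESET_ARB = 0x8252;

// Builds the zero-terminated attribute list that would be passed to
// glXCreateContextAttribsARB(). Using one builder for both contexts keeps
// the robustness attributes identical, which is what the NVIDIA driver
// needs for the share to succeed.
std::vector<int> contextAttribs(bool robust)
{
    std::vector<int> attribs = {
        GLX_CONTEXT_MAJOR_VERSION_ARB, 2,
        GLX_CONTEXT_MINOR_VERSION_ARB, 1,
    };
    if (robust) {
        attribs.push_back(GLX_CONTEXT_FLAGS_ARB);
        attribs.push_back(GLX_CONTEXT_ROBUST_ACCESS_BIT_ARB);
        attribs.push_back(GLX_CONTEXT_RESET_NOTIFICATION_STRATEGY_ARB);
        attribs.push_back(GLX_LOSE_CONTEXT_ON_RESET_ARB);
    }
    attribs.push_back(0); // None-terminated, as glXCreateContextAttribsARB expects
    return attribs;
}
```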

I just pushed the rework patches. Can you rebase your patch on D25299? It allows direct repaints on swap events in the compositing pipeline.