[platformx/x11] Add a freeze protection against OpenGL
ClosedPublic

Authored by antlarr on Oct 21 2016, 7:20 PM.

Details

Summary

With nouveau driver it can happen that KWin gets frozen when first trying
to render with OpenGL. This results in a freeze of the complete desktop
as the compositor is non functional.

Our OpenGL breakage detection is only able to detect crashes, but not
freezes. This change improves it by also added a freeze protection.

In the PreInit stage a thread is started with a QTimer of 15 sec. If the
timer fires, qFatal is triggered to terminate KWin. This can only happen
if the creation of the OpenGL compositor takes longer than said 15 sec.

In the PostInit stage the timer gets deleted and the thread stopeed
again.

Thus if a freeze is detected the OpenGL unsafe protection is written into
the config. KWin aborts and gets restarted by DrKonqui. The new KWin
instance will no longer try to activate the freezing OpenGL as the
protection is set.

If KWin doesn't freeze the protection is removed from the config as
we are used to.

Check for freezes for the first n frames, not just the first

This patch changes the freeze detection code to detect freezes in the
first 30 frames (by default, users can change that with the
KWIN_MAX_FRAMES_TESTED environment variable). This detects
successfully the freezes associated to nouveau drivers
in https://bugzilla.suse.com/show_bug.cgi?id=1005323

Diff Detail

Branch
arcpatch-D3127_1
Lint
No Linters Available
Unit
No Unit Test Coverage
antlarr updated this revision to Diff 7599.Oct 21 2016, 7:20 PM
antlarr retitled this revision from to [platformx/x11] Add a freeze protection against OpenGL.
antlarr updated this object.
antlarr edited the test plan for this revision. (Show Details)
antlarr added reviewers: KWin, Plasma, davidedmundson.
antlarr added a project: Plasma.
antlarr added subscribers: davidedmundson, plasma-devel, kwin.
Restricted Application edited projects, added KWin; removed Plasma. · View Herald TranscriptOct 21 2016, 7:20 PM

How many frames do we need to detect the freeze? I think 30 frames is too much by default. It's adding quite some stress on startup (lots of write to config, lots of additional threads started). Assuming we freeze with tripple buffer when trying to get the next buffer 4 frames should be sufficient to trigger the freeze?

How many frames do we need to detect the freeze? I think 30 frames is too much by default. It's adding quite some stress on startup (lots of write to config, lots of additional threads started). Assuming we freeze with tripple buffer when trying to get the next buffer 4 frames should be sufficient to trigger the freeze?

Yesterday I tried with 20 and 10 frames and it detected the freeze correctly, then tried 5 and got a strange behaviour (the desktop wasn't frozen but the menus were shown displaced and there was no window decoration). I decided I would remove an option I added to the nouveau driver to test if it helped and rebooted the machine to be sure I tested with a clean system. But then I lost the connection to the machine and it seems to have a problem booting up so I can't do more tests until Monday when someone goes to the SUSE offices where the machine is and check what the problem is.

In the meantime, I will improve a bit the PreFrame and PostFrame code so it doesn't add so much overhead (reusing the thread and timer, and only write the config option in the lambda function, which wouldn't work to detect a crash, but it should work to detect a freeze, so I'll separate the code of PreFrame/PostFrame from the code in PreInit/PostInit).

I'll update this phab later today with those optimizations.

antlarr updated this revision to Diff 7613.EditedOct 23 2016, 10:29 AM
  • Remove most overhead caused by the freeze-detection thread

I separated (Pre/Post)Init and (Pre/Post)Frame code in x11_platform
so now *Init "detects" crashes and *Frame detects freezes.

Also, now Pre/PostFrame can be called many times in a row since
PostFrame doesn't remove the thread but just deletes the timer
and PreFrame reuses the thread if it was already created.

I added OpenGLSafePoint::PostLastFrame which is called after the last
PostFrame safe point, and really removes the thread (also will delete
the timer if it wasn't deleted earlier)

Also, for freeze detection there's no need to write
OpenGLIsUnsafe=true in PreFrame and correct it in PostFrame since
kwin doesn't crash and I guess it's not nice to write twice to the
config file in every frame, even if it's just the first n frames and
the operating system will probably cache those writes. So I moved the
code to only set OpenGLIsUnsafe=true when a freeze is detected.

Restricted Application edited projects, added Plasma; removed KWin. · View Herald TranscriptOct 23 2016, 10:29 AM

Btw, I tested the last commit on a virtual machine simulating the freeze with a sleep. I'll check tomorrow what's the minimum value for m_maxFramesTestedForSafety that works to detect the real freeze and update its default value.

luebking added inline comments.
composite.h
242

m_testedFrames or similar, "Frame Count" is too generic.
Alternatively, just use some "int m_freezeDectionFrames = 30;" and count them down (so when < 1, you're done)

platform.h
148 ↗(On Diff #7613)

Have a PreFirstFrame still and make PostLastFrame PostLastGuardedFrame or similar. ("Last frame" is right before kwin shuts down ;-)

plugins/platforms/x11/standalone/x11_platform.cpp
212

Why do you recreate it with every frame?
Only create it PreFirstFrame and delete it PostLastFrame (and inbeteween simply restart the pointer PreFrame)

221

same goes for config writing

antlarr added inline comments.Oct 23 2016, 6:31 PM
composite.h
242

ok. @graesslin do you prefer a count down or a count up and max? In any case, I agree with changing the name.

platform.h
148 ↗(On Diff #7613)

There's no need for PreFirstFrame, PreFrame is the same for the first or for the rest of "Pre" cases. But for the PostLastGuardedFrame note I agree. I'll change that.

plugins/platforms/x11/standalone/x11_platform.cpp
212

The timer is in another thread, so it can't be stopped/restarted from this thread.

221

The config is not written in every frame. It's only saved when the timer is triggered. That is, when a freeze is detected. That was the point of the "remove overhead" commit.

luebking added inline comments.Oct 23 2016, 6:38 PM
plugins/platforms/x11/standalone/x11_platform.cpp
212

You should be able to call it as slot (whether QMetaObject::invokeMethod() works, I've never tried)

221

Sorry, misread. (Actually didn't really read and just wanted to point out the other case ;-)

antlarr added inline comments.Oct 23 2016, 7:20 PM
plugins/platforms/x11/standalone/x11_platform.cpp
212

yeah, but then it wouldn't be a synchronous call and it would lose all its meaning, isn't it? :)

221

no problem :)

luebking added inline comments.Oct 23 2016, 7:30 PM
plugins/platforms/x11/standalone/x11_platform.cpp
212

No, why? The purpose is to push the freeze timer forward as long as the GUI thread isn't blocked.
If the GUI thread is blocked, the forward pushing won't happen, the timer times out and set GL unsafe etc.

Do I miss something?

graesslin added inline comments.Oct 24 2016, 6:01 AM
composite.cpp
210–214

why is the scene creation also guarded with PreFrame and PostLastFrame? Contextual that doesn't make sense as createScene doesn't render any frames.

composite.h
242

I would just count down the m_maxFramesTestedForSafety till it reaches 0

plugins/platforms/x11/standalone/x11_platform.cpp
212

You can use QMetaObject::invokeMethod with Qt::QueuedConnection.

231–240

I think you can merge the PostFrame with PostInit. So that the init test also does the freeze testing. That would also solve the conceptual problem I pointed out above.

antlarr added inline comments.Oct 24 2016, 7:24 AM
composite.h
242

ok

plugins/platforms/x11/standalone/x11_platform.cpp
212

Oh, I thought QueuedConnection needed to get to the event loop to execute the slot method, but I just noticed it's the event loop of the receiver object, so it would be fine. I'll change that.

231–240

Do you prefer to merge PostInit with PostFrame or with PostLastGuardedFrame? The difference would be that if it's merged with PostFrame and a user sets KWIN_MAX_FRAMES_TESTED to 0 (so no freeze detection is done when painting frames) then the detection thread and timer wouldn't be deleted and if it's merged with PostLastGuardedFrame, then a new thread and timer will be created for the frame drawing. I would merge it with PostFrame, since it requires a user who probably knows what he's doing to interact and is more optimized for 99.9999% of cases, and still an idle thread and stopped timer shouldn't consume any resources in the rare case a user sets that, but I ask just in case.

graesslin added inline comments.Oct 24 2016, 7:33 AM
plugins/platforms/x11/standalone/x11_platform.cpp
231–240

yep, agree.

antlarr updated this revision to Diff 7636.Oct 24 2016, 2:52 PM
antlarr marked 21 inline comments as done.
  • Use only one timer to detect freezes and other optimizations
  • Test 3 frames for freezes by default
Restricted Application edited projects, added KWin; removed Plasma. · View Herald TranscriptOct 24 2016, 2:52 PM
antlarr updated this revision to Diff 7637.Oct 24 2016, 2:56 PM

Include all 6 commits

Restricted Application edited projects, added Plasma; removed KWin. · View Herald TranscriptOct 24 2016, 2:56 PM
graesslin added inline comments.Oct 24 2016, 3:02 PM
composite.cpp
748

this misses the

if (m_scene->compositingType() & OpenGLCompositing)
plugins/platforms/x11/standalone/x11_platform.cpp
225–227

nitpick: coding style

} else {
luebking added inline comments.Oct 24 2016, 3:07 PM
composite.cpp
748

also indention.

plugins/platforms/x11/standalone/x11_platform.cpp
206

PreFrame is (now) effectively "PreFirstGuardedFrame", is it?
And if invoked at some later point would create the timer and hit the config rewrite every single frame (for the counter is stuck at 0)?

> rename to avoid bad invocation?

antlarr added inline comments.Oct 24 2016, 3:10 PM
plugins/platforms/x11/standalone/x11_platform.cpp
206

Not really, it's expected to be called many times and it checks if it's the first time it was called or not (also, when the counter gets to 0 it's never called anymore)

antlarr updated this revision to Diff 7638.Oct 24 2016, 3:16 PM
  • Only call PostLastGuardedFrame when OpenGLCompositing is set
Restricted Application edited projects, added KWin; removed Plasma. · View Herald TranscriptOct 24 2016, 3:16 PM
antlarr marked 5 inline comments as done.Oct 24 2016, 4:04 PM

All issues done

The latest patch version seems to not include all changes.

antlarr updated this revision to Diff 7644.Oct 25 2016, 6:55 AM

Include all 7 commits

Restricted Application edited projects, added Plasma; removed KWin. · View Herald TranscriptOct 25 2016, 6:55 AM

I don't know why, but it seems arc diff doesn't work anymore and I have to use arc diff HEAD~7 to include all 7 commits

graesslin accepted this revision.Oct 25 2016, 7:07 AM
graesslin added a reviewer: graesslin.

please push as one (squashed) commit

This revision is now accepted and ready to land.Oct 25 2016, 7:07 AM
antlarr closed this revision.Oct 25 2016, 7:51 AM