Cleanup KSharedUiServerProxy before qApp exists
AbandonedPublic

Authored by kfunk on Aug 23 2016, 8:03 AM.

Details

Reviewers
vonreth
dfaure
Summary

... to avoid deadlocks with the DBus thread on Windows.

Related to D1909

Test Plan

No longer deadlocks on Windows (tested kdevelop & konversation)

Diff Detail

Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
kfunk updated this revision to Diff 6172.Aug 23 2016, 8:03 AM
kfunk updated this revision to Diff 6173.
kfunk retitled this revision from to Cleanup KSharedUiServerProxy before qApp exists.
kfunk updated this object.
kfunk edited the test plan for this revision. (Show Details)

Added CCBUG

kfunk added a subscriber: Frameworks.
kfunk updated this revision to Diff 6175.Aug 23 2016, 9:10 AM

Fix typo in commit msg

dfaure accepted this revision.Aug 26 2016, 11:10 PM
dfaure edited edge metadata.

I would prefer if we could fix these kind of issues in Qt itself, but I've had a difficult time doing that (*), this stuff is tricky.

(*) for other dbus-thread related problems, not necessarily this particular one. E.g. https://codereview.qt-project.org/167219

This revision is now accepted and ready to land.Aug 26 2016, 11:10 PM
arrowd added a subscriber: arrowd.Sep 2 2016, 2:56 PM
In D2545#48488, @dfaure wrote:

Hmm can you check if https://codereview.qt-project.org/161056 fixes it?

I've tried both Thiago patches on 5.7 branch and they don't fix this problem.

dfaure added a comment.Sep 3 2016, 8:28 PM

So what is the actual deadlock you're experiencing?

I showed this to Thiago and he said he needs more info on what is actually happening, i.e. where exactly it deadlocks in QtDBus.

kfunk added a comment.Sep 4 2016, 12:43 PM

Here's the backtrace (note: using Qt 5.6 branch):

 	ntdll.dll!NtWaitForSingleObject()	Unknown
 	KernelBase.dll!00007ffbcb2eaadf()	Unknown
>	Qt5Core.dll!QWaitCondition::wait(QMutex * mutex=0x000001c24fd16520, unsigned long time=4294967295) Line 178	C++
 	Qt5Core.dll!QSemaphore::acquire(int n=1) Line 136	C++
 	Qt5Core.dll!QMetaObject::activate(QObject * sender=0x000001c24a6eb500, int signalOffset, int local_signal_index, void * * argv=0x00000081e131f648) Line 3699	C++
 	Qt5Core.dll!QObject::~QObject() Line 913	C++
 	Qt5DBus.dll!QDBusServiceWatcher::`vector deleting destructor'(unsigned int)	C++
 	Qt5Core.dll!QObjectPrivate::deleteChildren() Line 1960	C++
 	Qt5Core.dll!QObject::~QObject() Line 1034	C++
 	KF5JobWidgets.dll!KSharedUiServerProxy::~KSharedUiServerProxy() Line 325	C++
 	KF5JobWidgets.dll!``anonymous namespace'::Q_QGS_serverProxy::innerFunction'::`2'::`dynamic atexit destructor for 'holder''()	C++
 	ucrtbase.dll!_time64()	Unknown
 	ucrtbase.dll!__crt_seh_guarded_call<int>::operator()<class <lambda_e24bbb7b643b32fcea6fa61b31d4c984>,class <lambda_275893d493268fdec8709772e3fcec0e> &,class <lambda_9d71df4d7cf3f480f8d633942495c3b0> >(class <lambda_e24bbb7b643b32fcea6fa61b31d4c984> &&,class <lambda_275893d493268fdec8709772e3fcec0e> &,class <lambda_9d71df4d7cf3f480f8d633942495c3b0> &&)	Unknown
 	ucrtbase.dll!_execute_onexit_table()	Unknown
 	KF5JobWidgets.dll!dllmain_crt_process_detach(const bool is_terminating=true) Line 109	C++
 	KF5JobWidgets.dll!dllmain_dispatch(HINSTANCE__ * const instance=0x00007ffbb8e10000, const unsigned long reason=0, void * const reserved=0x0000000000000001) Line 207	C++
 	ntdll.dll!LdrpCallInitRoutine()	Unknown
 	ntdll.dll!LdrShutdownProcess()	Unknown
 	ntdll.dll!RtlExitUserProcess()	Unknown
 	kernel32.dll!ExitProcessImplementation()	Unknown
 	ucrtbase.dll!swprintf()	Unknown
 	ucrtbase.dll!swprintf()	Unknown
 	kdevelop.exe!__scrt_common_main_seh() Line 262	C++
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown
dfaure added a comment.Sep 8 2016, 9:13 PM

We need a backtrace with all threads to understand what's happening.

Any update on this?

Since there is no fix on Qt, should we merge this?

Actually, I think Thiago's still waiting for a backtrace of all threads.

kfunk added a comment.Dec 1 2016, 9:36 PM
In D2545#65976, @dfaure wrote:

Actually, I think Thiago's still waiting for a backtrace of all threads.

I think I've already said this via other channels: There's only one thread left at this point.

thiago added a subscriber: thiago.Dec 3 2016, 5:32 PM
In D2545#66545, @kfunk wrote:
In D2545#65976, @dfaure wrote:

Actually, I think Thiago's still waiting for a backtrace of all threads.

I think I've already said this via other channels: There's only one thread left at this point.

That's troubling. That means the QDBusConnectionManager thread has exited but the QDBusConnectionPrivate object still exists. That is refcounted and deletes itself, though QDBusServicesWatcher is not the reason for refcounting.

Commit ad66dbe305cff72443f4d3484191872d56e6dfbb in qtbase did try to solve this by disconnecting the senders when the objects were getting deleted. closeConnection() is called before the thread exits (from QDBusConnectionManager::run), so I don't know how there's still a BlockingQueuedConnection active.

Is it possible that the service began being watched during destruction?

thiago added a comment.Dec 3 2016, 5:42 PM
In D2545#66904, @thiago wrote:

Commit ad66dbe305cff72443f4d3484191872d56e6dfbb in qtbase did try to solve this by disconnecting the senders when the objects were getting deleted. closeConnection() is called before the thread exits (from QDBusConnectionManager::run), so I don't know how there's still a BlockingQueuedConnection active.

Is it possible that the service began being watched during destruction?

I don't see how that's possible because there's another BlockingQueuedConnection protecting that: the QObject connection is only done in QDBusConnectionPrivate::addSignalHook, which is only run in the QDBusConnectionManager thread. So either the thread was still running, in which case we should have disconnected, or the thread wasn't running and the destroyed() signal should have got disconnected.

Here's the other problem: it's possible for threads to simply disappear on Windows. Given that I see "dllmain" in the backtrace (though not DllMain), I can't rule out that this has happened. Qt 5.6 has a workaround to another deadlock caused by Windows. Can you try to cherry-pick 3ec57107cedb154f256e3ad001ea5475cc64fa94 from 5.8?

Here's the other problem: it's possible for threads to simply disappear on Windows. Given that I see "dllmain" in the backtrace (though not DllMain), I can't rule out that this has happened. Qt 5.6 has a workaround to another deadlock caused by Windows. Can you try to cherry-pick 3ec57107cedb154f256e3ad001ea5475cc64fa94 from 5.8?

With 3ec57107cedb154f256e3ad001ea5475cc64fa94 applied: Still dead-locking, same backtrace.

In D2545#69083, @kfunk wrote:

Here's the other problem: it's possible for threads to simply disappear on Windows. Given that I see "dllmain" in the backtrace (though not DllMain), I can't rule out that this has happened. Qt 5.6 has a workaround to another deadlock caused by Windows. Can you try to cherry-pick 3ec57107cedb154f256e3ad001ea5475cc64fa94 from 5.8?

With 3ec57107cedb154f256e3ad001ea5475cc64fa94 applied: Still dead-locking, same backtrace.

Ok, so I guess this was a different, but similar issue, that got fixed by that commit.

The root cause is that at some point during the process shut down (after ExitProcess is called), the Windows runtime kills all other threads, without giving them a chance to clean up. Because of that, when static destructors run, the other threads are no longer running, even though the objects that managed them (QThread) says they are. The application is in an inconsistent state.

More information on this Windows behaviour:

There doesn't seem to be a way of doing some clean up before the threads are forcibly killed.

Maybe if I abuse qtmain().

For the record, since I don't see an easy fix I'm pondering about patching qtbase in craft.git:

Ideas:

a ) Add this to QDBusConnectionManager ctor:

qAddPostRoutine([]() {
    QMetaObject::invokeMethod(QDBusConnectionManager::instance(), "quit");
});

b) Just cherry-pick https://codereview.qt-project.org/#/c/147261/1 (tried to fix the same issue, but got abandoned)

This might be killing DBus connections too early, but it's better than hanging the process on exit (I'm being pragmatic here).

In D2545#69091, @thiago wrote:

There doesn't seem to be a way of doing some clean up before the threads are forcibly killed.

Maybe if I abuse qtmain().

This would only fix it for non-console GUI applications if I understand correctly. That's what we have here, but we'd still keep console applications buggy. There's no way to inject code before ExitProcess() with a plain console application.

In D2545#69187, @kfunk wrote:
In D2545#69091, @thiago wrote:

There doesn't seem to be a way of doing some clean up before the threads are forcibly killed.

Maybe if I abuse qtmain().

This would only fix it for non-console GUI applications if I understand correctly. That's what we have here, but we'd still keep console applications buggy. There's no way to inject code before ExitProcess() with a plain console application.

I was reading the ucrt source code yesterday and it does look like it processes atexit() registrations prior to ExitProcess(). So I'll go for that.

There is a side-effect: I have to bring back the patch that makes QtDBus DLL non-unloadable, since otherwise it could crash during exit if had already been unloaded (was loaded by a plugin).

BTW, do you know if this problem happens with MinGW?

kfunk added a comment.EditedDec 19 2016, 11:53 PM
In D2545#69298, @thiago wrote:
In D2545#69187, @kfunk wrote:
In D2545#69091, @thiago wrote:

There doesn't seem to be a way of doing some clean up before the threads are forcibly killed.

Maybe if I abuse qtmain().

This would only fix it for non-console GUI applications if I understand correctly. That's what we have here, but we'd still keep console applications buggy. There's no way to inject code before ExitProcess() with a plain console application.

I was reading the ucrt source code yesterday and it does look like it processes atexit() registrations prior to ExitProcess(). So I'll go for that.

It does not work with atext() -- I still get the same hang if I use that.

So:

  • std::atexit(...) -> hang
  • qAddPostRoutine(...) -> no hang

std::atexit(...) inside QDBusConnectionManager() is run way before the initialization of the KSharedUiServerProxy global static. I'd assume the global statics use a similar technique to register their destructor functions; but due to the nature of atexit-functions running in reverse order at destruction time, the destructor for the KSharedUiServerProxy global static is run before our registered atexit function. => Thus we have the same problem as before: QDBusConnectionManager is not destructed early enough

There is a side-effect: I have to bring back the patch that makes QtDBus DLL non-unloadable, since otherwise it could crash during exit if had already been unloaded (was loaded by a plugin).

BTW, do you know if this problem happens with MinGW?

According to this comment here https://bugs.kde.org/show_bug.cgi?id=366596#c5 it also happens when built with MinGW.

kfunk abandoned this revision.Jan 17 2017, 7:35 AM

Patched Qt instead:

https://phabricator.kde.org/R138:7f78f7297add7c510c1cf0f1707197c8c403fcee

I can't make my mind up how to fix this cleanly, to be honest.

Given we always needs to patch Qt for this, shall we not use this workaround in our code instead?
I think the other "it always hangs" stuff in Qt bug https://bugreports.qt.io/browse/QTBUG-49870 got fixed.
Would this patch here making patching Qt not needed for Windows?

David, could we apply this patch, even with the workaround in the Qt patch in craft to have a better experience for people not patching their Qt?

I don't mind, go ahead.

Ok, ah, no longer applies cleanly, must redo some things it seems.
Will create a new patch for submission.