Move Solid::Device::listFromQuery calls to a separate thread
AbandonedPublic
Actions

Authored by apol on Jul 8 2019, 7:08 PM.

Details

Reviewers

davidedmundson
bruns

Group Reviewers

Plasma

Summary

They block the startup notably but the API doesn't require it to be blocking.

Test Plan

Ran plasmashell several times

Diff Detail

Repository

R120 Plasma Workspace

Branch

master

Lint

No Linters Available

Unit

No Unit Test Coverage

Build Status

Buildable 13742
Build 13760: arc lint + arc unit

apol created this revision.Jul 8 2019, 7:08 PM

Restricted Application added a project: Plasma. · View Herald TranscriptJul 8 2019, 7:08 PM

Restricted Application added a subscriber: plasma-devel. · View Herald Transcript

apol requested review of this revision.Jul 8 2019, 7:08 PM

Harbormaster completed remote builds in B13720: Diff 61372.Jul 8 2019, 7:08 PM

Call setFuture after connecting to finished

Harbormaster completed remote builds in B13725: Diff 61379.Jul 8 2019, 11:11 PM

Don't move Solid::Device instances between threads

Harbormaster completed remote builds in B13742: Diff 61407.Jul 9 2019, 1:36 PM

apol mentioned this in D22339: Make sure solid backends are reentrant.Jul 9 2019, 1:37 PM

davidedmundson accepted this revision.Jul 10 2019, 3:35 PM

This revision is now accepted and ready to land.Jul 10 2019, 3:35 PM

Why not just a singleshot timer from the constructor? Avoids any initial blocking ...

This revision now requires changes to proceed.Jul 11 2019, 11:39 AM

In D22333#494146, @bruns wrote:

Why not just a singleshot timer from the constructor? Avoids any initial blocking ...

Initial, but doesn't fix the problem. We could potentially delay this few seconds instead, but we'd still be getting an odd ~500ms freeze randomly
Or we can just decide to move blocking or heavy algorithms to separate threads and enjoy our multicore computers and a fluid experience.

In D22333#494152, @apol wrote:

In D22333#494146, @bruns wrote:

Why not just a singleshot timer from the constructor? Avoids any initial blocking ...

Initial, but doesn't fix the problem. We could potentially delay this few seconds instead, but we'd still be getting an odd ~500ms freeze randomly
Or we can just decide to move blocking or heavy algorithms to separate threads and enjoy our multicore computers and a fluid experience.

Where does the blocking happen? How do you guarantee none of the later call block? Large parts of the code are executed in the main thread anyway, only the initial list creation happens in a worker thread.

Also, if this is a generic problem, why only fix it in the dataengines, not in Solid itself?

In D22333#494253, @bruns wrote:

In D22333#494152, @apol wrote:

In D22333#494146, @bruns wrote:

Why not just a singleshot timer from the constructor? Avoids any initial blocking ...

Initial, but doesn't fix the problem. We could potentially delay this few seconds instead, but we'd still be getting an odd ~500ms freeze randomly
Or we can just decide to move blocking or heavy algorithms to separate threads and enjoy our multicore computers and a fluid experience.

Where does the blocking happen? How do you guarantee none of the later call block? Large parts of the code are executed in the main thread anyway, only the initial list creation happens in a worker thread.

listFromQuery is the big blocking call we are moving into a separate thread.

Also, if this is a generic problem, why only fix it in the dataengines, not in Solid itself?

This would mean refactoring how Solid works and even coming up with entirely new concepts. I don't discard doing it at some point but I don't think plasmashell startup blockage is reason enough to redo a framework that has been working for about 5 years as is.

In D22333#494384, @apol wrote:

In D22333#494253, @bruns wrote:

In D22333#494152, @apol wrote:

In D22333#494146, @bruns wrote:

Why not just a singleshot timer from the constructor? Avoids any initial blocking ...

Initial, but doesn't fix the problem. We could potentially delay this few seconds instead, but we'd still be getting an odd ~500ms freeze randomly
Or we can just decide to move blocking or heavy algorithms to separate threads and enjoy our multicore computers and a fluid experience.

Where does the blocking happen? How do you guarantee none of the later call block? Large parts of the code are executed in the main thread anyway, only the initial list creation happens in a worker thread.

listFromQuery is the big blocking call we are moving into a separate thread.

Again, where is it blocking? Which backend?

Also, if this is a generic problem, why only fix it in the dataengines, not in Solid itself?

This would mean refactoring how Solid works and even coming up with entirely new concepts. I don't discard doing it at some point but I don't think plasmashell startup blockage is reason enough to redo a framework that has been working for about 5 years as is.

listFromQuery can be replaced by an asynchronous "enumerate(predicate)" call which uses the existing DeviceAdded signal. This would also remove the inherent race between the listFromQuery and DeviceAdded/DeviceRemoved.

In D22333#494389, @bruns wrote:

Again, where is it blocking? Which backend?

udisks2 mainly, but every backend can block by its virtue.

listFromQuery can be replaced by an asynchronous "enumerate(predicate)" call which uses the existing DeviceAdded signal. This would also remove the inherent race between the listFromQuery and DeviceAdded/DeviceRemoved.

I've looked into changes that could be done to solid to improve this, nothing felt like a good step forward. This approach works, makes sense and shows results.

In D22333#494415, @apol wrote:

In D22333#494389, @bruns wrote:

Again, where is it blocking? Which backend?

udisks2 mainly, but every backend can block by its virtue.

The code can still block, as every invocation of 'Solid::Device(udi)` does a blocking DBus call to udisks2.

Also, the code is calling non-threadsafe code from multiple threads now (e.g. once from each the two dataengines helper threads). Each one will call the udisks2 Manager::deviceCache() method.

In D22333#494774, @bruns wrote:

Also, the code is calling non-threadsafe code from multiple threads now (e.g. once from each the two dataengines helper threads). Each one will call the udisks2 Manager::deviceCache() method.

There's a Backend per-thread and deviceCache is not static so each thread will call its own.

In D22333#494810, @apol wrote:

In D22333#494774, @bruns wrote:

Also, the code is calling non-threadsafe code from multiple threads now (e.g. once from each the two dataengines helper threads). Each one will call the udisks2 Manager::deviceCache() method.

There's a Backend per-thread and deviceCache is not static so each thread will call its own.

Sorry, but you are wrong, look for
Q_GLOBAL_STATIC(Solid::DeviceManagerStorage, globalDeviceStorage)

apol mentioned this in R245:00abb4839abe: Make sure solid backends are reentrant.Jul 13 2019, 1:13 AM

https://phabricator.kde.org/source/solid/browse/master/src/solid/devices/frontend/devicemanager.cpp$301
@bruns, It's backend per thread

In D22333#494852, @anthonyfieroni wrote:

https://phabricator.kde.org/source/solid/browse/master/src/solid/devices/frontend/devicemanager.cpp$301
@bruns, It's backend per thread

Yes, you are correct, so there are no threading issues, sorry.

On the other hand, this IMHO makes this approach even less profitable:

every backend is initialized once from the two helpers (assuming both dataengines are used).
after the threads finish, all but the relevant UDIs are thrown away (and care must be taken not to use anything else - the devices would be dangling pointers in the now gone thread storage)
the first completing thread triggers the initialization of the backends again - in the main thread.

As the initialization happens implicitly by the Device instantiation, for e.g. UDisks there are two synchronous DBus calls now, one for introspection from the manager backend and one for the device itself from the device backend. Both are done in the main thread, and we will block even longer.

So any gain here is AFAICS from the fact the constructor no longer blocks on a DBus call.

Kinda, you're summary missing a key part.
(the commit description is a bit poor)

This patch does the following:

It does the search in the other thread. That creates and iterates every possible device. This is expensive as there are lots of potential devices. We fetch pretty much everything udisks knows.

From the given list we recreate the select few devices that are relevant. This is relatively cheap. It does still block as you point out, but now for a much much smaller set of objects compared to before.

(on my laptop that's 56 devices created when searching, resulting in 8 useful devices).

This is where the main thread saving is meant to come in.

As for dangling pointers. The private manager object dies as its backed by TLS. The device object itself does not. The device object has guards against a deleted manager internally. It won't crash, but will return "dead" values.

You're absolutely right it's a hack around the current state, rather than a good long term direction. I'm not exactly loving it.
I would love to rework solid to have a nice async api, and also to not use data engines. I'm not sure its doable within kf5.

Things we could do right now, without a super big refactoring:

On a solid level:

Maybe a useful intermediate change would be to add API in Solid to move devices between threads.
Alternatively we could consider making sure backends stick to one specific non-GUI thread, keeping only 1 backend per application and have internals speak to it (admittedly a bigger refactoring).

On a plasma level:

Keep Solid::Device::listFromQuery calls into a separate QThread and only extract tuples with the information we need rather than a Solid::Device (which was obviously faster but crashed because we can't move devices between threads).

Either way, it's not a matter of KF5, adding new API is perfectly fine, problem is that we may end up redesigning both the frontend and the backend and this is far too much work right now. And so it will be when we're porting things to KF6.

Also, I would suggest not really expecting to be able to do a nice, in-thread async API. Note that here we have exactly the same problem we have on plasma-nm and in Qt bearing networkmanager backend. It's hard to use these types asynchronously and we shouldn't go through hoops to make it thread-local.

In D22333#496544, @davidedmundson wrote:

Kinda, you're summary missing a key part.
(the commit description is a bit poor)

This patch does the following:

It does the search in the other thread. That creates and iterates every possible device. This is expensive as there are lots of potential devices. We fetch pretty much everything udisks knows.

From the given list we recreate the select few devices that are relevant. This is relatively cheap. It does still block as you point out, but now for a much much smaller set of objects compared to before.

(on my laptop that's 56 devices created when searching, resulting in 8 useful devices).

Just for clarification:

On my system, solid-hardware5 list returns 77 devices, but with the Predicate Filter I am down to 10, and this is filtered by the hotplug engine down to 7.

Half of the devices (32) are serial ports. These are returned by the udev backend, but when combined with the "Camera | PortableMediaPlayer" predicate, the udev backend returns an empty list and nothing is instantiated.

The enumeration in the udev backend, especially 'PortableMediaPlayer' has become much cheaper since https://phabricator.kde.org/D21379.

The Udisks backend indeed returns to much, and unfortunately it creates a backend device for each block device, which implies a number of DBus roundtrips. But this would come down to a single DBus call when D19677 lands.

apol abandoned this revision.Oct 17 2019, 10:16 AM

Revision Contents
Changeset List

			Path	Packages
M			dataengines/devicenotifications/CMakeLists.txt (1 line)
M			dataengines/devicenotifications/ksolidnotify.cpp (40 lines)
M			dataengines/hotplug/hotplugengine.h (2 lines)
M			dataengines/hotplug/hotplugengine.cpp (61 lines)

Diff	ID	Base	Description	Created	Lint	Unit
Base			Base
Diff 1	61372	7e71b86		Jul 8 2019, 7:08 PM	★	★
Diff 2	61379	7e71b86	Call setFuture after connecting to finished	Jul 8 2019, 11:11 PM	★	★
Diff 3	61407	7e71b86	Don't move Solid::Device instances between threads	Jul 9 2019, 1:35 PM	★	★