Move Solid::Device::listFromQuery calls to a separate thread
AbandonedPublic

Authored by apol on Jul 8 2019, 7:08 PM.

Details

Reviewers
davidedmundson
bruns
Group Reviewers
Plasma
Summary

They block the startup notably but the API doesn't require it to be blocking.

Test Plan

Ran plasmashell several times

Diff Detail

Repository
R120 Plasma Workspace
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 13742
Build 13760: arc lint + arc unit
apol created this revision.Jul 8 2019, 7:08 PM
Restricted Application added a project: Plasma. · View Herald TranscriptJul 8 2019, 7:08 PM
Restricted Application added a subscriber: plasma-devel. · View Herald Transcript
apol requested review of this revision.Jul 8 2019, 7:08 PM
apol updated this revision to Diff 61379.Jul 8 2019, 11:11 PM

Call setFuture after connecting to finished

apol updated this revision to Diff 61407.Jul 9 2019, 1:35 PM

Don't move Solid::Device instances between threads

davidedmundson accepted this revision.Jul 10 2019, 3:35 PM
This revision is now accepted and ready to land.Jul 10 2019, 3:35 PM
bruns requested changes to this revision.Jul 11 2019, 11:39 AM
bruns added a subscriber: bruns.

Why not just a singleshot timer from the constructor? Avoids any initial blocking ...

This revision now requires changes to proceed.Jul 11 2019, 11:39 AM
apol added a comment.Jul 11 2019, 11:51 AM

Why not just a singleshot timer from the constructor? Avoids any initial blocking ...

Initial, but doesn't fix the problem. We could potentially delay this few seconds instead, but we'd still be getting an odd ~500ms freeze randomly
Or we can just decide to move blocking or heavy algorithms to separate threads and enjoy our multicore computers and a fluid experience.

bruns added a comment.Jul 11 2019, 2:29 PM
In D22333#494152, @apol wrote:

Why not just a singleshot timer from the constructor? Avoids any initial blocking ...

Initial, but doesn't fix the problem. We could potentially delay this few seconds instead, but we'd still be getting an odd ~500ms freeze randomly
Or we can just decide to move blocking or heavy algorithms to separate threads and enjoy our multicore computers and a fluid experience.

Where does the blocking happen? How do you guarantee none of the later call block? Large parts of the code are executed in the main thread anyway, only the initial list creation happens in a worker thread.

Also, if this is a generic problem, why only fix it in the dataengines, not in Solid itself?

apol added a comment.Jul 11 2019, 6:47 PM
In D22333#494152, @apol wrote:

Why not just a singleshot timer from the constructor? Avoids any initial blocking ...

Initial, but doesn't fix the problem. We could potentially delay this few seconds instead, but we'd still be getting an odd ~500ms freeze randomly
Or we can just decide to move blocking or heavy algorithms to separate threads and enjoy our multicore computers and a fluid experience.

Where does the blocking happen? How do you guarantee none of the later call block? Large parts of the code are executed in the main thread anyway, only the initial list creation happens in a worker thread.

listFromQuery is the big blocking call we are moving into a separate thread.

Also, if this is a generic problem, why only fix it in the dataengines, not in Solid itself?

This would mean refactoring how Solid works and even coming up with entirely new concepts. I don't discard doing it at some point but I don't think plasmashell startup blockage is reason enough to redo a framework that has been working for about 5 years as is.

bruns added a comment.Jul 11 2019, 7:12 PM
In D22333#494384, @apol wrote:
In D22333#494152, @apol wrote:

Why not just a singleshot timer from the constructor? Avoids any initial blocking ...

Initial, but doesn't fix the problem. We could potentially delay this few seconds instead, but we'd still be getting an odd ~500ms freeze randomly
Or we can just decide to move blocking or heavy algorithms to separate threads and enjoy our multicore computers and a fluid experience.

Where does the blocking happen? How do you guarantee none of the later call block? Large parts of the code are executed in the main thread anyway, only the initial list creation happens in a worker thread.

listFromQuery is the big blocking call we are moving into a separate thread.

Again, where is it blocking? Which backend?

Also, if this is a generic problem, why only fix it in the dataengines, not in Solid itself?

This would mean refactoring how Solid works and even coming up with entirely new concepts. I don't discard doing it at some point but I don't think plasmashell startup blockage is reason enough to redo a framework that has been working for about 5 years as is.

listFromQuery can be replaced by an asynchronous "enumerate(predicate)" call which uses the existing DeviceAdded signal. This would also remove the inherent race between the listFromQuery and DeviceAdded/DeviceRemoved.

apol added a comment.Jul 11 2019, 11:15 PM

Again, where is it blocking? Which backend?

udisks2 mainly, but every backend can block by its virtue.

listFromQuery can be replaced by an asynchronous "enumerate(predicate)" call which uses the existing DeviceAdded signal. This would also remove the inherent race between the listFromQuery and DeviceAdded/DeviceRemoved.

I've looked into changes that could be done to solid to improve this, nothing felt like a good step forward. This approach works, makes sense and shows results.

bruns added a comment.Jul 12 2019, 9:03 PM
In D22333#494415, @apol wrote:

Again, where is it blocking? Which backend?

udisks2 mainly, but every backend can block by its virtue.

The code can still block, as every invocation of 'Solid::Device(udi)` does a blocking DBus call to udisks2.

Also, the code is calling non-threadsafe code from multiple threads now (e.g. once from each the two dataengines helper threads). Each one will call the udisks2 Manager::deviceCache() method.

apol added a comment.Jul 13 2019, 12:51 AM

Also, the code is calling non-threadsafe code from multiple threads now (e.g. once from each the two dataengines helper threads). Each one will call the udisks2 Manager::deviceCache() method.

There's a Backend per-thread and deviceCache is not static so each thread will call its own.

bruns added a comment.Jul 13 2019, 1:01 AM
In D22333#494810, @apol wrote:

Also, the code is calling non-threadsafe code from multiple threads now (e.g. once from each the two dataengines helper threads). Each one will call the udisks2 Manager::deviceCache() method.

There's a Backend per-thread and deviceCache is not static so each thread will call its own.

Sorry, but you are wrong, look for
Q_GLOBAL_STATIC(Solid::DeviceManagerStorage, globalDeviceStorage)

bruns added a comment.Jul 16 2019, 2:28 PM

Yes, you are correct, so there are no threading issues, sorry.

On the other hand, this IMHO makes this approach even less profitable:

  1. every backend is initialized once from the two helpers (assuming both dataengines are used).
  2. after the threads finish, all but the relevant UDIs are thrown away (and care must be taken not to use anything else - the devices would be dangling pointers in the now gone thread storage)
  3. the first completing thread triggers the initialization of the backends again - in the main thread.

As the initialization happens implicitly by the Device instantiation, for e.g. UDisks there are two synchronous DBus calls now, one for introspection from the manager backend and one for the device itself from the device backend. Both are done in the main thread, and we will block even longer.

So any gain here is AFAICS from the fact the constructor no longer blocks on a DBus call.

Kinda, you're summary missing a key part.
(the commit description is a bit poor)

This patch does the following:

  • It does the search in the other thread. That creates and iterates every possible device. This is expensive as there are lots of potential devices. We fetch pretty much everything udisks knows.
  • From the given list we recreate the select few devices that are relevant. This is relatively cheap. It does still block as you point out, but now for a much much smaller set of objects compared to before.

(on my laptop that's 56 devices created when searching, resulting in 8 useful devices).

This is where the main thread saving is meant to come in.

As for dangling pointers. The private manager object dies as its backed by TLS. The device object itself does not. The device object has guards against a deleted manager internally. It won't crash, but will return "dead" values.


You're absolutely right it's a hack around the current state, rather than a good long term direction. I'm not exactly loving it.
I would love to rework solid to have a nice async api, and also to not use data engines. I'm not sure its doable within kf5.

apol added a comment.Jul 16 2019, 10:48 PM

Things we could do right now, without a super big refactoring:

On a solid level:

  • Maybe a useful intermediate change would be to add API in Solid to move devices between threads.
  • Alternatively we could consider making sure backends stick to one specific non-GUI thread, keeping only 1 backend per application and have internals speak to it (admittedly a bigger refactoring).

On a plasma level:

  • Keep Solid::Device::listFromQuery calls into a separate QThread and only extract tuples with the information we need rather than a Solid::Device (which was obviously faster but crashed because we can't move devices between threads).

Either way, it's not a matter of KF5, adding new API is perfectly fine, problem is that we may end up redesigning both the frontend and the backend and this is far too much work right now. And so it will be when we're porting things to KF6.

Also, I would suggest not really expecting to be able to do a nice, in-thread async API. Note that here we have exactly the same problem we have on plasma-nm and in Qt bearing networkmanager backend. It's hard to use these types asynchronously and we shouldn't go through hoops to make it thread-local.

bruns added a comment.Jul 17 2019, 4:41 PM

Kinda, you're summary missing a key part.
(the commit description is a bit poor)

This patch does the following:

  • It does the search in the other thread. That creates and iterates every possible device. This is expensive as there are lots of potential devices. We fetch pretty much everything udisks knows.
  • From the given list we recreate the select few devices that are relevant. This is relatively cheap. It does still block as you point out, but now for a much much smaller set of objects compared to before.

    (on my laptop that's 56 devices created when searching, resulting in 8 useful devices).

Just for clarification:

On my system, solid-hardware5 list returns 77 devices, but with the Predicate Filter I am down to 10, and this is filtered by the hotplug engine down to 7.

Half of the devices (32) are serial ports. These are returned by the udev backend, but when combined with the "Camera | PortableMediaPlayer" predicate, the udev backend returns an empty list and nothing is instantiated.

The enumeration in the udev backend, especially 'PortableMediaPlayer' has become much cheaper since https://phabricator.kde.org/D21379.

The Udisks backend indeed returns to much, and unfortunately it creates a backend device for each block device, which implies a number of DBus roundtrips. But this would come down to a single DBus call when D19677 lands.

apol abandoned this revision.Oct 17 2019, 10:16 AM