We noticed that baloo_file starts quite early in the startup process and then kicks off baloo_file_extractor, which takes quite some CPU time away from the important startup of plasmashell and other components.
Details
- Reviewers: bruns, davidedmundson, ngraham
- Group Reviewers: Baloo
- Maniphest Tasks: T10958: Faster Startup
- Commits: R293:a24624c88d79: Delay running UnindexedFileIndexer and IndexCleaner
Checked that the unindexed file codepath is hit a while after login has completed
Diff Detail
- Repository: R293 Baloo
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.
The reason to start baloo_file early is to be notified of all file changes. The extractor could be started later, and it has some mechanisms to throttle itself if the system isn't idle. If this isn't working as expected we have to fix it.
Thanks for your input. Yeah, I was mostly interested in having the file extractor wait some more. I'll poke the thing a bit then.
One possible reason for the idle tracking not working: IIRC it tries to connect to some service via DBus, and I don't know what happens if that service is not available (yet).
The idle tracking is only in the extractor process, not in baloo_file itself. On startup, baloo_file runs the UnindexedFileIndexer and iterates over all the folders looking for files to re-index. This consumes a considerable amount of CPU time, most of it spent on regexp matching, mime type determination, and date time processing. Only after that may it run the extractor process, and only when there are new files to be indexed.
So I think starting baloo_file later is safe, since it checks all the files anyway? Otherwise/additionally, we should look into making the UnindexedFileIndexer start with a delay.
I think it should be pretty safe to start baloo_file later.
The very reason to add UnindexedFileIndexer was to make sure we index those files which were changed/added when Baloo wasn't running (as well as IndexCleaner to take care of files which were removed).
UPD: I think users won't be able to change any important document while Plasma is still starting anyway.
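To make the catch-up role of these two components concrete, here is a minimal, dependency-free sketch of the idea described above: compare what is on disk with what is in the index, then derive the files to (re-)index and the entries to drop. This is not Baloo's code. The names `Snapshot`, `Reconciliation`, and `reconcile` are hypothetical, and comparing plain modification timestamps is an assumption made for illustration only.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model: path -> last-modified timestamp (seconds).
using Snapshot = std::map<std::string, long long>;

struct Reconciliation {
    std::vector<std::string> toIndex;   // added or changed while the indexer was not running
    std::vector<std::string> toRemove;  // still in the index, but gone from disk
};

// Assumption: a file needs (re-)indexing if it is unknown to the index
// or its timestamp differs from the one recorded in the index.
Reconciliation reconcile(const Snapshot& onDisk, const Snapshot& inIndex) {
    Reconciliation result;

    // Role attributed to UnindexedFileIndexer in the discussion above.
    for (const auto& entry : onDisk) {
        const auto known = inIndex.find(entry.first);
        if (known == inIndex.end() || known->second != entry.second) {
            result.toIndex.push_back(entry.first);
        }
    }

    // Role attributed to IndexCleaner in the discussion above.
    for (const auto& entry : inIndex) {
        if (onDisk.find(entry.first) == onDisk.end()) {
            result.toRemove.push_back(entry.first);
        }
    }
    return result;
}
```

Because both passes visit every known path, the result is the same whether the comparison runs right at login or a few seconds later, which is the argument for delaying only these two steps.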
I would very much prefer only delaying the UnindexedFileIndexer and the IndexCleaner. These two are stopgap measures.
Look into delaying the UnindexedFileIndexer and IndexCleaner. Can I just use QTimer or do you want something more sophisticated?
I think everything below 10 seconds would be completely fine. Is that sufficient to let the remaining startup processes start?
I would assume so. In the graph, which is from a VM, ksplashqml is signalled to quit 2 seconds into the startup, at which point plasmashell is pretty much done. So if we delay it by 5 seconds or so, that could already be plenty, I think. I'll give it a go next week.
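The ordering being proposed can be sketched without Qt. In the real code a single-shot QTimer would be the natural fit, as suggested above. The block below swaps in a small simulated timer queue so the behaviour can be checked deterministically. `DeferredQueue`, `startIndexer`, the log strings, and the 5 second value are all assumptions for illustration, not what the commit actually does.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using Ms = std::chrono::milliseconds;

// Hypothetical stand-in for an event loop with single-shot timers,
// driven by a simulated clock instead of wall-clock time.
class DeferredQueue {
public:
    // Schedule `task` to run once, `delay` after the current simulated time.
    void singleShot(Ms delay, std::function<void()> task) {
        assert(delay.count() >= 0);
        m_pending.push(Entry{m_now + delay, m_nextId++, std::move(task)});
    }

    // Advance the simulated clock and run every task that has become due.
    void advance(Ms elapsed) {
        m_now += elapsed;
        while (!m_pending.empty() && m_pending.top().due <= m_now) {
            std::function<void()> task = m_pending.top().task;
            m_pending.pop();
            task();
        }
    }

private:
    struct Entry {
        Ms due;
        unsigned id;  // keeps tasks with the same due time in FIFO order
        std::function<void()> task;
        bool operator>(const Entry& other) const {
            return due != other.due ? due > other.due : id > other.id;
        }
    };

    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> m_pending;
    Ms m_now{0};
    unsigned m_nextId = 0;
};

// Assumed delay, taken from the "5 seconds or so" estimate in the thread.
const Ms kStartupDelay{5000};

// File watching starts immediately so that no change notifications are missed;
// only the two catch-up passes are deferred.
void startIndexer(DeferredQueue& loop, std::vector<std::string>& log) {
    log.push_back("watch-files");
    loop.singleShot(kStartupDelay, [&log] { log.push_back("scan-unindexed"); });
    loop.singleShot(kStartupDelay, [&log] { log.push_back("clean-index"); });
}
```

Calling `startIndexer` and then advancing the queue by 2 seconds leaves only the file watching entry in the log. The two deferred passes run once a total of 5 seconds has elapsed.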