We noticed that baloo_file starts quite early in the startup process and then kicks off baloo_file_extractor, which takes quite a bit of CPU away from the important startup of plasmashell and others.
The reason to start baloo_file early is to be notified of all file changes. The extractor could be started later, and it has some mechanisms to throttle itself if the system isn't idle. If this isn't working as expected we have to fix it.
The idle tracking is only in the extractor process, not in baloo_file itself. On startup, baloo_file runs the UnindexedFileIndexer, which iterates over all the folders looking for files to re-index. This consumes a considerable amount of CPU time, most of it spent on regexp matching, MIME type determination, and date/time processing. Only after that does it run the extractor process, and only if there are new files to be indexed.
So I think starting baloo_file later is safe, since it checks all the files anyway? Otherwise, or in addition, we should look into delaying the start of the UnindexedFileIndexer.
I think it should be pretty safe to start baloo_file later.
The very reason to add UnindexedFileIndexer was to make sure we index those files which were changed/added when Baloo wasn't running (as well as IndexCleaner to take care of files which were removed).
UPD: I think users won't be able to change any important document while Plasma is still starting anyway.
I would assume so. In the graph, which is from a VM, ksplashqml is signalled to quit 2 seconds into the startup, at which point plasmashell is pretty much done. So delaying it by 5 seconds or so could already be plenty, I think. I'll give it a go next week.
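Just to illustrate the idea of a fixed delay, a minimal sketch in plain C++. `startDelayed` is a hypothetical helper, not something that exists in Baloo or Plasma; the real change would presumably use whatever the session startup or Qt's event loop offers instead of a sleeping thread.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper: run `task` on a background thread once `delay` has passed.
// The returned future becomes ready after the task has finished.
std::future<void> startDelayed(std::chrono::milliseconds delay,
                               std::function<void()> task)
{
    return std::async(std::launch::async, [delay, task = std::move(task)] {
        std::this_thread::sleep_for(delay);
        task();
    });
}
```

With this, the initial scan could be kicked off as something like `startDelayed(std::chrono::seconds(5), runInitialScan)`, where `runInitialScan` is again only a placeholder name and the 5 seconds is the value suggested above, not a measured optimum.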