Delay running UnindexedFileIndexer and IndexCleaner
Closed · Public

Authored by broulik on May 28 2019, 9:35 AM.

Details

Summary

We noticed that baloo_file starts quite early during startup and then kicks off baloo_file_extractor, taking a noticeable amount of CPU time away from the more important startup of plasmashell and other processes.

Test Plan

Checked that the unindexed file codepath is hit a while after login has completed

Diff Detail

Repository
R293 Baloo
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
broulik created this revision. May 28 2019, 9:35 AM
Restricted Application added projects: Frameworks, Baloo. May 28 2019, 9:35 AM
Restricted Application added a subscriber: kde-frameworks-devel.
broulik requested review of this revision. May 28 2019, 9:35 AM

The reason to start baloo_file early is to be notified of all file changes. The extractor could be started later, and it has some mechanisms to throttle itself if the system isn't idle. If this isn't working as expected we have to fix it.

Thanks for your input. Yeah, I was mostly interested in having the file extractor wait some more. I'll poke the thing a bit then.

One possible reason for the idle tracking not working, IIRC it tries to connect to some service via DBus, don't know what happens if that is not available (yet).

broulik added a comment. (Edited) May 28 2019, 11:47 AM

The idle tracking is only in the extractor process, not in baloo_file itself. On startup, baloo_file runs the UnindexedFileIndexer, which iterates over all the folders looking for files to re-index. This consumes a considerable amount of CPU time, most of it spent on regexp matching, MIME type determination, and date-time processing. Only after that may it run the extractor process, if there are new files to be indexed.
So I think starting baloo_file later is safe, since it checks all the files anyway? Otherwise, or additionally, we should look into delaying the start of the UnindexedFileIndexer.

poboiko added a subscriber: poboiko. (Edited) May 29 2019, 1:25 PM

The idle tracking is only in the extractor process, not in baloo_file itself. On startup, baloo_file runs the UnindexedFileIndexer, which iterates over all the folders looking for files to re-index. This consumes a considerable amount of CPU time, most of it spent on regexp matching, MIME type determination, and date-time processing. Only after that may it run the extractor process, if there are new files to be indexed.
So I think starting baloo_file later is safe, since it checks all the files anyway? Otherwise, or additionally, we should look into delaying the start of the UnindexedFileIndexer.

I think it should be pretty safe to start baloo_file later.
The very reason to add UnindexedFileIndexer was to make sure we index those files which were changed/added when Baloo wasn't running (as well as IndexCleaner to take care of files which were removed).
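
As a rough illustration of the two catch-up passes described above, here is a minimal sketch of the reconciliation idea using only the C++ standard library. It is not Baloo's implementation: the names Snapshot, CatchUpPlan and planCatchUp are hypothetical, and the sketch assumes a plain comparison of modification times, whereas the real indexer also does exclude-pattern matching and MIME type checks.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Path -> last modification time in seconds (hypothetical input format).
using Snapshot = std::map<std::string, std::int64_t>;

struct CatchUpPlan {
    std::vector<std::string> toIndex;  // added or modified while the indexer was not running
    std::vector<std::string> toRemove; // still in the index, but gone from disk
};

// Compare what the index knows with what is on disk right now.
CatchUpPlan planCatchUp(const Snapshot &indexed, const Snapshot &onDisk) {
    CatchUpPlan plan;

    // Role of the "unindexed file" pass: new files and files whose mtime differs.
    for (const auto &entry : onDisk) {
        const auto it = indexed.find(entry.first);
        if (it == indexed.end() || it->second != entry.second) {
            plan.toIndex.push_back(entry.first);
        }
    }

    // Role of the "cleaner" pass: indexed paths that no longer exist.
    for (const auto &entry : indexed) {
        if (onDisk.find(entry.first) == onDisk.end()) {
            plan.toRemove.push_back(entry.first);
        }
    }
    return plan;
}
```

Both loops touch every known path, which is why this kind of pass is expensive at login and a natural candidate for being deferred.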

UPD: I think users won't be able to change any important documents while Plasma is still starting anyway.

So, I can haz shipit?

bruns added a comment. May 29 2019, 2:46 PM

I would very much prefer only delaying the UnindexedFileIndexer and the IndexCleaner. These two are stopgap measures.

broulik planned changes to this revision. May 29 2019, 2:47 PM

Look into delaying the UnindexedFileIndexer and IndexCleaner. Can I just use QTimer or do you want something more sophisticated?

bruns added a comment. May 29 2019, 2:51 PM

I think anything below 10 seconds would be completely fine. Is that sufficient to let the remaining ones start?

I would assume so. In the graph, which was recorded in a VM, ksplashqml is signalled to quit 2 seconds into the startup, at which point plasmashell is pretty much done. So delaying it by 5 seconds or so could already be plenty, I think. I'll give it a go next week.
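
To make the timing argument concrete, here is a small sketch of the startup order being proposed: file watching starts immediately, while the two catch-up passes run from a single-shot timer. In Qt code this would typically be a QTimer::singleShot call. The sketch below instead uses a toy timer queue with simulated time so it stays standard-library only; SingleShotQueue and simulateStartup are hypothetical names, and the 5 second delay is just the figure floated in this thread, not necessarily what the patch uses.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using std::chrono::milliseconds;

// Toy single-shot timer queue driven by simulated time.
// Hypothetical stand-in for an event loop's timers, not Baloo code.
class SingleShotQueue {
public:
    // Schedule `task` to run once, `delay` after the current simulated time.
    void singleShot(milliseconds delay, std::function<void()> task) {
        m_pending.push_back({m_now + delay, std::move(task)});
    }

    // Advance simulated time and run every task whose deadline has passed.
    // Returns the number of tasks that ran.
    std::size_t advance(milliseconds elapsed) {
        m_now += elapsed;
        std::size_t ran = 0;
        for (std::size_t i = 0; i < m_pending.size();) {
            if (m_pending[i].deadline <= m_now) {
                auto task = std::move(m_pending[i].task);
                m_pending.erase(m_pending.begin() + static_cast<std::ptrdiff_t>(i));
                task();
                ++ran;
            } else {
                ++i;
            }
        }
        return ran;
    }

private:
    struct Entry {
        milliseconds deadline;
        std::function<void()> task;
    };
    milliseconds m_now{0};
    std::vector<Entry> m_pending;
};

// Returns the actions taken within `observeFor` of (simulated) startup.
std::vector<std::string> simulateStartup(milliseconds observeFor) {
    std::vector<std::string> log;
    SingleShotQueue timers;

    // Started right away so that no file change notification is missed.
    log.push_back("watch file changes");

    // Deferred catch-up work; the 5 s delay is an assumption from the thread.
    timers.singleShot(milliseconds(5000), [&log] {
        log.push_back("run UnindexedFileIndexer");
        log.push_back("run IndexCleaner");
    });

    timers.advance(observeFor);
    return log;
}
```

With this ordering, a session that reaches a usable desktop about 2 seconds in sees only the file watcher active, and the expensive scans begin once the timer fires.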

broulik updated this revision to Diff 58853. May 29 2019, 3:03 PM
broulik retitled this revision from Start baloo_file later to Delay running UnindexedFileIndexer and IndexCleaner.
broulik edited the summary of this revision.
broulik edited the test plan for this revision.
  • Delay indexer
ngraham accepted this revision. May 29 2019, 4:39 PM
ngraham added a subscriber: ngraham.

Makes perfect sense to me.

This revision is now accepted and ready to land. May 29 2019, 4:39 PM

Thanks for the shipit, but I'd like @bruns to have the final word on this.

bruns accepted this revision. May 30 2019, 11:37 AM

Haven't actually checked it, but I trust you that it still works ;-)

This revision was automatically updated to reflect the committed changes.