Perform checks for unindexed files and stale index entries on startup
ClosedPublic

Authored by poboiko on Oct 4 2018, 12:14 PM.

Details

Summary

Right now, if a file was moved / removed while baloo_file was not running
(a common use-case for those who have dualboot), there is no way baloo will
become aware of such change (apart from running balooctl check by hand).

There will be stale (if a file was removed) or invalid (pointing to non-existing file,
if a file was moved) index entries, they will pop up if user performs search, which is
frustrating. I suggest performing those checks on baloo_file startup.

Test Plan
  1. balooctl stop && echo "hello world" > ~/new-file && balooctl start
  2. after some time: balooshow ~/new-file, and see if it's indexed

Diff Detail

Repository
R293 Baloo
Branch
startup-checks (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 3483
Build 3501: arc lint + arc unit
poboiko created this revision.Oct 4 2018, 12:14 PM
Restricted Application added projects: Frameworks, Baloo. · View Herald TranscriptOct 4 2018, 12:14 PM
Restricted Application added a subscriber: kde-frameworks-devel. · View Herald Transcript
poboiko requested review of this revision.Oct 4 2018, 12:14 PM
poboiko updated this revision to Diff 42886.Oct 4 2018, 7:31 PM

Fixed typos

ngraham accepted this revision.Oct 5 2018, 11:39 PM
ngraham added a subscriber: ngraham.

Works great, thanks for all these Baloo patches! They are very much appreciated. Code looks good. As with the others, please commit for Frameworks 5.52.

While you're cranking away at these, might I suggest a few others that are sorely needed:

This revision is now accepted and ready to land.Oct 5 2018, 11:39 PM

Works great, thanks for all these Baloo patches! They are very much appreciated. Code looks good. As with the others, please commit for Frameworks 5.52.

While you're cranking away at these, might I suggest a few others that are sorely needed:

I can look at these, shouldn't be too hard.

I guess this link also should be added to previous two :)

As for symlinked folders - I once looked into, a year ago or so. As far as I remember, those can be really tricky (but I don't really remember all the details):

  • There might be duplicated entries in index pointing to same file, which doesn't look good - if we index the same folder twice (symlink and original; but I guess we can always rely on resolved path, which should be unique - apart from hard links?)
  • There might be recursive symlinks / loops, which should be handled (i.e. dir1/subdir -> dir2 and dir2/subdir -> dir1/; but it looks like QDirIterator can handle symlink loops?)
  • There might be a symlink to a folder, which has some parent excluded from indexing
  • ...and maybe more

I'll see if there is something not easily manageable.

bruns added a subscriber: bruns.Oct 6 2018, 11:24 AM

As for symlinked folders - I once looked into, a year ago or so. As far as I remember, those can be really tricky (but I don't really remember all the details):

  • There might be duplicated entries in index pointing to same file, which doesn't look good - if we index the same folder twice (symlink and original; but I guess we can always rely on resolved path, which should be unique - apart from hard links?)
  • There might be recursive symlinks / loops, which should be handled (i.e. dir1/subdir -> dir2 and dir2/subdir -> dir1/; but it looks like QDirIterator can handle symlink loops?)
  • There might be a symlink to a folder, which has some parent excluded from indexing
  • ...and maybe more

    I'll see if there is something not easily manageable.

Adding to all the points you already mentioned, I am also against following symlinks:

  • if a symlink points inside a hierarchy already included, there is nothing gained
  • if a symlink points outside, it can be included explicitly using includeFolders

Prior to any implementation attempt I would like to see a specific use case for following symlinks, which is not already covered.

ngraham added a comment.EditedOct 6 2018, 10:01 PM

I've found the relevant bug: https://bugs.kde.org/show_bug.cgi?id=333678

Hah, looks like you pushed a commit for it last year!

I'm not wedded to symlink support, but if we're not going to do it, we should close the bug with some good reasons.

poboiko closed this revision.Oct 8 2018, 10:26 PM

Hah, looks like you pushed a commit for it last year!

I'm not wedded to symlink support, but if we're not going to do it, we should close the bug with some good reasons.

Well, I remembered that I did something on that, but I totally forgot what exactly. That's how I've found this bug - lurking through commit log :)

Adding to all the points you already mentioned, I am also against following symlinks:

  • if a symlink points inside a hierarchy already included, there is nothing gained
  • if a symlink points outside, it can be included explicitly using includeFolders

    Prior to any implementation attempt I would like to see a specific use case for following symlinks, which is not already covered.

So, it seems like it is not just about whether we should follow symlinks when indexing or not. There are apparently also some troubles with "Search From Here" in Dolphin...
I agree - I think we should start with looking what are the possible use cases for symlinks, and which of them work (or not). If the situation can be improved cheaply (i.e. without changing DB schema :), then it would be nice

smithjd added a subscriber: smithjd.Oct 9 2018, 3:03 AM

Does this run at startup? If so, this will erase the entries of files on a removable volume not already mounted.

Does this run at startup? If so, this will erase the entries of files on a removable volume not already mounted.

Whoa, thanks for the notice. Not cool, forgot about it.
Probably need to add some checks in IndexCleaner, about the device.

But how does it work now? I mean, do files from not mounted devices pop up as valid search results?

...And that's why we committed this for 5.52. :-) Now we have a month to fix it--and we should make sure we do!

https://phabricator.kde.org/D11529 was already up for review, implemented the index cleaner and checked for removeable volumes before removing index entries. Exporting the storagedevices object was required: https://phabricator.kde.org/D15047. The alternative, implementing a path lookup function is here: https://phabricator.kde.org/D15843.

bruns added a comment.Oct 11 2018, 3:21 PM

https://phabricator.kde.org/D11529 was already up for review, implemented the index cleaner and checked for removeable volumes before removing index entries. Exporting the storagedevices object was required: https://phabricator.kde.org/D15047. The alternative, implementing a path lookup function is here: https://phabricator.kde.org/D15843.

All these are far from submittable, so clean your stuff up first before making any further requests.

poboiko added a comment.EditedOct 12 2018, 7:40 AM

https://phabricator.kde.org/D11529 was already up for review, implemented the index cleaner and checked for removeable volumes before removing index entries. Exporting the storagedevices object was required: https://phabricator.kde.org/D15047. The alternative, implementing a path lookup function is here: https://phabricator.kde.org/D15843.

Sorry, I was not aware of those patches (didn't do my homework). I'll look at those!

UPD: but will those help? I mean, if device is not present at the moment, Solid won't pop it up, and index entries will be forgotten.
It seems like it will help only if device is present, but not yet mounted.

https://phabricator.kde.org/D11529 was already up for review, implemented the index cleaner and checked for removeable volumes before removing index entries. Exporting the storagedevices object was required: https://phabricator.kde.org/D15047. The alternative, implementing a path lookup function is here: https://phabricator.kde.org/D15843.

Sorry, I was not aware of those patches (didn't do my homework). I'll look at those!

UPD: but will those help? I mean, if device is not present at the moment, Solid won't pop it up, and index entries will be forgotten.
It seems like it will help only if device is present, but not yet mounted.

Yes. The volume is ignored even if the storage device is unplugged from boot.

bruns added a comment.Oct 14 2018, 9:40 PM

Exercise:

  1. Take a USB memory stick
  2. plug it into one USB port
  3. run "stat <mountpoint>"
  4. unmount, plug into different port
  5. run "stat <mountpoint>" again

You will get different device ids for each port. If you plug in a different stick, the device id will be there, but contents will not match and all old entries are removed.