Balooctl: Deindex unfound files with check command.
AbandonedPublic

Authored by smithjd on Mar 20 2018, 8:40 PM.

Details

Reviewers
vhanda
michaelh
broulik
bruns
Group Reviewers
Baloo

Diff Detail

Repository
R293 Baloo
Branch
master-filedeindexer (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 2187
Build 2205: arc lint + arc unit
smithjd created this revision.Mar 20 2018, 8:40 PM
Restricted Application added projects: Frameworks, Baloo. · View Herald TranscriptMar 20 2018, 8:40 PM
Restricted Application added a subscriber: Frameworks. · View Herald Transcript
smithjd requested review of this revision.Mar 20 2018, 8:40 PM
smithjd retitled this revision from Balooctl: Deindex unfound files with check command. Deindexing is skipped for file paths containing an unmounted mount point. to Balooctl: Deindex unfound files with check command..Mar 20 2018, 8:54 PM
smithjd updated this revision to Diff 30173.Mar 22 2018, 1:50 AM

Fix and optimize.

smithjd updated this revision to Diff 30466.Mar 25 2018, 4:39 AM
  • Change check command description.
michaelh added a subscriber: mgallien.EditedMar 25 2018, 11:28 AM

Under the premise that I still have to learn about baloo's inner workings, here are some concerns:

  • I'm not convinced, that index cleaning should be part of the dbus interface. Why should cleaning be done out of process? Do we really want any application to trigger database manipulation? Could you elaborate your rationale please?
  • Which application is using baloo dbus interface anyway and which functions? @mgallien How is elisa using it?
  • Purging stale entries is by far not enough to get the index in a good shape. As soon as removable drives or network shares are involved all sorts of weird things can happen. IndexCleaner is much too simple to account for that.
  • Looks like indexcleaner has been dead code until now. Do we need to be concerned about Vishesh's commit message? https://cgit.kde.org/baloo.git/commit/?id=ea2afe88b0c4299d7540e5b6c8b8e46858336f0c
src/file/fileindexscheduler.cpp
166

Same as above

233

Same as above

235

Same as above

src/file/fileindexscheduler.h
89

Nitpick: We're removing db entries not files. Also every file is deindexable. Maybe 'purgeStaleEntries', 'removeLostEntries' or so would be a better name to describe what's going on.

114

Same as above

src/file/indexcleaner.cpp
44–45

Unrelated whitespace change

81

You can use QStringLiteral("%1/").arg(device.mountPath()) here and maybe define it outside the loop

src/tools/balooctl/main.cpp
85

Something like "Update the index. Remove stale entries and add new files." would describe the actions taken more clearly.

196

Do purging before indexing.
Sometimes the parent id of a file is lost (Why?) and they appear as //foo.bar if that is deleted first it can be reindexed.

Under the premise that I still have to learn about baloo's inner workings, here are some concerns:

  • I'm not convinced, that index cleaning should be part of the dbus interface. Why should cleaning be done out of process? Do we really want any application to trigger database manipulation? Could you elaborate your rationale please?

IndexCleaner fits the current implementation of checking for file changes quite well.

  • Which application is using baloo dbus interface anyway and which functions? @mgallien How is elisa using it?
  • Purging stale entries is by far not enough to get the index in a good shape. As soon as removable drives or network shares are involved all sorts of weird things can happen. IndexCleaner is much too simple to account for that.

IndexCleaner is simple and can handle 99.9% of file deindexing requirements, and is now safe for use in combination with unmounted indexed volumes. IndexCleaner is a vital MIA part of Baloo. An additional purger could conceivably be useful in certain cicumstances, but given that the index is regeneratable data, I don't know that it's even warranted. This could be a good use for baloodb, leaving the generic automatic part to Baloo.

As above, this already neatly fits into the workflow.

smithjd updated this revision to Diff 30573.Mar 25 2018, 11:56 PM
smithjd marked 9 inline comments as done.
  • Review changes.
src/file/fileindexscheduler.h
89

This is doing the opposite of indexing new files, which is removing unfound files from the index. I think deindex is an appropriate term for this.

src/tools/balooctl/main.cpp
85

This is probably already clear enough. check isn't designed to be used all that often; it's already run at kde startup.

smithjd updated this revision to Diff 30575.Mar 26 2018, 2:13 AM
  • Fix a threading issue.
  • Fix check operation order.
smithjd updated this revision to Diff 30589.Mar 26 2018, 6:19 AM
  • Optimize and make the loop actually work.
  • Change check command description.
  • Review changes.
  • Fix a threading issue.
  • Fix check operation order.
  • Use a single StorageDevices object.
michaelh added inline comments.
src/file/fileindexerconfig.cpp
63 ↗(On Diff #30589)
smithjd added inline comments.Mar 27 2018, 9:19 PM
src/file/fileindexerconfig.cpp
63 ↗(On Diff #30589)

No, making the StorageDevices object static so it can be exported.

broulik requested changes to this revision.Mar 28 2018, 8:47 AM
broulik added inline comments.
src/file/fileindexerconfig.cpp
63 ↗(On Diff #30589)

Please create the StorageDevices object on demand, otherwise this penalizes all users of that class that just want to know whether Baloo itself is enabled, which was the main reason for doing D11279

This revision now requires changes to proceed.Mar 28 2018, 8:47 AM
smithjd updated this revision to Diff 30804.Mar 28 2018, 6:32 PM
  • Optimize and make the loop actually work.
  • Change check command description.
  • Review changes.
  • Fix a threading issue.
  • Fix check operation order.
  • Use a single StorageDevices object.
  • Partially revert the last diff, restoring StorageDevices creation on demand.
smithjd added inline comments.Mar 28 2018, 6:46 PM
src/file/fileindexerconfig.cpp
63 ↗(On Diff #30589)

I had presumed that the optimization came from re-using the already existing cache instead of recreating it, rather than creating the SolidDevices object on demand.

smithjd marked an inline comment as done.Mar 31 2018, 4:27 AM

I'm still not convinced this is a sustainable approach.

  1. The user has no control over the process
  2. Symlinks pointing to unmounted devices will be removed
  3. Database management is partially exposed on dbus
  4. Depending on user's setup this covers only ~80% (my system) of inconsistencies
  5. It is good to have a tool like this for diagnostic purposes, but the question is: Why do stale entries exist in the first place?
  6. Regenerating data can be very expensive (several hours) see 1)

Because of those concerns I prefer to do cleaning with baloodb (D11753) which is designed to give the user maximal control.
I'm not requesting changes because I don't want to stand in the way. Others have to decide on this diff.

smithjd updated this revision to Diff 31136.Apr 1 2018, 11:22 PM
  • Ignore the symlink target, only remove the index entry if the link is removed.
smithjd updated this revision to Diff 34892.May 26 2018, 12:55 AM
  • Uniformly use storageDevices.
Restricted Application edited subscribers, added: kde-frameworks-devel; removed: Frameworks. · View Herald TranscriptMay 26 2018, 12:55 AM
smithjd updated this revision to Diff 35204.May 30 2018, 5:43 PM
  • New DBus method updateIndex().
  • Change for D13216.
smithjd updated this revision to Diff 40346.Aug 24 2018, 3:26 AM
  • New DBus method updateIndex().
  • Document indexing options.
  • Prevent a scheduler race condition.
  • Run de-indexing file check in parallel.
  • Rebase for separate statechanged functors.
bruns added a subscriber: bruns.Aug 24 2018, 1:16 PM
bruns added inline comments.
src/file/indexcleaner.cpp
76

Undesired detach

82

Undesired detach

bruns requested changes to this revision.Aug 24 2018, 1:16 PM
This revision now requires changes to proceed.Aug 24 2018, 1:16 PM
smithjd added inline comments.Aug 24 2018, 7:56 PM
src/file/indexcleaner.cpp
76

This example will also detach with or without a const intermediate.

...returning a copy from a method means that the copy is shared and has negative side-effects when used with range-based loops. - https://www.dvratil.cz/2015/06/qt-containers-and-c11-range-based-loops

82

Same as above.

bruns added inline comments.Aug 24 2018, 8:54 PM
src/file/indexcleaner.cpp
76

No, if done correctly, it will not detach.
For iterating, you need a readonly copy of the container. Using const auto gurantees you will not modify the container. Calling begin() on a const Qt container will not detach.

Try to understand the linked article.

smithjd updated this revision to Diff 40394.Aug 25 2018, 3:03 AM
  • New DBus method updateIndex().
  • Document indexing options.
  • Prevent a scheduler race condition.
  • Rebase for separate statechanged functors.

Re-base. Drops setting a runnable priority.

smithjd abandoned this revision.Oct 9 2018, 2:47 AM