balooctl: Add clean option to remove stale file index entries
AbandonedPublic

Authored by smithjd on Mar 5 2018, 3:16 AM.

Details

Reviewers
michaelh
Group Reviewers
Baloo
Frameworks

Diff Detail

Repository
R293 Baloo
Branch
master-purgeDb (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
smithjd created this revision.Mar 5 2018, 3:16 AM
Restricted Application added projects: Frameworks, Baloo. · View Herald TranscriptMar 5 2018, 3:16 AM
smithjd requested review of this revision.Mar 5 2018, 3:16 AM
michaelh added a subscriber: michaelh.EditedMar 5 2018, 8:59 AM

Baloo is in bad need for something like this. Unfortunately in cannot be done so easily.
We have to account for indexed net shares and removable drives which are only temporarily unavailable. Files on those should not be removed. And most likely some user interaction is needed here.

Baloo has it own Project page now. Please have a look and file your plans for Baloo as tasks there. Also I'm very much interested in your opinion about the tasks filed there specially about T8054 and T7860.

src/engine/transaction.cpp
264

This should go to stderr so only pruned files go to stdout

270

Place this at the end of the loop and also print url, please. E.g. out << "Removed" << id << ":" << url; would be easily parseable. So one could catch stdout with

balooctl pruneDb | tr '-d:' -f2 >removed_files.lst
271

Can we or should we check the result of the operation here?

276

Same as above

src/tools/balooctl/main.cpp
90

Just 'prune'?

326

see above

329

stderr

334

stderr

smithjd updated this revision to Diff 28747.Mar 5 2018, 7:21 PM

Review changes.

smithjd marked 7 inline comments as done.Mar 5 2018, 7:35 PM

All mounts must be manually made available by the user before running this option, or all files on a previously available mount will be removed from the index.

src/engine/transaction.cpp
271

removeDocument() is void.

michaelh requested changes to this revision.Mar 7 2018, 9:43 AM

All mounts must be manually made available by the user before running this option, or all files on a previously available mount will be removed from the index.

It is much too easy for users to accidentally ruin their database with this. There should at least be a warning message including the advice to mount all external items. And users must confirm execution.
This command also should have a --dry-run option to show users what would happen to their database.

This revision now requires changes to proceed.Mar 7 2018, 9:43 AM

All mounts must be manually made available by the user before running this option, or all files on a previously available mount will be removed from the index.

It is much too easy for users to accidentally ruin their database with this. There should at least be a warning message including the advice to mount all external items. And users must confirm execution.
This command also should have a --dry-run option to show users what would happen to their database.

Adding paths to index on top of the user's home directory requires manual modification of the config file, or symlinks into the user's home directory. It can be assumed that the user is reasonably competent about destructive options if he/she does this. Adding a --dry-run modifier is overkill for regeneratable data. Requiring that the user confirm execution is bad practice.

I've changed the option description to more clearly convey that it will remove any entry that doesn't have a resolveable file path.

smithjd updated this revision to Diff 28987.Mar 8 2018, 6:05 AM
  • balooctl: Clarify the prune option description.
smithjd retitled this revision from balooctl: Add pruneDb option to remove stale file index entries. to balooctl: Add prune option to remove stale file index entries.Mar 8 2018, 6:05 AM

It's crashing for me.

Here's the backtrace:

Thread 1 "balooctl" received signal SIGSEGV, Segmentation fault.
0x00007ffff770ccbe in Baloo::WriteTransaction::removeDocument (this=0x0, id=4294969360) at /home/otto/kde/src/frameworks/baloo/src/engine/writetransaction.cpp:113
113         DocumentDB documentTermsDB(m_dbis.docTermsDbi, m_txn);
(gdb) bt
#0  0x00007ffff770ccbe in Baloo::WriteTransaction::removeDocument (this=0x0, id=4294969360) at /home/otto/kde/src/frameworks/baloo/src/engine/writetransaction.cpp:113
#1  0x00007ffff7708472 in Baloo::Transaction::pruneFsTree (this=0x7fffffffd250) at /home/otto/kde/src/frameworks/baloo/src/engine/transaction.cpp:266
#2  0x000000000040cd59 in main (argc=2, argv=0x7fffffffdb08) at /home/otto/kde/src/frameworks/baloo/src/tools/balooctl/main.cpp:334
smithjd updated this revision to Diff 29184.Mar 10 2018, 6:56 PM
  • Lessen chattiness.
  • balooctl: Clarify the prune option description.
  • Fix document purging.
ASSERT: "m_writeTrans" in file /home/otto/kde/src/frameworks/baloo/src/engine/transaction.cpp, line 233

Thread 1 "balooctl" received signal SIGABRT, Aborted.
0x00007ffff524f9eb in raise () from /lib64/libc.so.6
smithjd updated this revision to Diff 29280.Mar 11 2018, 8:50 PM
  • Open the transaction in read-write mode.
[New Thread 0x7fffec72b700 (LWP 10669)]
ASSERT failure in Transaction: "Permission denied", file /home/otto/kde/src/frameworks/baloo/src/engine/transaction.cpp, line 54

Thread 1 "balooctl" received signal SIGABRT, Aborted.
0x00007ffff524f9eb in raise () from /lib64/libc.so.6

Please have a look at D11213 and consider joining your patch with it.
This diff brought me to the idea because I'm still convinced users would appreciate a more fine grained control over the data removal (at least I would). balooctl list -s is essentially the --dry-run option I wanted to see in this patch.

If you want to break your db for testing: here's a recipe T7860. Don't forget to backup ~/.local/share/baloo/index ;-)

src/engine/transaction.cpp
269 ↗(On Diff #28987)

You need to commit() here, otherwise changes will be lost.

src/tools/balooctl/main.cpp
328 ↗(On Diff #28987)

Database::ReadWriteDatabase

smithjd updated this revision to Diff 29290.Mar 11 2018, 9:47 PM
  • Commit the changes.

I was going to expose this functionality on the DBus interface and integrate it into the check command, but the lack of external mount enumeration makes this a little risky at the moment. It's probably better to at least provide a simple global prune option. I personally don't see a need for a more finer-grained approach.

This comment was removed by smithjd.
michaelh added a comment.EditedMar 18 2018, 8:37 AM

Sorry I took so long to respond. I needed to get the tools ready to see whats going on (D11285 and D11287) .
Wrt this patch

  • You forgot to open the database in ReadWrite mode.
  • To test your code I changed that locally. After that a lot of Q_ASSERTs got in the way. I gave up after the third.
  • I wonder how you can run this. Please make sure you build the debug target. Comment out the respective Q_ASSERTs and mark them with a FIXME.

Please incorporate this patch into D11285 and D11287. It separates the concerns of controlling indexing and manipulating the database. Also database manipulation is experimental while balooctl stuff is not. Also some of those preventing Q_ASSERT's are already deactivated there.

It would be great if we could test the effect of this command. I have no idea how this can be done. Maybe you have one?

I was going to expose this functionality on the DBus interface and integrate it into the check command, but the lack of external mount enumeration makes this a little risky at the moment.

I've been playing around with mount enumeration in the sanitizer class. It's feasable. Wait for a patch coming up soon.

Personal note:
Even after dealing with this patch for some time I still get confused by the terms 'prune' and 'purge'. This may be my personal problem or a general problem for non-native english-speakers. I don't know. For me it would be easier if this command would be called just clean.

smithjd updated this revision to Diff 29879.Mar 19 2018, 3:36 AM

Change the 'prune' option to 'clean'.
The database must open in ReadWrite mode.

smithjd retitled this revision from balooctl: Add prune option to remove stale file index entries to balooctl: Add clean option to remove stale file index entries.Mar 19 2018, 3:37 AM

Sorry I took so long to respond. I needed to get the tools ready to see whats going on (D11285 and D11287) .
Wrt this patch

  • You forgot to open the database in ReadWrite mode.
  • To test your code I changed that locally. After that a lot of Q_ASSERTs got in the way. I gave up after the third.
  • I wonder how you can run this. Please make sure you build the debug target. Comment out the respective Q_ASSERTs and mark them with a FIXME.

Works for me.

Please incorporate this patch into D11285 and D11287. It separates the concerns of controlling indexing and manipulating the database. Also database manipulation is experimental while balooctl stuff is not. Also some of those preventing Q_ASSERT's are already deactivated there.

I'd like to let the dust settle on the baloodb tool first. Once it's in git I'll change this patch to use it.

It would be great if we could test the effect of this command. I have no idea how this can be done. Maybe you have one?

I was going to expose this functionality on the DBus interface and integrate it into the check command, but the lack of external mount enumeration makes this a little risky at the moment.

I've been playing around with mount enumeration in the sanitizer class. It's feasable. Wait for a patch coming up soon.

Personal note:
Even after dealing with this patch for some time I still get confused by the terms 'prune' and 'purge'. This may be my personal problem or a general problem for non-native english-speakers. I don't know. For me it would be easier if this command would be called just clean.

Fixed.

  • I wonder how you can run this. Please make sure you build the debug target. Comment out the respective Q_ASSERTs and mark them with a FIXME.

Works for me.

Hmm,...

ASSERT: "!p.name.isEmpty()" in file /home/otto/baloo/src/engine/documenturldb.cpp, line 143
*** Programm hat Signal SIGABRT (Aborted) empfangen ***

After commenting that out:

ASSERT: "!p.name.isEmpty()" in file /home/otto/baloo/src/engine/documenturldb.cpp, line 205
*** Programm hat Signal SIGABRT (Aborted) empfangen ***

After commenting that out it seems to work, but:

$ balooctl clean
...
Database has corrupted entries baloo may misbehave, please recreate the DB by running $ balooctl disable && balooctl enable
Database has corrupted entries baloo may misbehave, please recreate the DB by running $ balooctl disable && balooctl enable
Purged 60129544208 //Kareshi Kanojo no Jijou
ASSERT: "!path.name.isEmpty()" in file /home/otto/baloo/src/engine/documenturldb.h, line 115
*** Programm hat Signal SIGABRT (Aborted) empfangen ***

After commenting that out:

$ balooctl clean
...
Purged 168397082036537349 //Little Witch Academia - 08.mkv
Empty filename passed to function
Purged 168428194779629573 
Empty filename passed to function
Purged 215610373326243845 
Empty filename passed to function
Purged 215614303221319685 
*** Normal beendet ***

Second run:

$ balooctl clean
Empty filename passed to function
Purged 148657949215033349 
*** Normal beendet ***

Do you have an idea, why the Q_ASSERTs get in the way on my system, but not on yours?

speakers. I don't know. For me it would be easier if this command would be called just clean.

Fixed.

Thank you.

D11529 hooks up the index cleaner class. I've modified it so that it skips unmounted directories. This patch could probably be dropped, though some form of integrity checker that ignores mount points might still be useful for baloodb.

smithjd abandoned this revision.Mar 28 2018, 12:16 AM