Update baloo database when data returned by the KFileMetaData extractors is modified
Open, Needs Triage, Public

Description

Update baloo database when data returned by the KFileMetaData extractors is modified.

mgallien created this task. Feb 25 2018, 3:20 PM
mgallien moved this task from Backlog to Diagnostic on the Baloo board.

Maybe I did not understand you correctly. As I see it, except for extended attributes, you have to change the file (write to it) to change the properties that the extractors can extract. Baloo uses inotify to watch for such changes to files and updates its database accordingly. Am I missing something here?

Under which circumstances did you observe baloo not updating? Maybe baloo does not do this reliably; I don't know.

When the KFileMetaData extractors themselves are modified, the baloo database still contains the properties returned by the old extractors. In other words, the same file content now yields different properties than the ones stored.
I made corrections and modifications in KFileMetaData. Later, I realized that users did not reap the benefits, because their existing index was never refreshed.

michaelh added a comment. Edited Feb 27 2018, 1:10 PM

Now I get it. I see no way to do this yet (remember, I'm still trying to understand what baloo does and why, and I'm far from confident in C++).
For now, the best option would be to suggest rebuilding the database in the commit message(s). Have you looked at baloo's bug list for crashes? (Nearly) all I'm doing right now is learning how to mitigate those.

Anyway, I'm open to ideas on how to do that; I just don't have any myself yet.

ngraham added a subscriber: ngraham. Mar 2 2018, 9:13 PM
bruns added a subscriber: bruns. Mar 31 2018, 12:17 AM

Possible approach:

  1. add a version to each extractor
  2. save the used extractor version
    • on the database level -and-
    • on the directory level -or-
    • on the file level

So after login, baloo_file_extractor compares the versions of the installed extractors with the versions last used, and if anything has changed, it starts to rescan the filesystem.
After each directory has been completed, the version for the directory is updated, and after all directories are done, the version in the database is updated.
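
The scan described above could look roughly like the following. This is only a sketch under simplifying assumptions: a single combined extractor version instead of one per extractor, and an in-memory store instead of the real database. `VersionStore` and `rescanIfOutdated` are hypothetical names, not existing Baloo code.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the version bookkeeping; not a Baloo type.
struct VersionStore {
    int databaseVersion = 0;                      // version of the last complete scan
    std::map<std::string, int> directoryVersion;  // per-directory progress marker
};

// Rescans outdated directories and returns the ones that were processed.
// Directories already at installedVersion are skipped, so an interrupted
// scan can resume where it stopped.
std::vector<std::string> rescanIfOutdated(VersionStore &store,
                                          const std::vector<std::string> &directories,
                                          int installedVersion)
{
    std::vector<std::string> rescanned;
    if (store.databaseVersion == installedVersion) {
        return rescanned;  // extractors unchanged since the last complete scan
    }
    for (const std::string &dir : directories) {
        if (store.directoryVersion[dir] == installedVersion) {
            continue;  // completed before an earlier interruption
        }
        // ... re-extract every file in dir here (omitted in this sketch) ...
        store.directoryVersion[dir] = installedVersion;
        rescanned.push_back(dir);
    }
    store.databaseVersion = installedVersion;  // only after all directories are done
    return rescanned;
}
```

Updating the database-level version last means an aborted run is detected on the next login, while the per-directory markers keep finished directories from being scanned twice.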

bruns added a comment. Oct 28 2018, 4:41 PM

Proposal:

  1. track extractor version on the database level
  2. store tuples of {mimetype, extractor} -> version
  3. version is noted as major.minor

Every time a bug is fixed, minor is increased.
Every time an extractor is changed in a way which affects extracted contents, major is increased.
If a change affects only specific mimetypes, only the corresponding {mimetype, extractor} is updated.

The indexer compares versions of installed extractors with the versions stored in the db. In case the version has changed, the affected files (determined by mimetype) are marked for reindexing.
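
A minimal sketch of that comparison, assuming the versions are available as plain in-memory tables. `ExtractorVersion`, `VersionTable` and `mimetypesToReindex` are hypothetical names, not part of Baloo or KFileMetaData, and any version difference (major or minor) is treated as a reason to reindex.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical major.minor version of one extractor for one mimetype.
struct ExtractorVersion {
    int majorVersion;
    int minorVersion;
    bool operator==(const ExtractorVersion &other) const
    {
        return majorVersion == other.majorVersion
            && minorVersion == other.minorVersion;
    }
};

// Key is the tuple {mimetype, extractor name} from the proposal.
using ExtractorKey = std::pair<std::string, std::string>;
using VersionTable = std::map<ExtractorKey, ExtractorVersion>;

// Returns the mimetypes whose files should be marked for reindexing:
// every {mimetype, extractor} that is new or whose version differs from
// the one stored in the database.
std::set<std::string> mimetypesToReindex(const VersionTable &installed,
                                         const VersionTable &stored)
{
    std::set<std::string> result;
    for (const auto &entry : installed) {
        const auto it = stored.find(entry.first);
        if (it == stored.end() || !(it->second == entry.second)) {
            result.insert(entry.first.first);  // the mimetype part of the key
        }
    }
    return result;
}
```

Entries that exist only in the stored table (an extractor that was uninstalled) are ignored here; whether those files should also be reindexed is left open.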