[Extractor] Handle documents correctly where mimetype should not be indexed
Closed, Public

Authored by bruns on Oct 16 2018, 1:19 AM.

Details

Summary

The BasicIndexingJob started from the UnindexedFileIndexer only knows
the mimetype derived from the file extension, and thus cannot determine
whether a file should be indexed.

Remove the document only from the indexingleveldb. Otherwise the
document cannot be found, e.g. by name or type, and the basic indexer
runs on the file again on each session start.

This is typical for XML files, which may carry various file extensions
depending on the application, e.g. XMP sidecar files.
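
The difference between the old and the new behavior can be sketched in a few lines. This is only an illustrative model, not the actual Baloo code: SketchIndex, basicIndex, extract and shouldIndexContent are hypothetical names, and treating "application/xml" as the excluded content mimetype is an assumption made for the example.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the index; these are NOT Baloo's real classes.
struct SketchIndex {
    std::set<std::string> documents;     // entries findable by name or type
    std::set<std::string> contentQueue;  // models the queue kept in the "indexingleveldb"
};

// Assumed policy for illustration: XML content is not content-indexed.
inline bool shouldIndexContent(const std::string& contentMimetype) {
    return contentMimetype != "application/xml";
}

// Basic indexing pass: records the file and queues it for content indexing.
inline void basicIndex(SketchIndex& index, const std::string& path) {
    assert(!path.empty());
    index.documents.insert(path);
    index.contentQueue.insert(path);
}

// Extractor step. With removeWholeDocument == true it models the behavior
// before the change; with false it models the behavior after it.
inline void extract(SketchIndex& index, const std::string& path,
                    const std::string& contentMimetype, bool removeWholeDocument) {
    if (!shouldIndexContent(contentMimetype)) {
        if (removeWholeDocument) {
            index.documents.erase(path);  // file is no longer findable at all
        }
        index.contentQueue.erase(path);   // do not retry content extraction
        return;
    }
    // Content extraction would happen here; afterwards the file leaves the queue.
    index.contentQueue.erase(path);
}
```

In this model, calling extract with removeWholeDocument set to false leaves the path in documents while clearing it from contentQueue, which corresponds to the file staying findable by name and not being picked up again.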

Test Plan

  • Start balooctl monitor.
  • Save some metadata to an XMP sidecar file, e.g. from digikam.

Current behavior with content indexing enabled:

  • the file is added to the index and immediately removed again.

When content indexing is switched off:

  • the file stays in the index.

After the change, the file (name, attributes) stays in the index.

Diff Detail

Repository
R293 Baloo
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
bruns created this revision. Oct 16 2018, 1:19 AM
Restricted Application added projects: Frameworks, Baloo. Oct 16 2018, 1:19 AM
Restricted Application added a subscriber: kde-frameworks-devel.
bruns requested review of this revision. Oct 16 2018, 1:19 AM
bruns edited the test plan for this revision. Oct 25 2018, 2:57 PM
ngraham accepted this revision. Oct 25 2018, 4:51 PM
ngraham added a subscriber: ngraham.

Thanks, this seems to work well.

src/file/extractor/app.cpp, lines 133–134

Is this FIXME still in need of fixing with your changes here?

This revision is now accepted and ready to land. Oct 25 2018, 4:51 PM
This revision was automatically updated to reflect the committed changes.