[ExtractorCollection] Use only best matching extractor plugin
ClosedPublic

Authored by bruns on Nov 1 2018, 6:10 PM.

Details

Summary

Currently, if there is a direct match for a mime type, only the matching plugin(s)
is (are) returned, inheritance is not considered. Otherwise, all matches
for all mime ancestors are returned, even if one plugin is a much better
match than any other. Change the inheritance case to only return well matching
extractors.

Up until now, the only extractor selected by inheritance was the plain
text extractor, so there was one fuzzy matched plugin at most. With the
XML extractor, the XML extractor should be preferred over the plain test
extractor.

Depends on D16592

Diff Detail

Repository
R286 KFileMetaData
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
bruns created this revision.Nov 1 2018, 6:10 PM
Restricted Application added projects: Frameworks, Baloo. · View Herald TranscriptNov 1 2018, 6:10 PM
Restricted Application added subscribers: Baloo, kde-frameworks-devel. · View Herald Transcript
bruns requested review of this revision.Nov 1 2018, 6:10 PM
astippich added inline comments.Nov 2 2018, 7:53 PM
src/extractorcollection.cpp
163

nitpick: extra space before "for"

164

doesn't returning here cause immediate return after the first matching extractor for the mimetype has been found? that doesn't fit to the first paragraph of the summary.
"Otherwise, all matches for all mime ancestors are returned, even if one plugin is a much better match than any other." sounds to me that all plugins of all ancestors are returned.

bruns edited the summary of this revision. (Show Details)Nov 2 2018, 9:21 PM
bruns marked 2 inline comments as done.
bruns added inline comments.
src/extractorcollection.cpp
163
164

doesn't returning here cause immediate return after the first matching extractor for the mimetype has been found? that doesn't fit to the first paragraph of the summary.

Yes, and this is the intended change - best matching. We don't want text/plain if we have a better one.

astippich accepted this revision.Nov 2 2018, 9:32 PM
astippich added inline comments.
src/extractorcollection.cpp
163

I meant the code, not the output :)

164

That's what I figured, the summary was a little bit misleading.

This revision is now accepted and ready to land.Nov 2 2018, 9:32 PM
This revision was automatically updated to reflect the committed changes.
bruns marked 2 inline comments as done.

This or the commit before causes CI to fail the ExtractorCollectionTest on Qt 5.9, 5.10, and Windows

bruns added a subscriber: bcooksley.EditedNov 4 2018, 3:06 PM

This or the commit before causes CI to fail the ExtractorCollectionTest on Qt 5.9, 5.10, and Windows

Most likely the new check in D16593. It fails to find an extractor for application/x-kvtml

Apparently it either does not find "application/x-kvtml" at all, or it is not registered as subtype of application/xml.

This may be a setup issue of the CI system - @bcooksley ?

Currently it relies on the application/x-kvtml entry from mime/packages/kde5.xml, which is part of kcoreaddons. Should be possible to switch to another entry if this is not available.

As noted in the build log on the CI system, kfilemetadata doesn't depend on kcoreaddons so you therefore cannot rely on the kde5.xml file being available:

[kf5-qt5 SUSEQt5.9] Running shell script
+ python3 -u ci-tooling/helpers/prepare-dependencies.py --product Frameworks --project kfilemetadata --branchGroup kf5-qt5 --environment production --platform SUSEQt5.9 --installTo /home/jenkins//install-prefix/
Retrieving: Frameworks-extra-cmake-modules-kf5-qt5
Retrieving: Frameworks-ki18n-kf5-qt5
Retrieving: Frameworks-karchive-kf5-qt5

The CI system is working correctly in this case and as designed, only allowing actual dependencies of a project to be used during it's build and testing.

bruns added a comment.Nov 5 2018, 5:10 PM

As noted in the build log on the CI system, kfilemetadata doesn't depend on kcoreaddons so you therefore cannot rely on the kde5.xml file being available:

[kf5-qt5 SUSEQt5.9] Running shell script
 + python3 -u ci-tooling/helpers/prepare-dependencies.py --product Frameworks --project kfilemetadata --branchGroup kf5-qt5 --environment production --platform SUSEQt5.9 --installTo /home/jenkins//install-prefix/
 Retrieving: Frameworks-extra-cmake-modules-kf5-qt5
 Retrieving: Frameworks-ki18n-kf5-qt5
 Retrieving: Frameworks-karchive-kf5-qt5

The CI system is working correctly in this case and as designed, only allowing actual dependencies of a project to be used during it's build and testing.

Obviously the tests require a sufficient dataset for a working QMimeDatabase. This hasn't changed, only the extent of required mime types has. Up until now, the implicitly installed definitions (shared-mime-info?) did the job.

So, which mime-types are guaranteed by the CI system to be available? Can one rely on shared-mime-info? What about Windows/macOS?

There are dozens of mime types in shared-mime-info which are sub-class-of application/xml, see:
grep -E 'mime-type |sub-class-of.*application/xml' /usr/share/mime/packages/freedesktop.org.xml | grep -B1 sub-class-of | head

You can rely on shared-mime-info being present on all platforms (Linux, FreeBSD, Windows, Android) yes.

bruns added a comment.Nov 6 2018, 9:08 PM

Fixed ...