[KFileMetaData] Add extractor for generic XML and SVG
ClosedPublic

Authored by bruns on Oct 28 2018, 5:39 PM.

Details

Summary

Currently, both XML and SVG documents are indexed as plain text due
to mimetype inheritance. This fills the content index with meaningless
data (tags, attributes, attribute values ...).

Use QDomElement::text() for generic XML documents and <text/> nodes
for SVG to extract the content. Also try to find Dublin Core metadata
and add the relevant properties.
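
The difference between indexing raw markup and indexing only text content can be sketched as follows. This is a minimal, standard-library-only illustration, not the extractor's code: the function name `extractText` and its `svgTextOnly` flag are hypothetical, and the naive scanner stands in for a real DOM parser such as the one behind QDomElement::text(). It assumes well-formed input without comments, CDATA sections, or entities.

```cpp
#include <cstddef>
#include <string>

// Naive stand-in for DOM text extraction; NOT a real XML parser.
// Assumes well-formed input without comments, CDATA sections, entities,
// or '>' characters inside attribute values.
// svgTextOnly == false: keep all character data (generic XML case).
// svgTextOnly == true:  keep only character data inside <text> elements.
std::string extractText(const std::string &xml, bool svgTextOnly)
{
    std::string out;
    int textDepth = 0; // > 0 while inside an SVG <text> element
    std::size_t i = 0;
    while (i < xml.size()) {
        if (xml[i] != '<') {
            if (!svgTextOnly || textDepth > 0) {
                out += xml[i];
            }
            ++i;
            continue;
        }
        const std::size_t end = xml.find('>', i);
        if (end == std::string::npos) {
            break; // unterminated tag: stop instead of guessing
        }
        const std::string tag = xml.substr(i + 1, end - i - 1);
        const bool selfClosing = !tag.empty() && tag.back() == '/';
        if (!selfClosing && (tag == "text" || tag.rfind("text ", 0) == 0)) {
            ++textDepth;
        } else if (tag == "/text" && textDepth > 0) {
            --textDepth;
        }
        // Sketch-specific choice: separate words of neighbouring elements.
        if (!out.empty() && out.back() != ' ') {
            out += ' ';
        }
        i = end + 1;
    }
    // Drop the separator left behind by the last closing tags.
    while (!out.empty() && out.back() == ' ') {
        out.pop_back();
    }
    return out;
}
```

In generic mode, tags and attributes never reach the result. In SVG mode, content outside `<text>` elements (a `<title>`, for instance) is skipped as well.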

Depends on D16488

Diff Detail

Repository
R286 KFileMetaData
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
bruns created this revision. Oct 28 2018, 5:39 PM
Restricted Application added projects: Frameworks, Baloo. Oct 28 2018, 5:39 PM
Restricted Application added subscribers: Baloo, kde-frameworks-devel.
bruns requested review of this revision. Oct 28 2018, 5:39 PM
astippich added inline comments. Oct 31 2018, 9:20 PM
src/extractors/xmlextractor.cpp
25

seems unused

28

same

72

Can we agree on using a static QStringList if the mimetypes are fixed? I prepared a patch to convert all other extractors to this scheme, where applicable, in D16554.

Also, using an initializer list is faster, see https://www.angrycane.com.br/en/2018/06/19/speeding-up-cornercases/

astippich requested changes to this revision. Nov 1 2018, 10:23 AM
This revision now requires changes to proceed. Nov 1 2018, 10:23 AM
bruns updated this revision to Diff 44645. Nov 1 2018, 3:45 PM
bruns marked an inline comment as done.

Reorder mimetype check to use SVG(+XML) special case if applicable and use
generic implementation for all other mimetypes inheriting from
application/xml.
Use static QStringList for mimetypes.
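
The reordered check can be sketched like this. It is a standard-library-only illustration under stated assumptions, not the code from the diff: `Handler`, `selectHandler`, `svgMimeTypes`, and the `exampleParents` table are hypothetical names, the list entries are illustrative, and a `std::vector<std::string>` stands in for the static QStringList. The parent table plays the role that a mime database lookup would play in the real extractor.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

enum class Handler { Svg, GenericXml, None };

// Built once from an initializer list, mirroring the "static list" idea.
// The entries are illustrative, not the extractor's actual list.
static const std::vector<std::string> svgMimeTypes = {
    "image/svg+xml",
};

// Hypothetical child -> parent table standing in for a mime database.
static const std::map<std::string, std::string> exampleParents = {
    {"image/svg+xml", "application/xml"},
    {"application/x-example+xml", "application/xml"},
};

Handler selectHandler(const std::string &mimetype,
                      const std::map<std::string, std::string> &parents)
{
    // Check the SVG special case first: in this table SVG also inherits
    // from application/xml and would otherwise hit the generic branch.
    if (std::find(svgMimeTypes.begin(), svgMimeTypes.end(), mimetype)
        != svgMimeTypes.end()) {
        return Handler::Svg;
    }
    // Walk up the inheritance chain looking for application/xml.
    std::string current = mimetype;
    for (int hops = 0; hops < 16; ++hops) { // bound guards against cycles
        if (current == "application/xml") {
            return Handler::GenericXml;
        }
        const auto it = parents.find(current);
        if (it == parents.end()) {
            break;
        }
        current = it->second;
    }
    return Handler::None;
}
```

With the order reversed, an SVG file would be caught by the generic branch and never reach the `<text/>`-based extraction.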

bruns marked 2 inline comments as done. Nov 1 2018, 3:59 PM
bruns added inline comments.
src/extractors/xmlextractor.cpp
25

qCDebug(...) ...

49

^ here

astippich accepted this revision. Nov 1 2018, 4:18 PM
astippich added inline comments.
src/extractors/xmlextractor.cpp
25

I think that gets pulled in via "kfilemetadata_debug.h", but never mind.

This revision is now accepted and ready to land. Nov 1 2018, 4:18 PM
This revision was automatically updated to reflect the committed changes.