Add "image/svg" as Type::Image to the BasicIndexingJob
Closed, Public

Authored by bruns on Dec 3 2018, 6:48 PM.

Diff Detail

Repository: R293 Baloo
Branch: submit
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 5680
Build 5698: arc lint + arc unit
bruns created this revision. Dec 3 2018, 6:48 PM
Restricted Application added projects: Frameworks, Baloo. Dec 3 2018, 6:48 PM
Restricted Application added a subscriber: kde-frameworks-devel.
bruns requested review of this revision. Dec 3 2018, 6:48 PM

Is indexing the contents of SVG images actually desirable? The fact that these files are internally represented by text rather than binary data is an implementation detail, and their contents are generally not useful since they're not intended to be human-readable.

bruns added a comment. Dec 3 2018, 8:53 PM

> Is indexing the contents of SVG images actually desirable? The fact that these files are internally represented by text rather than binary data is an implementation detail, and their contents are generally not useful since they're not intended to be human-readable.

Twice wrong ;-):

  1. This is just the document type, so it would be useful even for filename-only searches.
  2. The xmlextractor I recently added to KFileMetaData handles Dublin Core metadata (e.g. Author, Keywords, ...) and text spans (i.e. content) in SVG.

> 1. This is just the document type, so it would be useful even for filename-only searches.

Cool, that seems fine.

> text spans (i.e. content) in SVG.

Is indexing any of this actually useful? I worry that it would be just noise in >99% of cases.

bruns added a comment. Dec 3 2018, 9:07 PM

>> text spans (i.e. content) in SVG.

> Is indexing any of this actually useful? I worry that it would be just noise in >99% of cases.

Why should this be noise? Can you clarify?

>>> text spans (i.e. content) in SVG.

>> Is indexing any of this actually useful? I worry that it would be just noise in >99% of cases.

> Why should this be noise? Can you clarify?

This text in an SVG file is an implementation detail, not user content like the metadata in a JPEG. For example, here's the textual content of a random Breeze icon:

https://cgit.kde.org/breeze-icons.git/tree/icons/devices/64/computer.svg

For instance, would it make sense to find SVG files when searching for "layer" or "matrix" or "connector"?

If we have a way to index only the genuine metadata (author, keywords, etc.) but omit the text that comprises the icon's internal representation, that would work. Otherwise I think this would just add a lot of useless data to the DB.

bruns added a comment. Dec 3 2018, 9:46 PM

>>>> text spans (i.e. content) in SVG.

>>> Is indexing any of this actually useful? I worry that it would be just noise in >99% of cases.

>> Why should this be noise? Can you clarify?

> This text in an SVG file is an implementation detail, not user content like the metadata in a JPEG. For example, here's the textual content of a random Breeze icon:
>
> https://cgit.kde.org/breeze-icons.git/tree/icons/devices/64/computer.svg

No, that's the raw data, not the text spans, i.e. elements such as <text>Some Text</text>.

See e.g. https://suchanek.name/programs/powerline/intro/3.svg

The content is e.g. "LaTex with PowerLine"
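
To illustrate the distinction, here is a rough Python sketch. It is not the KFileMetaData extractor itself: the function name extract_svg_terms and the sample document are hypothetical, and only the idea (collect Dublin Core elements and the character data inside <text> elements, ignore everything else) comes from the discussion above.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample, not taken from either of the linked files.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <metadata>
    <rdf:RDF>
      <rdf:Description>
        <dc:title>Intro slide</dc:title>
        <dc:creator>Jane Doe</dc:creator>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <g id="layer1" transform="matrix(1,0,0,1,0,0)">
    <path d="M 0 0 L 10 10"/>
    <text x="5" y="5">Some <tspan>Text</tspan></text>
  </g>
</svg>"""


def extract_svg_terms(svg_source):
    """Return (dublin_core_metadata, text_spans) for an SVG string."""
    root = ET.fromstring(svg_source)

    # Dublin Core elements, keyed by local name (title, creator, ...).
    metadata = {}
    for elem in root.iter():
        if elem.tag.startswith("{" + DC_NS + "}"):
            value = "".join(elem.itertext()).strip()
            if value:
                metadata[elem.tag.split("}", 1)[1]] = value

    # Character data of <text> elements, including nested <tspan>s.
    # Element names, ids, and attributes such as transform="matrix(...)"
    # are never looked at, so they cannot end up in the result.
    text_spans = []
    for elem in root.iter("{" + SVG_NS + "}text"):
        content = " ".join("".join(elem.itertext()).split())
        if content:
            text_spans.append(content)

    return metadata, text_spans
```

With this approach, a search for "layer" or "matrix" would not match the sample, because those strings occur only in attributes and ids, not in the Dublin Core elements or the text content.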

ngraham accepted this revision. Dec 3 2018, 9:58 PM

Ah, great! Makes sense then, sorry for being dense. :)

This revision is now accepted and ready to land. Dec 3 2018, 9:58 PM
This revision was automatically updated to reflect the committed changes.