Ignore more types of source files
Closed · Public

Authored by ngraham on May 9 2018, 6:47 PM.

Details

Summary

Add more types of development-related files to the exclusion lists. These files aren't useful to index, and having them there can bog down Baloo.

BUG: 394002
BUG: 390932
CCBUG: 382117
FIXED-IN 5.47

Test Plan

Created a bunch of files of the newly excluded types. Baloo didn't index them.
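
For readers unfamiliar with how a wildcard exclusion list behaves, here is a minimal sketch of the check the test plan exercises. It assumes the filters are shell-style patterns matched against the file name only. The names `wildcardMatch`, `isExcluded` and `exampleFilters` are hypothetical, and the pattern list is an illustrative excerpt built from extensions mentioned in this review, not the actual contents of fileexcludefilters.cpp.

```cpp
#include <string>
#include <vector>

// Returns true if `name` matches `pattern`. '*' matches any run of characters
// (including none) and '?' matches exactly one character.
bool wildcardMatch(const std::string& pattern, const std::string& name)
{
    std::size_t p = 0;
    std::size_t n = 0;
    std::size_t starPos = std::string::npos; // index of the most recent '*' in pattern
    std::size_t resume = 0;                  // index in name where that '*' started matching

    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            starPos = p++;
            resume = n;
        } else if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == name[n])) {
            ++p;
            ++n;
        } else if (starPos != std::string::npos) {
            // Backtrack: let the last '*' swallow one more character.
            p = starPos + 1;
            n = ++resume;
        } else {
            return false;
        }
    }
    // Only trailing '*' may remain unmatched in the pattern.
    while (p < pattern.size() && pattern[p] == '*') {
        ++p;
    }
    return p == pattern.size();
}

// Illustrative excerpt only (assumption): extensions named in this review.
std::vector<std::string> exampleFilters()
{
    return { "*.o", "*.a", "*.pyo", "*.qmlc", "*.jsc", "*.map", "*.ini" };
}

// A file is skipped by the indexer if any filter matches its name.
bool isExcluded(const std::string& fileName, const std::vector<std::string>& filters)
{
    for (const std::string& filter : filters) {
        if (wildcardMatch(filter, fileName)) {
            return true;
        }
    }
    return false;
}
```

With this sketch, `isExcluded("main.o", exampleFilters())` is true, while a document such as `notes.odt` is not matched by `*.o` because the pattern has to cover the whole name.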

Diff Detail

Repository
R293 Baloo
Branch
more-excluded-source-files (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
ngraham created this revision. May 9 2018, 6:47 PM
Restricted Application added projects: Frameworks, Baloo. May 9 2018, 6:47 PM
Restricted Application added subscribers: Baloo, kde-frameworks-devel.
ngraham requested review of this revision. May 9 2018, 6:47 PM
ngraham edited the summary of this revision. May 9 2018, 6:57 PM
cfeck added a subscriber: cfeck. May 9 2018, 9:20 PM
cfeck added inline comments.
src/file/fileexcludefilters.cpp
147

,

ngraham updated this revision to Diff 33912. May 9 2018, 9:29 PM

add missing comma

ngraham marked an inline comment as done. May 9 2018, 9:30 PM
ngraham updated this revision to Diff 33915. May 9 2018, 10:26 PM

Revert unintentional change

ngraham updated this revision to Diff 33919. May 9 2018, 11:38 PM

Add more to also fix 39093

ngraham edited the summary of this revision. May 9 2018, 11:39 PM
ngraham edited the summary of this revision. May 9 2018, 11:41 PM
ngraham updated this revision to Diff 33920. May 9 2018, 11:42 PM

Also omit node_packages folders

broulik added inline comments.
src/file/fileexcludefilters.cpp
69

Don't we ignore blobs already? If not, we should also add stuff like qmlc and jsc

77

qrc is a Qt resource file, not a QML file

ngraham updated this revision to Diff 34141. May 14 2018, 1:57 PM

Add more blobs

ngraham marked 2 inline comments as done. May 14 2018, 1:58 PM
ngraham added inline comments.
src/file/fileexcludefilters.cpp
69

As far as I can tell, we do not, and they have to be manually listed. I've added qmlc and jsc. Any more you can think of?

ngraham marked an inline comment as done. May 14 2018, 1:59 PM
bruns added a comment. May 14 2018, 2:29 PM

Does anyone know if there are any artifacts generated by the meson build system?

src/file/fileexcludefilters.cpp
69

Static library - .a

73

Probably Bytecode - we have .o above, which is also compiled

75

For python2, there is also .pyo (Python3 is covered by the __pycache__ directory filter)
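
The distinction drawn here between a file filter and a directory filter can be sketched as follows: a directory filter skips everything below a folder with an excluded name, so individual file extensions inside it never need to be listed. This is a minimal illustration under the assumption that directory filters are compared against whole path components. The function name `isInExcludedFolder` and the example folder list are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true if any directory component of `path` (every component that is
// followed by a '/') equals one of the excluded folder names.
bool isInExcludedFolder(const std::string& path,
                        const std::vector<std::string>& excludedFolders)
{
    std::size_t start = 0;
    std::size_t end = 0;
    while ((end = path.find('/', start)) != std::string::npos) {
        const std::string component = path.substr(start, end - start);
        if (!component.empty()
            && std::find(excludedFolders.begin(), excludedFolders.end(), component)
                   != excludedFolders.end()) {
            return true;
        }
        start = end + 1;
    }
    return false;
}

// Illustrative folder names taken from this review (assumption: not the full list).
std::vector<std::string> exampleFolders()
{
    return { "__pycache__", "node_packages" };
}
```

Under these assumptions, a Python 3 cache file below `__pycache__/` is skipped whatever its extension is, whereas a Python 2 `.pyo` file sitting next to its source needs its own file pattern.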

ngraham updated this revision to Diff 34143. May 14 2018, 2:33 PM

More buildy files

bruns added inline comments. May 14 2018, 2:36 PM
src/file/fileexcludefilters.cpp
79

That's not what I meant (I am not aware of anything generating a Bytecode file literally).
I meant changing the // Compiled files comment to // Bytecode files, which all the ones below are.

ngraham updated this revision to Diff 34145. May 14 2018, 2:40 PM

Fix misinterpretation

ngraham marked 5 inline comments as done. May 14 2018, 2:41 PM
ngraham added inline comments.
src/file/fileexcludefilters.cpp
79

Heh, oops.

ngraham marked an inline comment as done. May 14 2018, 2:43 PM

How do people feel about adding *.ini to the exclusions list?

ngraham updated this revision to Diff 34146. May 14 2018, 2:45 PM

Add some more

bruns added inline comments. May 14 2018, 3:08 PM
src/file/fileexcludefilters.cpp
149

Hm, not too sure about this one - SVG typically has RDF metadata, and also everything in <tspan> tags qualifies as "content".
Do we have a generalized XML extractor?

ngraham added inline comments. May 14 2018, 3:16 PM
src/file/fileexcludefilters.cpp
149

My impression is that Baloo is really intended for user files; SVGs only get their content indexed by accident, because they happen to be textual. I don't think there's any textual content inside an SVG file that you'd actually want to have indexed.

bruns added inline comments. May 14 2018, 3:34 PM
src/file/fileexcludefilters.cpp
149

SVGs are user files, and anything inside <tspan> is textual content. You can have several paragraphs with text inside SVGs.
We index the RDF metadata (author, title, ...) for PDFs, EPUB, ... so we should do the same for SVG.
Of course it is pointless to index e.g. the tags themselves, or the content of any non-textual tag; that's the reason I asked for an XML extractor.

ngraham updated this revision to Diff 34151. May 14 2018, 3:42 PM

Revert change to omit SVG files

ngraham marked 3 inline comments as done. May 14 2018, 3:42 PM
bruns added a comment. May 14 2018, 3:51 PM

If you want to read more about text in SVG:
http://tavmjong.free.fr/blog/

To show that a generalized XML extractor is sufficient for SVG:

  • Path data: <path d="m 53.6725,131.446 58.2085,-36.2853 53.672,90.7143 -82.3983,-3.78" id="path815" />
  • Single Line: <tspan id="tspan825" x="65.7677" y="73.9941">Single line</tspan>
  • Multiline: <flowPara id="flowPara823">This is some multiline Text</flowPara>

Non-text tags are empty (i.e., are defined by attributes only).
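
To make the point above concrete, here is a minimal sketch of what "index only the character data between tags" could look like, run against the three snippets listed. It is not an existing Baloo or KFileMetaData extractor, and the name `extractText` is hypothetical. It is deliberately naive: it does not handle comments, CDATA sections, entities, or a '>' inside an attribute value, all of which a real XML parser would cover.

```cpp
#include <string>
#include <vector>

// Collects the character data found between tags, ignoring tag names and
// attributes. Whitespace-only runs are dropped.
std::vector<std::string> extractText(const std::string& xml)
{
    std::vector<std::string> texts;
    std::string current;
    bool inTag = false;

    // Trim `current` and store it if anything is left, then reset it.
    auto flush = [&texts, &current]() {
        const char* const ws = " \t\r\n";
        const std::size_t first = current.find_first_not_of(ws);
        if (first != std::string::npos) {
            const std::size_t last = current.find_last_not_of(ws);
            texts.push_back(current.substr(first, last - first + 1));
        }
        current.clear();
    };

    for (const char c : xml) {
        if (c == '<') {
            flush();
            inTag = true;
        } else if (c == '>') {
            inTag = false;
        } else if (!inTag) {
            current += c;
        }
    }
    flush();
    return texts;
}
```

Fed the `<path>` element above, this yields nothing to index, while the `<tspan>` and `<flowPara>` elements each yield their text, which matches the observation that non-text tags carry their data in attributes only.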

ngraham marked an inline comment as done. May 14 2018, 5:13 PM
ngraham updated this revision to Diff 34155. May 14 2018, 5:22 PM

Omit all .map files, and also .ini files

bruns accepted this revision. May 15 2018, 2:10 PM

Not tested by me, but looks good in general.

This revision is now accepted and ready to land. May 15 2018, 2:10 PM
This revision was automatically updated to reflect the committed changes.