Postscript files currently fall back to the plaintext extractor due to
mimetype inheritance. This adds lots of garbage to the index and misses
any useful data in the DSC comments.
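A simplified model of the fallback described above. This is only a sketch: an extractor lookup tries the exact MIME type first and then walks up the subclass chain. Several names are hypothetical and not the KFileMetaData API: the hierarchy table, the extractor names (`PlainTextExtractor`, `PostscriptDscExtractor`) and `findExtractor`. The table assumes, as in shared-mime-info, that `application/postscript` is a subclass of `text/plain` and `image/x-eps` is a subclass of `application/postscript`.

```cpp
#include <map>
#include <string>

// Hypothetical subset of the MIME subclass hierarchy (child -> parent).
// Assumption: mirrors shared-mime-info, where application/postscript
// is declared a subclass of text/plain.
const std::map<std::string, std::string> kParentOf = {
    {"image/x-eps", "application/postscript"},
    {"application/postscript", "text/plain"},
};

// Returns the extractor registered for `mime` or, failing that, for the
// nearest ancestor type; returns an empty string if nothing matches.
std::string findExtractor(const std::map<std::string, std::string>& registry,
                          std::string mime) {
    for (;;) {
        auto hit = registry.find(mime);
        if (hit != registry.end()) {
            return hit->second;
        }
        auto parent = kParentOf.find(mime);
        if (parent == kParentOf.end()) {
            return {};  // top of the chain reached without a match
        }
        mime = parent->second;  // fall back to the parent type
    }
}
```

With only a plaintext extractor registered, both PostScript and EPS files resolve to it. Once a dedicated extractor is registered for `application/postscript`, EPS files reach it through the same parent chain.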
Details
- Reviewers: astippich, ngraham, poboiko
- Group Reviewers: Frameworks, Baloo
- Commits: R286:205ed84ee213: [KFileMetaData] Add extractor for DSC conforming (Encapsulated) Postscript
make && ctest
Diff Detail
- Repository: R286 KFileMetaData
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.
This is not Postscript parsing, but DSC parsing - read the specification to understand the difference!
An EPS file is also a PostScript file, and indeed ghostscript opens it perfectly fine.
Because of the above, libspectre perfectly handles EPS files, and the API already provides all the information that the current DscExtractor provides as well.
Please tone down your attitude to something more respectful, thanks.
> Because of the above, libspectre perfectly handles EPS files, and the API already provides all the information that the current DscExtractor provides as well.
It's an additional dependency (libspectre and libgs), and it also imposes a security risk - see e.g. https://nvd.nist.gov/vuln/detail/CVE-2018-11645
> Please tone down your attitude to something more respectful, thanks.
You started with "Ugh ..." - your comment lacks any respect ...
Sure: that is one more reason to handle it that way.
> Because of the above, libspectre perfectly handles EPS files, and the API already provides all the information that the current DscExtractor provides as well.
> It's an additional dependency (libspectre and libgs),
- libspectre is a C library, and uses only libgs
- the rest of the ghostscript dependencies are already used in one way or another in an average KDE installation
- okular & cantor have already been using libspectre for a very long time (okular for a decade)
> and also imposes a security risk - see e.g. https://nvd.nist.gov/vuln/detail/CVE-2018-11645
There are way worse CVEs in lower components of a modern Linux stack (say in the Linux kernel).
Also, according to that, should we drop the support in okular for PostScript files, and the support in cantor for EPS images?
> Please tone down your attitude to something more respectful, thanks.
> You started with "Ugh ..." - your comment lacks any respect ...
Certainly this was not the case; sorry if it came across that way, it was not intended. But even then, that was geared towards the code, and it does not remotely justify attacking the person with "you did not read the code" (which is wrong).
It may be an option to use libspectre as the basis for an additional extractor, but it should not be the default; see below.
This is meant to be as simple as possible - the extractor itself is hardly more than 30 lines of code. There is definitely a use case for this extractor.
> and also imposes a security risk - see e.g. https://nvd.nist.gov/vuln/detail/CVE-2018-11645
> There are way worse CVEs in lower components of a modern Linux stack (say in the Linux kernel).
> Also, according to that, should we drop the support in okular for PostScript files, and the support in cantor for EPS images?
There is a difference between opening a file consciously and letting it happen by chance. The extractor runs when the mouse cursor hovers over a file, or when baloo indexes it, without the user being aware of it.
IMHO the ghostscript support in Okular should be disabled (at runtime) by default until it runs fully sandboxed.
> Please tone down your attitude to something more respectful, thanks.
> You started with "Ugh ..." - your comment lacks any respect ...
> Certainly this was not the case; sorry if it came across that way, it was not intended. But even then, that was geared towards the code, and it does not remotely justify attacking the person with "you did not read the code" (which is wrong).
The code clearly states that it targets (E)PS DSC. A full-blown PS interpreter may be able to extract more information from the file, but not without the drawbacks mentioned. Flatly stating that libspectre is better and should be used, without weighing the pros and cons, does not give me the impression that you have evaluated it carefully.
Apology accepted.
Please explain why you consider running a full-blown PostScript interpreter in an uncontrolled environment (no sandboxing, running without user interaction) better than 20 lines of trivial text parsing.
Please be patient; I have other things that currently take my time, and I need to gather some proper data to answer your questions.
In the meanwhile, two suggestions:
- please try to also see other people's point of view, and not just yours; if I suggested something, it's because, according to my knowledge/experience (which also includes having maintained okular in the past, FYI), I deem it the optimal way
- when asking for people's opinion, again, try to use a friendlier attitude; that will certainly help get my attention faster, rather than leaving me feeling attacked for expressing technical doubts about this solution
That's completely fine with me; if you need more time, just say so.
> In the meanwhile, two suggestions:
> - please try to also see other people's point of view, and not just yours; if I suggested something, it's because, according to my knowledge/experience (which also includes having maintained okular in the past, FYI), I deem it the optimal way
> - when asking for people's opinion, again, try to use a friendlier attitude; that will certainly help get my attention faster, rather than leaving me feeling attacked for expressing technical doubts about this solution
Not answering can also be considered impolite. I am trying to solve a real problem here. Postscript files currently pose a significant problem for baloo. I listed technical reasons why libspectre/gs is not a good solution.
@pino - you have not answered for a week.
You have set this to "Needs Revision", which removes it from the "Needs Review" queue for everyone else.
If you want to see an extractor based on libspectre, that's fine, but then you have to write it.
Apart from a trivial comment, this looks fine. I've tested it on my setup (with a bunch of (E)PS files), and randomly chosen files seem to be indexed nicely. It also reduced the size of the index by almost 50MB, because those files are no longer indexed as plaintext :)
Yet I would also vote for replacing it (eventually) with a full-featured extractor based on libspectre
(I'm not a security specialist in any way, but that CVE doesn't look too harmful, and from my point of view it's not worth abandoning full support for (E)PS because of it)
autotests/postscriptdscextractortest.cpp, line 28: This, and QDebug, seem to be no longer used.
It seems like you've pushed something that was not intended to be pushed (XML extractor parts)