Highlighting Indexer: list of suggestions
Needs RevisionPublic

Authored by jpoelen on Sun, Feb 18, 5:00 AM.



Proposes mergers of rules and the replacement of:

  • RegExp by StringDetect, AnyChar, RangeDetect, etc.
  • StringDetect by DetectChar or Detect2Chars.

Diff Detail

R216 Syntax Highlighting
No Linters Available
No Unit Test Coverage
jpoelen created this revision.Sun, Feb 18, 5:00 AM
Restricted Application added a project: Frameworks. · View Herald TranscriptSun, Feb 18, 5:00 AM
Restricted Application added a subscriber: Frameworks. · View Herald Transcript
jpoelen requested review of this revision.Sun, Feb 18, 5:00 AM

As quick comment: I like the idea of programmatically suggesting better usage of rules. I did not yet fine the time to look into this, so I do not know how much output this currently generates.

In any case, for the highlighting files we ship with the syntax highlighting framework, I would like to have zero error/warning output generated by the highlighting indexer (just like we have zero compiler warnings). So if this generates a lot of suggestions, would you also agree to fix the output? This could possibly touch many many files and be a lot of work :-)

dhaumann requested changes to this revision.Sat, Mar 10, 8:35 PM

Btw, current generated list finds 1658 issues: https://paste.kde.org/p7iareuxc

Not all of them are correct, for instance

zsh.xml" line 919 RegExpr candidate for  "Detect2Chars" : "%1"

I believe %1 in this case is a backreference in a dynamic context, so a change to Detect2Chars would be wrong.

Similarly, the suggestion

html-php.xml" line 223 RegExpr candidate for  "AnyChar" : "[^/><\"'\\s]"

is also not correct, since [^xyz] means match everything _except_ xyz, so the ^ symbol negates this, afaik (or am I remembering incorrectly?)...

Still, there are many valid items which certainly can be optimized. So patches for fixes welcome! In small chunks, if possible!

This revision now requires changes to proceed.Sat, Mar 10, 8:35 PM

Just for info: https://kate-editor.org/2018/03/10/improving-syntax-highlighting-files/ tries to reach a broader audience for getting patches.

Very nice idea! RegExp rules are the by far biggest cost factor, so every single one we can get rid of is good :)

I've already planned to fix all the suggestions, but I lost a lot of time updating the sql*.xml files and came across a very strange bug related to QRegularExpression by adding some tests. At the moment, I do not know if it's my tool to test, my version of Qt that is old or a real bug. Like last week my hard drive is dead, I took the opportunity to have a more recent configuration. I will make a few patches this weekend when I finished setting my new environment.

The tool: https://github.com/jonathanpoelen/vt-kate-syntax-highlighter with the -n option.
I also have a tool to generate a graph of the rules of an xml: https://github.com/jonathanpoelen/syntax-highlighting/tree/tools (graph.lua). The output is in a format readable by graphviz. The script is rudimentary and the xml parser does not work with some files (2 or 3).

@jpoelen What would be interesting is to check which optimizations are really an improvement. Because we should either get a significant speed boost (e.g. RegExpr -> WordDetect), or at least reduce memory allocations (possibly StringDetect -> Detect2Chars and DetectChar).

Did you do some testing here ?