replace own wildcard matcher with QRegularExpression combining all wildcards
AbandonedPublic

Authored by cullmann on Aug 25 2018, 2:49 PM.

Details

Reviewers
vkrause
dhaumann
Summary

replace own wildcard matcher with QRegularExpression combining all wildcards

In addition add validator to index tool, only ? and * wildcard stuff is valid fix some syntax files that had e.g. , instead of ; and spaces mixed in remove asterisk definition, that will never match as it tried to match the file path and is anyways 'very' basic add auto-test for new regex stuff

Test Plan

make && make test still works, autotest added

Diff Detail

Repository
R216 Syntax Highlighting
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 2244
Build 2262: arc lint + arc unit
cullmann created this revision.Aug 25 2018, 2:49 PM
Restricted Application added projects: Kate, Frameworks. · View Herald TranscriptAug 25 2018, 2:49 PM
Restricted Application added subscribers: kde-frameworks-devel, kwrite-devel. · View Herald Transcript
cullmann requested review of this revision.Aug 25 2018, 2:49 PM

Rational for the static convertExtensionsToRegularExpression method: Allows to nice unit tests + allows e.g. KTextEditor to use that later, too, as replacement for its own copy of the wildcard matcher code for the modelines.

cullmann updated this revision to Diff 40428.Aug 25 2018, 3:05 PM

all code is now MIT
only non-MIT parts are e.g. some generator scripts for highlighting and many highlighting data files

cullmann updated this revision to Diff 40429.Aug 25 2018, 3:12 PM

improve definitionForName code, already for matching take into account the priority
that saves some matching time and avoid later sorting

The old wildcard matcher existed for performance reasons. Did you check?

PS: When doing review requests, could you please have a nice subject, and not put very long lines into its stead? :-p Especially since this line will also appear in Davids release notes?

cullmann retitled this revision from replace own wildcard matcher with QRegularExpression combining all wildcards add validator to index tool, only ? and * wildcard stuff is valid fix some syntax files that had e.g. , instead of ; and spaces mixed in remove asterisk definition, that... to replace own wildcard matcher with QRegularExpression combining all wildcards.Aug 25 2018, 4:12 PM
cullmann edited the summary of this revision. (Show Details)

Some trivial QBenchmark stuff tells

QBENCHMARK {
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.c"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.sdfdsf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.ada"));
          m_repo->definitionForFileName(QLatin1String("12323.cklsjdf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.c"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.sdfdsf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.ada"));
          m_repo->definitionForFileName(QLatin1String("12323.cklsjdf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.c"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.sdfdsf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.ada"));
          m_repo->definitionForFileName(QLatin1String("12323.cklsjdf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.c"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.sdfdsf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.ada"));
          m_repo->definitionForFileName(QLatin1String("12323.cklsjdf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.c"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.sdfdsf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.ada"));
          m_repo->definitionForFileName(QLatin1String("12323.cklsjdf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.c"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.sdfdsf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.ada"));
          m_repo->definitionForFileName(QLatin1String("12323.cklsjdf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.c"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.sdfdsf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.ada"));
          m_repo->definitionForFileName(QLatin1String("12323.cklsjdf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.c"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.sdfdsf"));
          m_repo->definitionForFileName(QLatin1String("ksldjfklsdjfkl.ada"));
          m_repo->definitionForFileName(QLatin1String("12323.cklsjdf"));
      }

tells:

new: between 4 and 5 msecs per iteration
old: between 5 and 6 msecs per iteration

I think the main issue in the "very old code" was, that we did a QRegExp per extension, not a combined one.

For the accessor: I actually would prefer a const QRegularExpression & as return value, but most other methods use values to hide the internals.

cullmann updated this revision to Diff 40435.Aug 25 2018, 4:35 PM

avoid to use capture groups

As Sebastian agreed to MIT, we can think about this after the 5.50 release.

Do you need the public methods in Definition outside of KSyntaxHighlighting?

In general: +1

If you want this to be in 5.50, you need to get a +2 from @vkrause and ping @dfaure to retag.

src/lib/definition.h
196 ↗(On Diff #40428)

@since 5.50 or 5.51

202 ↗(On Diff #40428)

@since 5.50 or 5.51

I had some more improvement ideas, this can wait a release.

I will add the internal stuff like the wildcard check and the removed/fixed hls.
I will not add the regex changes and any public API, I want to think about that once more first.
That isn't that urgent anyways. KTextEditor uses its own code for that and the only consumer is the definitionForName ATM.

cullmann abandoned this revision.Aug 26 2018, 10:11 AM

I fixed the hl files, I removed the stray Asterisk stuff, that was no proper HL for such files, they are more or less only normal .ini files, added .conf as end for ini.
Indexer is checked in, too.
I will do a new request for a properly redone matching for post 5.50.

Asterisk removal:
https://commits.kde.org/syntax-highlighting/8cc16753881d904c03f32c444f503d186d02343a

Indexer improvements:
https://commits.kde.org/syntax-highlighting/3d1b8dc72191de03a97f6c2a74cc5323dfff7b94