Port regex search to QRegularExpression
Needs Review · Public

Authored by ahmadsamir on Aug 25 2019, 9:19 PM.

Details

Reviewers
dhaumann
cullmann
Group Reviewers
KTextEditor
Summary
  • Do away with the kateregexp class; move isMultiLine() to kateregexpsearch
  • \s can match a newline
  • Dot '.' matches any character except a newline by default. It also matches a newline when QRegularExpression::DotMatchesEverythingOption is set; right now that option can be enabled through the inline match directive '(?s)' in the search pattern
  • Explicitly enable QRegularExpression::MultilineOption; the details are in KateRegExpSearch::search()
  • Update the relevant unit tests (searchbar_test, regexpsearch_test)
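The three matching behaviours listed above can be illustrated with a small sketch. It uses Python's `re` module as a stand-in, on the assumption that `\s`, the inline `(?s)` directive and multiline `^` behave there the same way as in PCRE for these simple cases; it is not QRegularExpression code and does not come from the patch.

```python
import re

# Two "lines" separated by a newline, with a trailing space on the first.
text = "foo \nbar"

patterns = [
    r"foo\s+bar",      # \s matches any whitespace, newline included
    r"foo[ \t]+bar",   # spaces and tabs only, cannot cross the line break
    r"foo.+bar",       # the dot stops at a newline by default
    r"(?s)foo.+bar",   # (?s) lets the dot match a newline as well
    r"^bar",           # without multiline mode, ^ anchors only at the start
    r"(?m)^bar",       # with multiline mode, ^ also anchors after a newline
]

# Map each pattern to whether it matches anywhere in the text.
checks = {p: re.search(p, text) is not None for p in patterns}

for pattern, matched in checks.items():
    print(f"{pattern!r}: {matched}")
```

Users who relied on the old behaviour would get it back by writing `[ \t]` where they previously wrote `\s`.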
Test Plan

All unit tests pass except for vimode_emulatedcommandbar

Search away, test it for as long as possible before committing

Diff Detail

Repository
R39 KTextEditor
Branch
ahmad/qregularexpression (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 15635
Build 15653: arc lint + arc unit
ahmadsamir created this revision. Aug 25 2019, 9:19 PM
Restricted Application added projects: Kate, Frameworks. Aug 25 2019, 9:19 PM
ahmadsamir requested review of this revision. Aug 25 2019, 9:19 PM

Hi, without having looked any further at the code changes, I don't think a behavior change like "\s can match a newline" is a good idea.
Or do I misunderstand that?

It's exactly what it says: \s can match a newline character in PCRE.

Why do you think this is a bad idea?

Because it didn't do that in KTextEditor before, did it?
That means everyone who got used to the current behavior will see this as a regression if it is not optional.

Maybe they'll also see it as ktexteditor/kate using a regex engine that matches what the abundance of online pcre docs say, and how other editors that use pcre behave?

IIUC, '\s' was worked around so that it would not match a newline, which kept the search pattern from being considered multiline (see the isMultiLine() function). Multiline patterns make findAll and replaceAll slower, as they take longer than just matching against each line separately.

The thing is, what kateregexp did was replace '\s' with '[ \t]', which users who want this behaviour can easily use.
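As a rough illustration of that kind of rewrite, here is a minimal sketch in Python. The helper name `restrict_whitespace_to_line` is hypothetical and this is not the kateregexp implementation; it only shows the idea of swapping an unescaped `\s` for `[ \t]` before the pattern reaches the regex engine.

```python
import re


def restrict_whitespace_to_line(pattern: str) -> str:
    """Rewrite every unescaped \\s in `pattern` to [ \\t] (hypothetical helper).

    Escape sequences are consumed as pairs, so an escaped backslash followed
    by a literal "s" (written \\\\s in the pattern) is left untouched.
    Limitation of this sketch: a \\s inside a character class such as [\\s,]
    is rewritten too, which changes the meaning of that class.
    """
    def swap(match: re.Match) -> str:
        return "[ \\t]" if match.group(0) == "\\s" else match.group(0)

    # Each match is a backslash plus the character it escapes.
    return re.sub(r"\\.", swap, pattern, flags=re.DOTALL)


rewritten = restrict_whitespace_to_line(r"foo\s+bar")
print(rewritten)
print(re.search(rewritten, "foo \nbar"))   # no match across the newline
print(re.search(rewritten, "foo \tbar"))   # still matches within one line
```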

Technically it's a whole new class, QRegularExpression, some different behaviours are sort of expected...

> Maybe they'll also see it as ktexteditor/kate using a regex engine that matches what the abundance of online pcre docs say, and how other editors that use pcre behave?
>
> IIUC, '\s' was worked around so that it would not match a newline, which kept the search pattern from being considered multiline (see the isMultiLine() function). Multiline patterns make findAll and replaceAll slower, as they take longer than just matching against each line separately.

Actually, it is not faster.
If you take a look at the code: for a single-line regex, it iterates over the individual lines.
For multi-line regexes, it first concatenates all lines into one buffer.
For large files that is "very" slow.
And if you e.g. search and then hit "next match", this is done again and again.

But given that it only happens more often for patterns containing \s, I assume that should not be that problematic, though I'm not sure the behavior change is that good.
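The difference between the two code paths described above can be sketched as follows. This is a simplified Python model with hypothetical function names, not the KateRegExpSearch code: the single-line path matches each line on its own, while the multi-line path has to build one joined buffer on every call before it can match.

```python
import re
from typing import List, Optional, Tuple


def search_line_by_line(lines: List[str], pattern: str) -> Optional[Tuple[int, int, int]]:
    """Return (line index, start column, end column) of the first match."""
    rx = re.compile(pattern)
    for index, line in enumerate(lines):
        match = rx.search(line)
        if match:
            return index, match.start(), match.end()
    return None


def search_joined(lines: List[str], pattern: str) -> Optional[Tuple[int, int]]:
    """Return (start offset, end offset) of the first match in the joined text."""
    # Copies the whole document on every call; this is the step that gets
    # expensive for large files and repeated "next match" requests.
    buffer = "\n".join(lines)
    match = re.search(pattern, buffer)
    return (match.start(), match.end()) if match else None


lines = ["foo ", "bar"]
print(search_line_by_line(lines, r"foo\s+bar"))  # cannot see across lines
print(search_joined(lines, r"foo\s+bar"))        # matches across the newline
```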

> The thing is, what kateregexp did was replace '\s' with '[ \t]', which users who want this behaviour can easily use.

That is true; perhaps we should add this as an extra entry in the menu as a proposal, like \s/...

> Technically it's a whole new class, QRegularExpression, some different behaviours are sort of expected...

;=) That is really not a good reason to change a behavior.
It is clear that if you port something over to a new class, behavior might change, but that doesn't make it a good thing by default.

On the other hand, I see you did a lot of testing; that is highly appreciated.

I will think a bit more about this patch.

As you seem to have now played a bit with this part of the code, are you interested in testing out the still unmerged https://phabricator.kde.org/D19367 change?

>> Maybe they'll also see it as ktexteditor/kate using a regex engine that matches what the abundance of online pcre docs say, and how other editors that use pcre behave?
>>
>> IIUC, '\s' was worked around so that it would not match a newline, which kept the search pattern from being considered multiline (see the isMultiLine() function). Multiline patterns make findAll and replaceAll slower, as they take longer than just matching against each line separately.
>
> Actually, it is not faster.
> If you take a look at the code: for a single-line regex, it iterates over the individual lines.
> For multi-line regexes, it first concatenates all lines into one buffer.
> For large files that is "very" slow.
> And if you e.g. search and then hit "next match", this is done again and again.

I was mainly talking about find/replaceAll operations. QRegularExpression is quite fast: I dabbled with using a global match to do a findAll in one go, and that was fast, but the code got complicated quite quickly too. As I found out, ktexteditor wants the matches fed back to it one by one, since it has to do a lot of other work for each match: highlighting, replacing text, undo history, buffer handling, moving ranges, etc.
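To show why one-by-one processing is the natural fit there, here is a minimal sketch in Python. The function `replace_all_stepwise` and its `on_replace` hook are hypothetical; the hook stands in for the per-match work mentioned above (highlighting, undo history and so on), and the replacement is treated as literal text without backreferences.

```python
import re
from typing import Callable, Tuple


def replace_all_stepwise(
    buffer: str,
    pattern: str,
    replacement: str,
    on_replace: Callable[[Tuple[int, int]], None],
) -> Tuple[str, int]:
    """Replace matches one at a time, reporting each range before editing."""
    rx = re.compile(pattern)
    pos = 0
    count = 0
    while pos <= len(buffer):
        match = rx.search(buffer, pos)
        if match is None:
            break
        # Hand the range (in current buffer coordinates) to the caller.
        on_replace((match.start(), match.end()))
        buffer = buffer[:match.start()] + replacement + buffer[match.end():]
        # Resume after the inserted text, because later offsets have shifted.
        pos = match.start() + len(replacement)
        if match.start() == match.end():
            pos += 1  # step past an empty match to avoid looping forever
        count += 1
    return buffer, count


events = []
result, replaced = replace_all_stepwise("a1 b22 c333", r"\d+", "#", events.append)
print(result, replaced, events)
```

Because every replacement shifts the offsets of the text after it, a list of ranges collected in a single global pass would have to be adjusted after each edit, which is one plausible source of the extra complexity.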
[..]

>> The thing is, what kateregexp did was replace '\s' with '[ \t]', which users who want this behaviour can easily use.
>
> That is true; perhaps we should add this as an extra entry in the menu as a proposal, like \s/...

There are only so many menu entries that can be added; new users will have to read the docs at some point. Regex is a complicated minefield.

>> Technically it's a whole new class, QRegularExpression, some different behaviours are sort of expected...
>
> ;=) That is really not a good reason to change a behavior.
> It is clear that if you port something over to a new class, behavior might change, but that doesn't make it a good thing by default.

True. But I also meant that this would be a good time to introduce new behaviours, as long as they are sane and adhere more closely to standard PCRE behaviour. The PCRE documentation is impressive, and with probably many guides floating around the internet, deviating from what the documentation says is potentially more annoying and frustrating to users.

[..]

> As you seem to have now played a bit with this part of the code, are you interested in testing out the still unmerged https://phabricator.kde.org/D19367 change?

I'll see what I can do.