Port the search interface from QRegExp to QRegularExpression
AbandonedPublic

Authored by ahmadsamir on Aug 25 2019, 9:19 PM.

Details

Reviewers
dhaumann
cullmann
Group Reviewers
KTextEditor
Summary
  • Remove the kateregexp class:
    • Move isMultiLine() and repairPattern() to kateregexpsearch; merge them into one function
    • Dot '.' character will match any character except a newline by default
  • Explicitly enable QRegularExpression::MultilineOption, for the rationale see the comment at the top of KateRegExpSearch::search()
  • KateCommands::SedReplace::InteractiveSedReplacer::InteractiveSedReplacer() was updated to use QRegularExpression, but not tested
  • Adjust the extended context menu used in the find box:
  • Tweak the relevant unit tests (searchbar_test, regexpsearch_test)
Test Plan

All unit tests pass, except for vimode_emulatedcommandbar; and bug313759
which seg faults.

Reguarl expression search in kate works. Should be tested thoroughly
before commiting it.

Diff Detail

Repository
R39 KTextEditor
Branch
l-qregularexpression (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 23188
Build 23206: arc lint + arc unit
ahmadsamir created this revision.Aug 25 2019, 9:19 PM
Restricted Application added projects: Kate, Frameworks. · View Herald TranscriptAug 25 2019, 9:19 PM
ahmadsamir requested review of this revision.Aug 25 2019, 9:19 PM

Hi, without any further look at the code changes, I don't think an behavior change like "\s can match a newline" is a good idea.
Or do I misunderstand that?

Hi, without any further look at the code changes, I don't think an behavior change like "\s can match a newline" is a good idea.
Or do I misunderstand that?

It's exactly what it says, \s can match a new line char in pcre.

Why do you think this is a bad idea?

Hi, without any further look at the code changes, I don't think an behavior change like "\s can match a newline" is a good idea.
Or do I misunderstand that?

It's exactly what it says, \s can match a new line char in pcre.

Why do you think this is a bad idea?

Because before it didn't do that in KTextEditor, or?
That means all people that got used to the current behavior will see this as an regression, if it is not optional.

Maybe they'll also see it as ktexteditor/kate using a regex engine that matches what the abundance of online pcre docs say, and how other editors that use pcre behave?

IIUC, '\s' was workedaround so as not to match a newline so that the search pattern wouldn't be considered multiline (isMultiLine() function), which makes findAll and replaceAll slower as it took longer, v.s. just matching against each line separately.

The thing is, what kateregexp did was replace '\s' with '[ \t]', which users who want this behaviour can easily use.

Technically it's a whole new class, QRegularExpression, some different behaviours are sort of expected...

Maybe they'll also see it as ktexteditor/kate using a regex engine that matches what the abundance of online pcre docs say, and how other editors that use pcre behave?

IIUC, '\s' was workedaround so as not to match a newline so that the search pattern wouldn't be considered multiline (isMultiLine() function), which makes findAll and replaceAll slower as it took longer, v.s. just matching against each line separately.

Actually, it is not faster.
If you take a look at the code, for single line regex, it iterates over the individual lines.
For multi line regexes, it will first concatinate all lines into one buffer.
For large files that is "very" slow.
And if you e.g. search + hit then "next match", this will be done again and again.

But given it only happens more often for stuff containing \s, I assume that should be not that problematic, thought not sure if the behavior change is that good.

The thing is, what kateregexp did was replace '\s' with '[ \t]', which users who want this behaviour can easily use.

That is true, perhaps we should add this as extra into the menu as proposal, like \s/...

Technically it's a whole new class, QRegularExpression, some different behaviours are sort of expected...

;=) That is really no good reasoning why one changes a behavior.
It is clear that if you port something over to a new class, behavior might change, but that doesn't make it a good thing per default.

On the other side, I see you did a lot of testing, that is highly appreciated.

I will think a bit more about this patch.

As you seems to have now played a bit with this part of the code, are you interested in test out the still not merged https://phabricator.kde.org/D19367 change?

Maybe they'll also see it as ktexteditor/kate using a regex engine that matches what the abundance of online pcre docs say, and how other editors that use pcre behave?

IIUC, '\s' was workedaround so as not to match a newline so that the search pattern wouldn't be considered multiline (isMultiLine() function), which makes findAll and replaceAll slower as it took longer, v.s. just matching against each line separately.

Actually, it is not faster.
If you take a look at the code, for single line regex, it iterates over the individual lines.
For multi line regexes, it will first concatinate all lines into one buffer.
For large files that is "very" slow.
And if you e.g. search + hit then "next match", this will be done again and again.

I was mainly talking about find/replaceAll operations; qregularexpression is quite fast, I dabbled with using a global match and doing a findAll in one go, that was fast, but the code got complicated quite fast too. As I found out, ktexteditor wants the matches fed back to it one by one, since it has to do a lot of other stuff: highlighting, replacing text, undo history, buffer stuff, moving ranges... etc.
[..]

The thing is, what kateregexp did was replace '\s' with '[ \t]', which users who want this behaviour can easily use.

That is true, perhaps we should add this as extra into the menu as proposal, like \s/...

There's only so many menu entries that can be added, new users will have to read the docs at some point, regex is a complicated minefield.

Technically it's a whole new class, QRegularExpression, some different behaviours are sort of expected...

;=) That is really no good reasoning why one changes a behavior.
It is clear that if you port something over to a new class, behavior might change, but that doesn't make it a good thing per default.

True. But I also meant, that would be a good time to introduce new behaviours, as long as they are sane and adhere more to pcre standard behaviour. pcre documentation is impressive and with probably many guides floating around the internet, deviating from what the documentation says is potentially more annoying/frustrating to users.

[..]

As you seems to have now played a bit with this part of the code, are you interested in test out the still not merged https://phabricator.kde.org/D19367 change?

I'll see what I can do.

Any news here?

Any news here?

For starters, this needs to be rebased; I'll try and get to it soon.

ahmadsamir updated this revision to Diff 71059.EditedDec 7 2019, 3:33 PM

Rebase

EDIT: I should probably format the added code with clang-format, I'll do that soon...

clang-fromat the code in this diff

Was anything of the previously commented issues addressed?

Personally, I am not convinced replacing \s with [ \t], and deviating from PCRE default behaviour is a good idea in this case.

Personally, I am not convinced replacing \s with [ \t], and deviating from PCRE default behaviour is a good idea in this case.

That is not to be discussed, and Christoph tried to explain this already before.

In short: If these issues don't get fixed to be compatible with previous behavior, this patch will not get merged. I'd be very much in favor of having the compatibility, since QRegularExpression is the way forward, but adding regressions is a big no-go and not up for discussion.

We are maintaining Kate for close to 20 years now. We'd like to ask you to give us some trust in our decision.

Disagreement doesn't always mean distrust.

ahmadsamir updated this revision to Diff 71617.Dec 15 2019, 6:38 PM

Address comments

ahmadsamir updated this revision to Diff 71618.Dec 15 2019, 6:40 PM
ahmadsamir retitled this revision from Port regex search to QRegularExpression to Port the search interface from QRegExp to QRegularExpression.
ahmadsamir edited the summary of this revision. (Show Details)
ahmadsamir edited the test plan for this revision. (Show Details)

Verbatim commit message

  • Rebase
  • Match the old code and check if the pattern is empty, this lets the vimode_emulatedcommandbar unit test pass
ahmadsamir updated this revision to Diff 76852.Mar 3 2020, 2:05 PM
ahmadsamir edited the summary of this revision. (Show Details)

Modify the extended context menu:

  • QRegularExpression supports fixed-length positive/negative lookbehind
  • With QRegularExpression/PCRE, \x escape sequences must/should be wrapped in curly braces
ahmadsamir abandoned this revision.Mar 16 2020, 2:07 PM