Port QRegExp to QRegularExpression
ClosedPublic

Authored by ahmadsamir on Dec 20 2019, 4:33 PM.

Details

Summary

Modified the regex pattern in KEmailAddress::isValidSimpleAddress()
slightly to make kemailaddresstest pass.

Also the regex pattern to check literal domains e.g. [123.123.123.123]
was wrong; now hopefully fixed. Added a couple of listeral domain rows
to the unit test.

When using \xnn escape sequences in a pattern, if more than two hex digits
are used then we must use \x{...}, see perldocs: https://perldoc.perl.org/perlre.html.
Thanks to dfaure who got Giuseppe D'Angelo (upstream QRegularExpression
author) to take a look at the code.

Test Plan

make && ctest
All unit tests pass, except for kencodingprobertest which also fails on
master.

Diff Detail

Repository
R270 KCodecs
Branch
l-qregexp (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 20243
Build 20261: arc lint + arc unit
ahmadsamir created this revision.Dec 20 2019, 4:33 PM
Restricted Application added a project: Frameworks. · View Herald TranscriptDec 20 2019, 4:33 PM
ahmadsamir requested review of this revision.Dec 20 2019, 4:33 PM
ahmadsamir planned changes to this revision.Dec 20 2019, 4:35 PM

I missed one QRegExp.

ahmadsamir updated this revision to Diff 71907.Dec 20 2019, 6:54 PM
ahmadsamir edited the summary of this revision. (Show Details)

One more QRegExp

dfaure requested changes to this revision.Dec 20 2019, 11:36 PM
dfaure added inline comments.
autotests/kemailaddresstest.cpp
345

unrelated and the empty line made sense, please revert

src/kemailaddress.cpp
631

Message from Giuseppe D'Angelo (author of QRegularExpression) : if you can depend from Qt 5.12, there's QRegularExpression::anchoredPattern instead of manual wrapping in \A and \z

We can't require 5.12 yet but this could be a TODO to simplify this code slightly at some point.

637

{,3} would be {0,3}, not {1,3}

Would this allow to minimize the amount of changes made here? I'm worried about regressions.

1117

Also from Giuseppe : the \\x0080-\\xFFFF requires changes to \\x{0080}-\\x{FFFF}

This revision now requires changes to proceed.Dec 20 2019, 11:36 PM
ahmadsamir marked 4 inline comments as done.
ahmadsamir edited the summary of this revision. (Show Details)

Address review comments

dfaure requested changes to this revision.Dec 21 2019, 12:25 PM
dfaure added inline comments.
src/kemailaddress.cpp
637

I'm confused, you still have 1,3 here, rather than 0,3

This revision now requires changes to proceed.Dec 21 2019, 12:25 PM
ahmadsamir added inline comments.Dec 21 2019, 12:46 PM
src/kemailaddress.cpp
631

Yep, I added the Qt 5.12 TODO in some QRegExp porting diffs but not all (kind of got boring (lots of QRegExp::exactMatch()), but that's not an excuse to be lazy and make it harder for the next guy who modifies that code.... :)).

637

From the unit test, this is to match literal domains, [123.123.123.123]; but the old code would match:
[.123.123.123]
[..123.123]
which seems wrong to me; as proven by this snippet in qtcreator (best way to test regex's is to just literally test them, that is more idiot-proof for me :)):

const QString str = QStringLiteral("[1.1.1.1]");
const QString str2 = QStringLiteral("[.1.1.1]");
const QString str3 = QStringLiteral("[..1.1]");

const QString pattern = QStringLiteral("\\[[0-9]{,3}(\\.[0-9]{,3}){3}\\]");
QRegExp re(pattern);

qDebug() << re.exactMatch(str); // true
qDebug() << re.exactMatch(str2); // true
qDebug() << re.exactMatch(str3); // true

So I dare say even in the old code it should have been {1,3}.

And I found that:

  • {,3} in QRegExp is equivalent to {0,3}
  • {,3} in QRegularExpression is equivalent to {1,3}
const QString str = QStringLiteral("a");
const QString oldPattern = QStringLiteral("\\d{,3}"); // QRegExp _is_ old
QRegExp oldRe(oldPattern);
qDebug() << (oldRe.indexIn(str) != -1); // true, because it's {0,3}

const QString pattern = QStringLiteral("\\d{,3}");
QRegularExpression re(pattern);
qDebug() << re.match(str).hasMatch(); // false, because it's {1,3}

(^ candidate for QRegExp (its docs literally mention {,n}) to QRegularExpression porting notes in QRegularExpression docs, if you can pass that to Giuseppe D'Angelo :)).

637

I added two more rows to the literal domain test code and found that, my code is wrong too:

addrRx += QStringLiteral("\\[([0-9]{1,3}\\.){1,3}([0-9]{1,3})\\]");
is wrong, it should be:
addrRx += QStringLiteral("\\[([0-9]{1,3}\\.){3}([0-9]{1,3})\\]");
                                            ^^^

I'll upload a new diff.

637

I had replied to your comment but didn't "submit" my comments (stooopid, sorry).

1117

Thank him for me, and thank you for asking him to take a look at it :D; from https://perldoc.perl.org/perlre.html:

Similarly, \xnn, where nn are hexadecimal digits, matches the character whose native ordinal is nn. Again, not using exactly two digits is a recipe for disaster, but you can use \x{...} to specify any number of hex digits.

so, disaster averted.

Sorry, I did reply to your inline comments but didn't "submit" the replies only "save draft"ed.

ahmadsamir updated this revision to Diff 71941.Dec 21 2019, 1:01 PM

Frameworks min. required Qt has just been raised to 5.12

dfaure accepted this revision.Dec 24 2019, 9:31 AM

Thanks for the extensive research!

src/kemailaddress.cpp
608

[pre-existing: this bool is never set to true....]

643

A bit weird that this comma isn't at the end of the previous line instead :)

This revision is now accepted and ready to land.Dec 24 2019, 9:31 AM
ahmadsamir updated this revision to Diff 72139.Dec 24 2019, 9:56 AM
ahmadsamir marked an inline comment as done.
ahmadsamir edited the summary of this revision. (Show Details)

Fix coding style

This revision was automatically updated to reflect the committed changes.
ahmadsamir added a comment.EditedJan 3 2020, 4:44 PM

FTR; in one of my inline comments I said:

And I found that:

  • {,3} in QRegExp is equivalent to {0,3}
  • {,3} in QRegularExpression is equivalent to {1,3}

The second part is actually wrong (as I was told by dfaure who was told by upstream QRegularExpression author); in QRegularExpression {,3} would be treated as a literal string; you must specify the first number before "," in {a,b} expressions; so e.g.:
{0,3}
{1,3}
{2,3}

It's explained much better by upstream, see https://codereview.qt-project.org/c/qt/qtbase/+/285293