[Syndication] Port QRegExp to QRegularExpression
ClosedPublic

Authored by ahmadsamir on Jan 5 2020, 1:31 PM.

Details

Summary

"(" and ")" are treated literally when inside [], no need to escape them.

LoaderUtil::parseFeed():

  • format the code on multiple lines for easier readability.
  • restore the regex (line 69) to that from commit dda15b0aaddec01, basically replacing "(?)" in the pattern with (?:HREF). For more details see the discussion in the diff review.

htmlToPlainText(): make the regex pattern non-greedy otherwise it may
match the whole subject string.

Test Plan

make && ctest

Diff Detail

Repository
R96 PIM: Syndication Support
Branch
l-qregularexpression (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 20658
Build 20676: arc lint + arc unit
ahmadsamir created this revision.Jan 5 2020, 1:31 PM
Restricted Application added a project: KDE PIM. · View Herald TranscriptJan 5 2020, 1:31 PM
ahmadsamir requested review of this revision.Jan 5 2020, 1:31 PM
ahmadsamir added a comment.EditedJan 5 2020, 1:35 PM

Some of these regex patterns are too complex, (some parts I don't even know what they do e.g. "(?)" in loaderutil.cpp, line 64); so I made sure that they are all valid by calling QRegularExpression::isValid() on them.

I look for usage of tagRegExp() with lxr.kde.org and couldn't find any.

dfaure requested changes to this revision.Jan 5 2020, 5:49 PM

I look for usage of tagRegExp() with lxr.kde.org and couldn't find any.

A file-static function is not exported outside that cpp file, so that's quite expected.

src/loaderutil.cpp
69 ↗(On Diff #72795)

Indeed that (?) looks invalid. kregexpeditor says so, at least.

@mlaurent in commit dda15b0aaddec01 you moved code around, but you also replaced (?:HREF) with (?), while the rest of this regexp didn't change. Any reason for that?

I think that either HREF should be added back, or (?) should be removed.

src/tools.cpp
159 ↗(On Diff #72795)

Any reason for the added ? (for non-greedy-match)? I can't think of a case where it would make a difference, here.

This revision now requires changes to proceed.Jan 5 2020, 5:49 PM
ahmadsamir added inline comments.Jan 5 2020, 7:43 PM
src/loaderutil.cpp
69 ↗(On Diff #72795)

The pattern is valid according to QRegularExpression::isValid(), and it seems to match sort of on word boundaries like \b and \B... weird :)

src/tools.cpp
159 ↗(On Diff #72795)

Greedy matching would match:

<h1 style="display: block" some title in the the badly formatted header</h1>

so, it'll remove the whole string from < to the last >.

ahmadsamir updated this revision to Diff 72825.EditedJan 5 2020, 7:52 PM
ahmadsamir edited the summary of this revision. (Show Details)

Replace (?) with (?:HREF)

Edit: I could be wrong, so. :)

mlaurent added inline comments.Jan 6 2020, 12:40 PM
src/loaderutil.cpp
69 ↗(On Diff #72795)

Perhaps I made an error on regexp.
Autotest still works ?
if yes it's ok.

I extracted code for testing it (testing current bug) so if all is ok...

ahmadsamir added inline comments.Jan 6 2020, 12:44 PM
src/loaderutil.cpp
69 ↗(On Diff #72795)

Yes, all unit tests still pass.

ahmadsamir added a comment.EditedJan 7 2020, 9:47 AM

I look for usage of tagRegExp() with lxr.kde.org and couldn't find any.

A file-static function is not exported outside that cpp file, so that's quite expected.

Noted. And ping.

dfaure accepted this revision.Jan 22 2020, 8:14 AM
This revision is now accepted and ready to land.Jan 22 2020, 8:14 AM
This revision was automatically updated to reflect the committed changes.