Handle adjacent special characters correctly
ClosedPublic

Authored by bruns on Apr 2 2018, 5:14 PM.

Details

Summary

The code handled sequences like '((' incorrectly, i.e. this was parsed as
a single opening quote, and thus could get operator association wrong.
Although only '>=' and '<=' have a special meaning, also accept '==' and
':=' as '=' resp. ':'.

BUG: 392620

Test Plan
38: QDEBUG : AdvancedQueryParserTest::testNestedParentheses(a OR ((b AND c) AND d))   result term [ OR ( : a (QString)) [ AND ( : b (QString)) ( : c (QString)) ( : d (QString)) ] ]
38: QDEBUG : AdvancedQueryParserTest::testNestedParentheses(a OR ((b AND c) AND d)) expected term [ OR ( : a (QString)) [ AND ( : b (QString)) ( : c (QString)) ( : d (QString)) ] ]
38: PASS   : AdvancedQueryParserTest::testNestedParentheses(a OR ((b AND c) AND d))

Diff Detail

Repository
R293 Baloo
Branch
b#392620_fix_operator_evaluation
Lint
No Linters Available
Unit
No Unit Test Coverage
bruns created this revision.Apr 2 2018, 5:14 PM
Restricted Application added projects: Frameworks, Baloo. · View Herald TranscriptApr 2 2018, 5:14 PM
Restricted Application added a subscriber: Frameworks. · View Herald Transcript
bruns requested review of this revision.Apr 2 2018, 5:14 PM
michaelh requested changes to this revision.Apr 2 2018, 8:07 PM

It would be great if you could add tests for ==, := and ((...)...) to autotests/unit/lib/advancedqueryparsertest.cpp

src/lib/advancedqueryparser.cpp
35 ↗(On Diff #31161)

Suggestion: Rename to tokenList. It's less likely to confuse it with token that way.

50 ↗(On Diff #31161)

Please use !token.isEmpty()
(Don't exactly know why, but have myself been told so several times. Maybe it's faster. :-)

54 ↗(On Diff #31161)

Please use QLatin1Char

55 ↗(On Diff #31161)

// Parentheses delimit tokens, and are tokens by themselves

61 ↗(On Diff #31161)

Please use QLatin1Char

63 ↗(On Diff #31161)

Please use !token.isEmpty()

68 ↗(On Diff #31161)

Please use QLatin1Char

69 ↗(On Diff #31161)

':' means Term::Contains and '=' means Term::Equal so ':=' is a little ambigous.
Unless you have a particular reason to interpret ':=' as '=' we should take it as ':'.
It may be better to have and extra If for '=' and ':' and simply drop any second "=" so we come out with ':' or '=' as token.

The parse() function checks the second char only in case of '<' or '>'. So ':=' will become ':'. The
2-char tokens should to be added to the switch in parse() which then also could be simplified. The distinction between '>' and '>=' is the lexer's job, right?

This revision now requires changes to proceed.Apr 2 2018, 8:07 PM
bruns added inline comments.Apr 2 2018, 9:21 PM
src/lib/advancedqueryparser.cpp
55 ↗(On Diff #31161)

Why choose a different wording here? Also, the important aspect here is the "end", which commits the current token to the list.

69 ↗(On Diff #31161)

":=" and "==" are added as is, and the parser interprets both dependent on the first character only, i.e. ":" and "=".
The behaviour for these two combinations is unchanged with this patch.

The lexer should not handle ">" and ">=" differently, both are (valid) tokens, and should be returned as such.

Adding e.g. ">=" to the switch statement is not possible, as it works on QChar's.

michaelh added inline comments.Apr 3 2018, 6:16 PM
src/lib/advancedqueryparser.cpp
55 ↗(On Diff #31161)

Because of opening and closing parentheses, but you're right 'end' is the important aspect here.

69 ↗(On Diff #31161)
  1. I misunderstood your description as ':=' should become '='.

The lexer should not handle ">" and ">=" differently, both are (valid) tokens, and should be returned as such.

That is what I meant. I was confused by parse() lexing again when encountering a '>'.

Adding e.g. ">=" to the switch statement is not possible, as it works on QChar's.

I don't understand. token is QString. Why not instead of switch (token.at(0).toLatin1())

switch (token) {
case '>'`: comparator = Term::Greater; break;
case '>='`: comparator = Term::GreaterEqual; break;

What am I missing?

bruns added inline comments.Apr 4 2018, 4:28 AM
src/lib/advancedqueryparser.cpp
69 ↗(On Diff #31161)

I don't understand. token is QString. Why not instead of switch (token.at(0).toLatin1())

switch (token) {
case '>'`: comparator = Term::Greater; break;
case '>='`: comparator = Term::GreaterEqual; break;

What am I missing?

You can only switch on integral statements (ints, chars, ...). With C++11, you can do some constexpr hacks, but your code won't work as is.

michaelh added inline comments.Apr 4 2018, 8:25 AM
src/lib/advancedqueryparser.cpp
69 ↗(On Diff #31161)

You can only switch on integral statements (ints, chars, ...).

Heh? That's surprising![1] Anyway, thanks for the lesson a keep this code as it is now.

[1]
Coming from javascript switch definitely is a "false friend".
Can I become a beef steak?

bruns marked 8 inline comments as done.Apr 4 2018, 9:50 PM
michaelh added a comment.EditedApr 5 2018, 9:41 AM

Apart from QLatin1Char and token.isEmpty() we're done, I think.
I asked on IRC about the importance of QLatin1Char the answer was "it depends", QLatin1Char is conservative the code as-is more concise. So, change it or leave it, your decision.

bruns updated this revision to Diff 31550.Apr 7 2018, 3:06 AM

token.size() > 0 -> !token.isEmpty()
renamed tokens to tokenList

Use testcase

bruns marked 6 inline comments as done.Apr 7 2018, 3:07 AM
bruns edited the test plan for this revision. (Show Details)
bruns updated this revision to Diff 31636.Apr 8 2018, 1:01 AM
bruns added a task: Restricted Maniphest Task.

rebase

bruns removed a task: Restricted Maniphest Task.Apr 8 2018, 1:09 AM
michaelh accepted this revision as: michaelh.Apr 8 2018, 10:07 AM
michaelh accepted this revision.
This revision is now accepted and ready to land.Apr 8 2018, 10:07 AM
This revision was automatically updated to reflect the committed changes.