Improve a bit parse error diagnosis
ClosedPublic

Authored by jfita on May 1 2020, 1:08 PM.

Details

Summary

Currently, the parse error message for a query like

SELECT select #1/1a/2020#;

is

org.kde.kdb.core: Parse error: tokens left globalCurrentPos: 18 sql.length(): 19 globalToken: ";"
org.kde.kdb.core: error:
org.kde.kdb.core: at character  18  near tooken  ";"

and Kexi, in SQL text mode, moves the cursor next to the semicolon.

The query is indeed incorrect, but the error message is misleading
because the error is in the “a” within the date literal, not the
terminating semicolon.

This is due the lack of a “fallback” rule within the DATE_OR_TIME start
condition and, because it is an exclusive condition, Flex provides its
default rule that returns nothing to the parser. Therefore, the
globalCurrentPos is not incremented and, at the end, KDbParser can
detect something went wrong, but can not pinpoint the error’s correct
location at that stage.

I added a new rule that simply returns any unrecognized character as a scan
error to the parser so that it can detect the erroneous input and present a
more accurate position of where the error is. With that change Kexi moves the
cursor next to the “a”.

I also added the same rule to the INITIAL start condition to match any
character, because otherwise it also reports the error at the position of the
semicolon in queries like the following:

SELECT 1' FROM input;

When the unexpected character is one used for quoting (i.e., ', ", or `), the
message avoid quoting the unexpected message because it is harder to read the
error depending on the font.

There was, however, another issue with Flex’s action for its default
rule: it calls the ECHO macro that, by default, is defined as write the
matched input to stdout.

This is not a problem for Kexi — in fact, i am unable to see that write
in the console —, but it messes up the output of SqlParseTest. This is
particularly important when called with the ‘-xml’ or ‘-tap’ parameters,
that need to maintain their strict format or then QtCreator or Perl’s
prove can not parse the output.

FIXED-IN:3.2.1

Diff Detail

Repository
R15 KDb
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
jfita created this revision.May 1 2020, 1:08 PM
Restricted Application added a project: KDb. · View Herald TranscriptMay 1 2020, 1:08 PM
Restricted Application added a subscriber: Kexi-Devel-list. · View Herald Transcript
jfita requested review of this revision.May 1 2020, 1:08 PM
staniek added inline comments.May 1 2020, 10:15 PM
src/parser/KDbSqlScanner.l
165

Why it is "end" now when we're calling BEGIN() here?

170

Very good, how about also adding a comment here to explain all readons behind this rule, as you nicely did in the description?

392

This is nice but how about keeping the

[\~\!\@\#\^\&\|\`\?,()\[\]\.;\:\+\-\*\/\%\^\<\>\=] { ... }

which just defines all single-character tokens and adding the . { } afterwards?
In the . {} we may also add a debug explaining that the yytext[0] is unsupported.
Doing so at flex level can have its value too.

jfita updated this revision to Diff 81715.May 2 2020, 12:03 AM

Add comment to fallback rules that now report an actual error

jfita added inline comments.May 2 2020, 12:12 AM
src/parser/KDbSqlScanner.l
165

BEGIN is to switch to a new start condition: after this flex will only look for rules that are within the group DATE_OR_TIME_caller, in this case the INITIAL or empty start condition. That is, it will look all the rules that have *no* prefix, hence this is the end of the date/time “subparser”.

In other words, this is the rule of the '#' character that ends the date/time literal.

392

Yes, it is indeed way better because now i can return an actual scanning error and the error message that the end user sees is “Unexpected character literal: x”, that is much nicer than the “Query error” i was getting before.

Thank you.

jfita updated this revision to Diff 81716.May 2 2020, 12:13 AM

Remove the unrelated change for indentifier escaping

jfita updated this revision to Diff 81717.May 2 2020, 12:18 AM
jfita edited the summary of this revision. (Show Details)

Fix summary

staniek requested changes to this revision.May 2 2020, 11:28 AM
staniek added inline comments.
src/parser/KDbSqlScanner.l
165

Thanks

174

Something easier to translate and understand for end users would be better. E.g.

"Unexpected character \"%1\" in date/time"

src/parser/generated/sqlscanner.cpp
1463

"Unexpected character \"%1\""

would be more consistent

This revision now requires changes to proceed.May 2 2020, 11:28 AM
jfita added inline comments.May 2 2020, 1:48 PM
src/parser/KDbSqlScanner.l
174

The problem i had quoting the unexpected character is that, depending on the font settings, it can be very hard to spot what is the offending character when it is a symbol like ' or ". For example, in the following screenshot i have problems seeing the ' character in "'":

Having the character alone at the end removes that kind of problems:

Do you want me to keep the quotes around the character?

staniek added inline comments.May 3 2020, 12:20 PM
src/parser/KDbSqlScanner.l
174

That's a good note.

How about doing the same as a compiler shows:

when there's a quote it shows just it, e.g:'

error: missing terminating " character

else, it shows like

error: unknown type name 'a'

So we could do:

const QString string(QString::fromUtf8(yytext, yyleng));
QString char(string.left(1));
if (char == QLatin1Char('\'') || char == QLatin1Char('"') || char == QLatin1Char('`')) {
} else {
  char = QString("\"%1\").arg(char);
}

setError(KDbParser::tr("Unexpected character %1 in date/time").arg(char));
jfita updated this revision to Diff 81827.May 3 2020, 8:31 PM
jfita retitled this revision from Improve a bit parse error dignosis to Improve a bit parse error diagnosis.
jfita edited the summary of this revision. (Show Details)

Change error message to be easier to translate and to understand, taking care that quoting characters are not quoted

jfita added inline comments.May 3 2020, 8:38 PM
src/parser/KDbSqlScanner.l
174

So we could do:

const QString string(QString::fromUtf8(yytext, yyleng));
QString char(string.left(1));
if (char == QLatin1Char('\'') || char == QLatin1Char('"') || char == QLatin1Char('`')) {
} else {
 char = QString("\"%1\").arg(char);
}

I’ve put that code into a function so that i could reuse in both error conditions and changed the message to what you suggested.

staniek added inline comments.May 9 2020, 8:54 PM
src/parser/KDbSqlScanner.l
40

{ in new line

Use clang-format if unsure:

staniek requested changes to this revision.May 9 2020, 8:54 PM
This revision now requires changes to proceed.May 9 2020, 8:54 PM
jfita updated this revision to Diff 82467.May 10 2020, 9:03 PM

Fixed style issue

jfita added inline comments.May 10 2020, 9:06 PM
src/parser/KDbSqlScanner.l
40

Sorry, did not know i could run clang-format on flex sources. Will use it next time.

Do you have the correct .clang-format file, though? I tried clang-format with WebKit-based style but it resulted in massive changes to this file and other C++ sources i tried.

Thank you.

staniek accepted this revision.May 10 2020, 9:51 PM
staniek added inline comments.
src/parser/KDbSqlScanner.l
40

Sorry, we should have it in the repo, I thing we never added it after splitting repositories.

This is what I use at the moment:

BasedOnStyle:  WebKit
Language:        Cpp
AccessModifierOffset: -4
ConstructorInitializerIndentWidth: 4
AlignEscapedNewlinesLeft: false
AlignTrailingComments: false
AlignAfterOpenBracket: Align
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false
AllowShortFunctionsOnASingleLine: false
AlwaysBreakTemplateDeclarations: false
AlwaysBreakBeforeMultilineStrings: false
BreakBeforeBraces: Linux
BreakBeforeBinaryOperators: true
BreakBeforeTernaryOperators: true
BreakConstructorInitializersBeforeComma: true
BinPackParameters: true
ColumnLimit:     100
ConstructorInitializerAllOnOneLineOrOnePerLine: true
#DerivePointerBinding: false
#ExperimentalAutoDetectBinPacking: false
IndentCaseLabels: false
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCSpaceBeforeProtocolList: true
PenaltyBreakBeforeFirstCallParameter: 19
PenaltyBreakComment: 60
PenaltyBreakString: 1000
PenaltyBreakFirstLessLess: 120
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 60
PointerBindsToType: false
SpacesBeforeTrailingComments: 1
Cpp11BracedListStyle: false
Standard:        Cpp11
IndentWidth:     4
TabWidth:        8
UseTab:          Never
BreakBeforeBraces: Linux
IndentFunctionDeclarationAfterType: false
SpacesInParentheses: false
SpacesInAngles:  false
SpaceInEmptyParentheses: false
SpacesInCStyleCastParentheses: false
#SpaceAfterControlStatementKeyword: true
SpaceBeforeAssignmentOperators: true
ContinuationIndentWidth: 4

Configure it this way

I realized quite recently there was added some automation: https://phabricator.kde.org/D24568 but I'd prefer not to alter so many lines just for formatting.

If we look af KF5 repos I think only few publish official clang-format, e.g. kactivities.git/.clang-format.

I think settings I pasted above reflect it as well as https://community.kde.org/Policies/Frameworks_Coding_Style not too loosely.

Regarding clang-format for flex files, it's not 100% supported but the code we discuss sits between %{ and %} and that seems to get formatted properly using clang-format if you select correct block.

This revision is now accepted and ready to land.May 10 2020, 9:51 PM
This revision was automatically updated to reflect the committed changes.