WordDetect rule: detect delimiters at the inner edge of the string
Closed · Public

Authored by nibags on Oct 3 2019, 6:24 AM.

Details

Summary

In WordDetect rules, also check for delimiter characters at the inner left and right edges of the string, i.e. its first and last characters, not only in the text surrounding the match.

For example:

<WordDetect attribute="Keyword" String="<hello"/>

Previously, this rule was equivalent to the regular expression \b<hello\b. Now it is equivalent to <hello\b: since < is itself a delimiter character, no further boundary is required before it.
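The boundary check described above can be sketched in Python. This is only an illustration and not the library's C++ code: the function name `word_detect`, the `check_inner_edges` flag, and the `DELIMITERS` set are assumptions made for this example, and a real syntax definition may configure a different delimiter set.

```python
# Assumed delimiter set, for illustration only.
DELIMITERS = set(" \t.():!+,-<=>%&*/;?[]^{|}~\\")


def is_delim(ch):
    return ch in DELIMITERS


def word_detect(text, offset, word, check_inner_edges=True):
    """Return the offset just past the match, or `offset` if there is no match.

    check_inner_edges=False models the behavior before this change,
    check_inner_edges=True models the behavior after it.
    """
    end = offset + len(word)
    if not word or text[offset:end] != word:
        return offset

    # Left edge: start of line, a delimiter before the match,
    # or (new) a delimiter as the first character of the word itself.
    left_ok = (
        offset == 0
        or is_delim(text[offset - 1])
        or (check_inner_edges and is_delim(word[0]))
    )
    # Right edge: end of line, a delimiter after the match,
    # or (new) a delimiter as the last character of the word itself.
    right_ok = (
        end == len(text)
        or is_delim(text[end])
        or (check_inner_edges and is_delim(word[-1]))
    )
    return end if left_ok and right_ok else offset


print(word_detect("a<hello", 1, "<hello", check_inner_edges=False))  # -> 1 (no match)
print(word_detect("a<hello", 1, "<hello", check_inner_edges=True))   # -> 7 (match)
print(word_detect("a<hellox", 1, "<hello"))                          # -> 1 (no match)
```

In the sketch, `<hello` directly after `a` matches only with the inner-edge check enabled, while the right side still needs a boundary because `o` is not a delimiter.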

I have checked the WordDetect rules in all definitions and I haven't seen any regressions. In the definitions elm.xml, selinux-cil.xml and selinux-fc.xml, I replaced some WordDetect rules with StringDetect, since with this change the two are equivalent for those strings.
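Why some WordDetect rules become equivalent to StringDetect can be sketched as follows, again in Python and under assumptions: the `DELIMITERS` set and the helper names are invented for this example, the word is assumed to be non-empty, and the sample strings are hypothetical rather than taken from elm.xml or the SELinux definitions. When both the first and the last character of the string are delimiters, neither outer boundary is required any more, so the rule reduces to a plain string comparison.

```python
# Assumed delimiter set, for illustration only.
DELIMITERS = set(" \t.():!+,-<=>%&*/;?[]^{|}~\\")


def string_detect(text, offset, word):
    # Plain substring comparison at `offset`.
    return text.startswith(word, offset)


def word_detect(text, offset, word):
    # Behavior after this change; `word` is assumed to be non-empty.
    if not string_detect(text, offset, word):
        return False
    end = offset + len(word)
    left_ok = offset == 0 or text[offset - 1] in DELIMITERS or word[0] in DELIMITERS
    right_ok = end == len(text) or text[end] in DELIMITERS or word[-1] in DELIMITERS
    return left_ok and right_ok


def agree_everywhere(text, word):
    # True if both rules give the same answer at every offset of `text`.
    return all(
        word_detect(text, i, word) == string_detect(text, i, word)
        for i in range(len(text))
    )


print(agree_everywhere("x->y a->b", "->"))            # -> True
print(agree_everywhere("xhellox hello", "hello"))     # -> False
```

The second call differs because `hello` has no delimiter at either edge, so WordDetect still rejects the occurrence embedded in `xhellox`.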

Test Plan

make test

Diff Detail

Repository
R216 Syntax Highlighting
Branch
fix-worddetect
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 17275
Build 17293: arc lint + arc unit
nibags created this revision. Oct 3 2019, 6:24 AM
Restricted Application added projects: Kate, Frameworks. Oct 3 2019, 6:24 AM
Restricted Application added subscribers: kde-frameworks-devel, kwrite-devel.
nibags requested review of this revision. Oct 3 2019, 6:24 AM
nibags updated this revision to Diff 67238. Oct 3 2019, 6:47 AM
  • Add comment
dhaumann added a comment. Edited · Oct 3 2019, 11:23 AM

This looks good to me and, as mentioned in D24354, WordDetect is better than RegExpr.

+1, but I'd like another review by @cullmann, @jpoelen or @vkrause.

Seems reasonable, do we need some doc updates? Or some more verbose description in the XSD?

cullmann accepted this revision. Oct 3 2019, 1:30 PM
This revision is now accepted and ready to land. Oct 3 2019, 1:30 PM

I think it's fine as is. The docbook says:

Detect an exact string but additionally require word boundaries
such as a dot <userinput>'.'</userinput> or a whitespace on the beginning
and the end of the word. Think of <userinput>\b&lt;string&gt;\b</userinput>
in terms of a regular expression, but it is faster than the rule <userinput>RegExpr</userinput>.

Imo <userinput>\b&lt;string&gt;\b</userinput> implies that if the string itself starts or ends with a delimiter character, then it should match as well. And given that our unit tests do not show any changes, I think we are good to go.

Please commit.

nibags closed this revision. Oct 4 2019, 3:13 AM