keyword rule: Spport for keywords inclusion from another language/file
ClosedPublic

Authored by jpoelen on Aug 5 2018, 5:10 PM.

Details

Summary

The goal is to reuse words from other definition files without having to copy them. For example, the list of CSS properties also in SCSS.

The presence of ## in the name of the keyword rule indicates that it is necessary to look for the list in another file (format: listName##languageName).

Test Plan

A quick test was done by removing the properties list from scss.xml and with properties##CSS as the keyword name.

Diff Detail

Repository
R216 Syntax Highlighting
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
jpoelen created this revision.Aug 5 2018, 5:10 PM
Restricted Application added projects: Kate, Frameworks. · View Herald TranscriptAug 5 2018, 5:10 PM
Restricted Application added subscribers: kde-frameworks-devel, kwrite-devel. · View Herald Transcript
jpoelen requested review of this revision.Aug 5 2018, 5:10 PM
jpoelen edited the summary of this revision. (Show Details)Aug 5 2018, 5:13 PM
jpoelen edited the test plan for this revision. (Show Details)

Good idea IMHO, and now that KTE is moving to the KSyntaxHighlighting engine we can actually look at feature extensions again :)

The engine and language changes look good to me, but I'll leave the final decision to Dominik/Christoph as I'm not sure about the implications for KTE right now before the KSyntaxHighlighting branch is integrated.

I think this needs to wait with merging until we finalized the usage of the syntax-highlighting framework in ktexteditor.
Volker, are you at the conference next week?

@jpoelen When adding such functionality, could you also also explain in the log message *why* this change is required or useful? I think we should have real-world use cases before integrating this. But in general, looks good.

Volker, are you at the conference next week?

Sure, I'll be there for the entire week.

Great ;) I assume we can handle the missing things during that week for a real switch.

ATM the basics already work, there are just some missing extras.
(and the color configuration is a mess)

If you have some time to take a look at the syntax-highlighting branch, any feedback is welcome, perhaps I already misuse the API.

jpoelen edited the summary of this revision. (Show Details)Aug 6 2018, 10:12 PM

I just thought of Definition::keywordLists() which does not list the keywords used by this new syntax.

  • Should we add the lists used as and when parsing?
  • Move the functionality on the tag <list>? (<list name="abc" include="name##language"/>, <list name="abc"> <include>name##language</include> <item>.... </list>)
  • Other?

I would like a

<list name="abc"> <include_list>name##language</include_list> <item>.... </list>

approach more, as then one could even nicely extend the included lists and as one is forced to have some list around it,
they will show up in the kewordLists.... automatically.

We make nice progress with integrating the syntax framework into ktexteditor.
I think after akademy we can merge this in a finalzed form ;=)

For the keywords stuff, one must keep in mind I altered the implementation of keyword lists a lot in:

https://phabricator.kde.org/D15217

jpoelen updated this revision to Diff 41298.Sep 9 2018, 9:23 PM

include rule in keyword list to import those from another file

The <include>properties##CSS</include> variant is really just like what I wanted, too!
Nice.
Only nitpick API wise: I would not expose the difference between includedDefinitions and includedKeywordDefinitions, I would just like to have all included definitions in includedDefinitions (and the docs altered that this covers all kind of includes)
Beside making the API more complex I see no benefit in differentiating between that.

cullmann requested changes to this revision.Sep 10 2018, 10:11 AM

Please unite the includedDefinitions()
Beside that, I think it would make sense to have the resolveIncludeKeywords() part inside the KeywordList class, then we don't need that many friends.

This revision now requires changes to proceed.Sep 10 2018, 10:11 AM

If you have no time for the wanted refactoring, I can do that this weekend and commit this, the syntax highlighting format changes are good as is ;=)

Sorry to answer so late, I had trouble logging in.

I think I misinterpreted the use of includedDefinitions(). For me, this function lists the languages ​included by the rules and which influence the colors. While the inclusion of a keyword list has no influence here. In the case of SASS, there is no IncludeRules to CSS because small syntactic details do not allow it, it would not be logical to have CSS in the list of includedDefinitions().

Basically, I added includedKeywordDefinitions() because of a Format test that did not work. But it's true that putting it in the public interface is not a good idea.

This was also my interpretation: only include the Definitions that use itemDatas / colors.

What we currently do: we already support loading the language metadata without loading the rest. Then, we support loading the full XML file.

What we could do: add the ability to load the keywords only. This way, all unit tests remain the same, and all behavior as well. In fact, that would be my preferred solution...

dhaumann added a comment.EditedSep 16 2018, 8:43 AM

To be clear, currently we have two states:

  • load metadata only
  • lazy load full file

Instead, we could have three states:

  • load metadata only
  • lazy load keyword lists (in addition to metadata)
  • lazy load full file

This is my preferred solution. ;)

resolveIncludeKeywords() would then use loadKeywords() instead of load() ? This should be enough.

Yes, and calling loadKeywords would only do work the first time.

jpoelen updated this revision to Diff 41872.Sep 17 2018, 11:31 PM
  • restore operators list
  • lazy load keyword lists

(I inadvertently edit the operators list in the previous commit)

Finally, I added a parameter to load() and loadHighlighting() rather than a new function because there was a lot of code duplication.

I don't know if the added complexity for loading only keywords is needed, but in any case one needs some recursive keyword resolving with a cycle guard.
e.g. if you use a list that again includes another one.

@cullmann If the other solution is easier to maintain then I am fine with this. Still, I would like to avoid that includedDefinitions() for sass returns css just because of keywords. In fact, that was my main motivation for this solution.

@cullmann Can you make a decision? I trust it will be a good one ;)

I can live with the second kind of loading state, but in any case one needs to fix the recursion detection and handling of nested included keyword lists.

e.g. a includes b includes c

ATM I think "something" will happen

jpoelen updated this revision to Diff 42628.Sep 30 2018, 10:02 PM
  • fix the recursion detection and handling of nested included keyword lists

Hi, I still need to test this.
For the enum, I have no issue with scoped enums, but I think something like

DefinitionData::LoadOptions::OnlyKeywords

would be better than a enum that just encapsules a bool.
Actually one could create a proper Q_FLAGS to allow combinations, if we later want to extend that.

Just played a bit with it.
Seems to work reasonable well :)
As the only visible change for the outside is the <include> XML addition to the keyword list, I think we might even just improve the internal API later.
Nothing of it is exposed at all.

For the unit test: is there a reason to write the syntax file during testing and not just place it in the test directory as file? I think having it just there as plain file makes maintaining the stuff nicer.

Volker, have you any objections to have this feature?

Volker, have you any objections to have this feature?

I'm all for it, seeing how much duplication it removes :)

cullmann accepted this revision.Oct 20 2018, 3:18 PM

Then I would say this should go in. We can improve the internals afterwards.
The XML files should get a version raise and kateversion raise, I assume, before commiting.

This revision is now accepted and ready to land.Oct 20 2018, 3:18 PM
This revision was automatically updated to reflect the committed changes.

I merged that now.

I bumped the version + required kate version

Git commit f2c29ec618da08ebe9d17ff739e8b12bf3c33fff by Christoph Cullmann.
Committed on 27/10/2018 at 15:14.
Pushed by cullmann into branch 'master'.

inc version + fixup required kate version to current framework version

M +1 -1 data/syntax/scss.xml

https://commits.kde.org/syntax-highlighting/f2c29ec618da08ebe9d17ff739e8b12bf3c33fff

diff --git a/data/syntax/scss.xml b/data/syntax/scss.xml
index 1f472aa..2968868 100644

  • a/data/syntax/scss.xml

+++ b/data/syntax/scss.xml
@@ -34,7 +34,7 @@ Changelog:

-->

-<language name="SCSS" version="7" kateversion="5.0" section="Markup" extensions="*.scss" indenter="cstyle" mimetype="text/css" author="Wilbert Berendsen (wilbert@kde.nl)" license="LGPL" priority="10">
+<language name="SCSS" version="8" kateversion="5.52" section="Markup" extensions="*.scss" indenter="cstyle" mimetype="text/css" author="Wilbert Berendsen (wilbert@kde.nl)" license="LGPL" priority="10">

What just came to my mind is that this patch also has an issue: If you include a keyword list of a.xml in b.xml, and a.xml uses different word delimiters for the keyword list, then the keywords in b.xml will not be highlighted/detected correctly.

Did we think about this when approving this change? What solutions do we have for this? The least we can do is to add this i) to the API documentation, and ii) to the highlighting section in the docbook.

That said, the correct fix would be to simply refer to the keyword list of the respective Definition, and if the rule is applied, we should check for the word delimiters of the referenced keyword list...

Any other thoughts?

I don't think using the original word delimiters is a good thing. Let's imagine that c.xml lists the functions of the libc and a language that allows them to be used as ffi.C.printf (luajit for the curious). The separators of the 2 languages are not the same and puts'plop' works for one, but not for the other. If we rely on the C delimiters, it will not work anymore. I think we can find other examples and perhaps counter-examples.

Currently, if we want to keep the word delimiters, we can always create a context in the referenced file (or another syntax file) and make an IncludeRule. It's ugly, the current checker complains if the context is not used, but I don't think that happens in nature. From what I see, the languages that share word lists are either syntactically similar or radically different.

I think the current way we handle that is OK.
If people start to include keywords between "incompatible" languages, any chosen direction is not really desirable.
If somebody has time, for sure it makes sense to document the current status quo.