diff --git a/doc/katepart/development.docbook b/doc/katepart/development.docbook index 95a826721..62fe1d235 100644 --- a/doc/katepart/development.docbook +++ b/doc/katepart/development.docbook @@ -1,3028 +1,3034 @@ &TC.Hollingsworth; &TC.Hollingsworth.mail; Extending &katepart; Introduction Like any advanced text editor component, &katepart; offers a variety of ways to extend its functionality. You can write simple scripts to add functionality with JavaScript. Finally, once you have extended &katepart;, you are welcome to join us and share your enhancements with the world! Working with Syntax Highlighting Overview Syntax Highlighting is what makes the editor automatically display text in different styles/colors, depending on the function of the string in relation to the purpose of the file. In program source code for example, control statements may be rendered bold, while data types and comments get different colors from the rest of the text. This greatly enhances the readability of the text, and thus helps the author to be more efficient and productive. A C++ function, rendered with syntax highlighting. A C++ function, rendered with syntax highlighting. The same C++ function, without highlighting. The same C++ function, without highlighting. Of the two examples, which is easiest to read? &kappname; comes with a flexible, configurable and capable system for doing syntax highlighting, and the standard distribution provides definitions for a wide range of programming, scripting and markup languages and other text file formats. In addition you can provide your own definitions in simple &XML; files. &kappname; will automatically detect the right syntax rules when you open a file, based on the &MIME; Type of the file, determined by its extension, or, if it has none, the contents. Should you experience a bad choice, you can manually set the syntax to use from the ToolsHighlighting menu. 
The styles and colors used by each syntax highlight definition can be configured using the Highlighting Text Styles tab of the Config Dialog, while the &MIME; Types and file extensions it should be used for are handled by the Modes & Filetypes tab. Syntax highlighting is there to enhance the readability of correct text, but you cannot trust it to validate your text. Marking text for syntax is difficult depending on the format you are using, and in some cases the authors of the syntax rules will be proud if 98% of text gets correctly rendered, though most often you need a rare style to see the incorrect 2%. The &kappname; Syntax Highlight System This section will discuss the &kappname; syntax highlighting mechanism in more detail. It is for you if you want to know about it, or if you want to change or create syntax definitions. How it Works Whenever you open a file, one of the first things the &kappname; editor does is detect which syntax definition to use for the file. While reading the text of the file, and while you type away in it, the syntax highlighting system will analyze the text using the rules defined by the syntax definition and mark in it where different contexts and styles begin and end. When you type in the document, the new text is analyzed and marked on the fly, so that if you delete a character that is marked as the beginning or end of a context, the style of surrounding text changes accordingly. The syntax definitions used by the &kappname; Syntax Highlighting System are &XML; files, containing Rules for detecting the role of text, organized into context blocks Keyword lists Style Item definitions When analyzing the text, the detection rules are evaluated in the order in which they are defined, and if the beginning of the current string matches a rule, the related context is used. The start point in the text is moved to the final point at which that rule matched and a new loop of the rules begins, starting in the context set by the matched rule. 
Rules The detection rules are the heart of the highlighting detection system. A rule is a string, character or regular expression against which to match the text being analyzed. It contains information about which style to use for the matching part of the text. It may switch the working context of the system either to an explicitly mentioned context or to the previous context used by the text. Rules are organized in context groups. A context group is used for main text concepts within the format, for example quoted text strings or comment blocks in program source code. This ensures that the highlighting system does not need to loop through all rules when it is not necessary, and that some character sequences in the text can be treated differently depending on the current context. Contexts may be generated dynamically to allow the usage of instance specific data in rules. Context Styles and Keywords In some programming languages, integer numbers are treated differently from floating point ones by the compiler (the program that converts the source code to a binary executable), and there may be characters having a special meaning within a quoted string. In such cases, it makes sense to render them differently from the surroundings so that they are easy to identify while reading the text. So even if they do not represent special contexts, they may be seen as such by the syntax highlighting system, so that they can be marked for different rendering. A syntax definition may contain as many styles as required to cover the concepts of the format it is used for. In many formats, there are lists of words that represent a specific concept. For example, in programming languages, control statements are one concept, data type names another, and built in functions of the language a third. The &kappname; Syntax Highlighting System can use such lists to detect and mark words in the text to emphasize concepts of the text formats. 
Default Styles If you open a C++ source file, a &Java; source file and an HTML document in &kappname;, you will see that even though the formats are different, and thus different words are chosen for special treatment, the colors used are the same. This is because &kappname; has a predefined list of Default Styles which are employed by the individual syntax definitions. This makes it easy to recognize similar concepts in different text formats. For example, comments are present in almost any programming, scripting or markup language, and when they are rendered using the same style in all languages, you do not have to stop and think to identify them within the text. All styles in a syntax definition use one of the default styles. A few syntax definitions use more styles than there are defaults, so if you use a format often, it may be worth launching the configuration dialog to see if some concepts use the same style. For example, there is only one default style for strings, but as the Perl programming language operates with two types of strings, you can enhance the highlighting by configuring those to be slightly different. All available default styles will be explained later. The Highlight Definition &XML; Format Overview &kappname; uses the Syntax-Highlighting framework from &kde-frameworks;. The default highlighting &XML; files shipped with &kappname; are compiled into the Syntax-Highlighting library by default. This section is an overview of the Highlight Definition &XML; format. Based on a small example it will describe the main components and their meaning and usage. The next section will go into detail with the highlight detection rules. The formal definition, also known as the XSD, can be found in the Syntax Highlighting repository in the file language.xsd Custom .xml highlight definition files are located in org.kde.syntax-highlighting/syntax/ in your user folder found with qtpaths which usually is -$HOME/.local/share +$HOME/.local/share/. 
On &Windows; these files are located in %USERPROFILE%/AppData/Local/org.kde.syntax-highlighting/syntax. %USERPROFILE% usually expands to C:\\Users\\user. +For Kate's Flatpak package, +these files are located in $HOME/.var/app/org.kde.kate/data/org.kde.syntax-highlighting/syntax/ +and for Kate's Snap package, +they are located in $HOME/snap/kate/current/.local/share/org.kde.syntax-highlighting/syntax/. + + If multiple files exist for the same language, the file with the highest version attribute in the language element will be loaded. Main sections of &kappname; Highlight Definition files A highlighting file contains a header that sets the XML version: <?xml version="1.0" encoding="UTF-8"?> The root of the definition file is the element language. Available attributes are: Required attributes: name sets the name of the language. It appears in the menus and dialogs afterwards. section specifies the category. extensions defines file extensions, such as "*.cpp;*.h" version specifies the current revision of the definition file in terms of an integer number. Whenever you change a highlighting definition file, make sure to increase this number. kateversion specifies the latest supported &kappname; version. Optional attributes: mimetype associates the definition with files by &MIME; type. casesensitive defines whether the keywords are case sensitive or not. priority is necessary if another highlight definition file uses the same extensions. The higher priority will win. author contains the name of the author and their email address. license contains the license, usually the MIT license for new syntax-highlighting files. style contains the provided language and is used by the indenters for the attribute required-syntax-style. indenter defines which indenter will be used by default. Available indenters are: ada, normal, cstyle, cmake, haskell, latex, lilypond, lisp, lua, pascal, python, replicode, ruby and xml. hidden defines whether the name should appear in &kappname;'s menus. 
So the next line may look like this: <language name="C++" version="1" kateversion="2.4" section="Sources" extensions="*.cpp;*.h"> Next comes the highlighting element, which contains the optional element list and the required elements contexts and itemDatas. list elements contain a list of keywords. In this case the keywords are class and const. You can add as many lists as you need. Since &kde-frameworks; 5.53, a list can include keywords from another list or language/file, using the include element. ## is used to separate the list name and the language definition name, in the same way as in the IncludeRules rule. This is useful to avoid duplicating keyword lists, if you need to include the keywords of another language/file. For example, the othername list contains the str keyword and all the keywords of the types list, which belongs to the ISO C++ language. The contexts element contains all contexts. The first context is by default the start of the highlighting. There are three rules in the context Normal Text: two that match the keyword lists named somename and othername, and one that detects a quote and switches the context to string. To learn more about rules read the next chapter. The third part is the itemDatas element. It contains all color and font styles needed by the contexts and rules. In this example, the itemData Normal Text, String and Keyword are used. 
<highlighting> <list name="somename"> <item>class</item> <item>const</item> </list> <list name="othername"> <item>str</item> <include>types##ISO C++</include> </list> <contexts> <context attribute="Normal Text" lineEndContext="#pop" name="Normal Text" > <keyword attribute="Keyword" context="#stay" String="somename" /> <keyword attribute="Keyword" context="#stay" String="othername" /> <DetectChar attribute="String" context="string" char="&quot;" /> </context> <context attribute="String" lineEndContext="#stay" name="string" > <DetectChar attribute="String" context="#pop" char="&quot;" /> </context> </contexts> <itemDatas> <itemData name="Normal Text" defStyleNum="dsNormal" /> <itemData name="Keyword" defStyleNum="dsKeyword" /> <itemData name="String" defStyleNum="dsString" /> </itemDatas> </highlighting> The last part of a highlight definition is the optional general section. It may contain information about keywords, code folding, comments, indentation, empty lines and spell checking. The comment section defines with what string a single line comment is introduced. You also can define a multiline comment using multiLine with the additional attribute end. This is used if the user presses the corresponding shortcut for comment/uncomment. The keywords section defines whether keyword lists are case sensitive or not. Other attributes will be explained later. The other sections, folding, emptyLines and spellchecking, are usually not necessary and are explained later. <general> <comments> <comment name="singleLine" start="#"/> </comments> <keywords casesensitive="1"/> <folding indentationsensitive="0"/> <emptyLines> <emptyLine regexpr="\s+"/> <emptyLine regexpr="\s*#.*"/> </emptyLines> <spellchecking> <encoding char="á" string="\'a"/> <encoding char="à" string="\`a"/> </spellchecking> </general> </language> The Sections in Detail This part will describe all available attributes for contexts, itemDatas, keywords, comments, code folding and indentation. 
The element context belongs in the group contexts. A context itself defines context specific rules such as what should happen if the highlight system reaches the end of a line. Available attributes are: name states the context name. Rules will use this name to specify the context to switch to if the rule matches. lineEndContext defines the context the highlight system switches to if it reaches the end of a line. This may either be a name of another context, #stay to not switch the context (&ie; do nothing) or #pop which will cause it to leave this context. It is possible to use for example #pop#pop#pop to pop three times, or even #pop#pop!OtherContext to pop two times and switch to the context named OtherContext. lineEmptyContext defines the context if an empty line is encountered. Default: #stay. fallthrough defines if the highlight system switches to the context specified in fallthroughContext if no rule matches. Default: false. fallthroughContext specifies the next context if no rule matches. noIndentationBasedFolding disables indentation-based folding in the context. If indentation-based folding is not activated, this attribute is useless. This is defined in the element folding of the group general. Default: false. The element itemData is in the group itemDatas. It defines the font style and colors. So it is possible to define your own styles and colors. However, we recommend you stick to the default styles if possible so that the user will always see the same colors used in different languages. Though, sometimes there is no other way and it is necessary to change color and font attributes. The attributes name and defStyleNum are required, the others are optional. Available attributes are: name sets the name of the itemData. Contexts and rules will use this name in their attribute attribute to reference an itemData. defStyleNum defines which default style to use. Available default styles are explained in detail later. color defines a color. 
Valid formats are '#rrggbb' or '#rgb'. selColor defines the selection color. italic if true, the text will be italic. bold if true, the text will be bold. underline if true, the text will be underlined. strikeout if true, the text will be struck out. spellChecking if true, the text will be spellchecked. The element keywords in the group general defines keyword properties. Available attributes are: casesensitive may be true or false. If true, all keywords are matched case sensitively. weakDeliminator is a list of characters that do not act as word delimiters. For example, the dot '.' is a word delimiter. If a keyword in a list contains a dot, it will only match if you specify the dot as a weak delimiter. additionalDeliminator defines additional delimiters. wordWrapDeliminator defines characters after which a line wrap may occur. Default delimiters and word wrap delimiters are the characters .():!+,-<=>%&*/;?[]^{|}~\, space (' ') and tabulator ('\t'). The element comment in the group comments defines comment properties which are used for ToolsComment and ToolsUncomment. Available attributes are: name is either singleLine or multiLine. If you choose multiLine the attributes end and region are required. start defines the string used to start a comment. In C++ this would be "/*". end defines the string used to close a comment. In C++ this would be "*/". region should be the name of the foldable multiline comment. If you have beginRegion="Comment" ... endRegion="Comment" in your rules, you should use region="Comment". This way uncomment works even if you do not select all the text of the multiline comment. The cursor only needs to be inside the multiline comment. The element folding in the group general defines code folding properties. Available attributes are: indentationsensitive if true, the code folding markers will be added based on indentation, as in the scripting language Python. Usually you do not need to set it, as it defaults to false. 
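As an illustration of the comment, keywords and folding elements described above, a general section for a C-like language might look roughly like the following sketch. The region name Comment is only an assumption here; it has to match whatever beginRegion/endRegion name the rules of the definition actually use.

```xml
<general>
  <comments>
    <!-- Used by the comment/uncomment actions for single lines. -->
    <comment name="singleLine" start="//"/>
    <!-- multiLine needs "end"; "region" names the foldable comment region
         (assumed to be called "Comment" in the rules of this definition). -->
    <comment name="multiLine" start="/*" end="*/" region="Comment"/>
  </comments>
  <!-- The dot no longer splits words, so a keyword such as "std.array" can match. -->
  <keywords casesensitive="true" weakDeliminator="."/>
  <!-- Folding follows explicit regions, not indentation. -->
  <folding indentationsensitive="false"/>
</general>
```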
The element emptyLine in the group emptyLines defines which lines should be treated as empty lines. This allows modifying the behavior of the lineEmptyContext attribute in the elements context. Available attributes are: regexpr defines a regular expression that will be treated as an empty line. By default, empty lines do not contain any characters; this attribute adds further patterns that count as empty lines, for example if you want lines with spaces to also be considered empty lines. However, in most syntax definitions you do not need to set this attribute. The element encoding in the group spellchecking defines a character encoding for spell checking. Available attributes: char is an encoded character. string is a sequence of characters that will be encoded as the character char in the spell checking. For example, in the language LaTeX, the string \"{A} represents the character Ä. Available Default Styles Default styles were explained earlier. As a short summary: default styles are predefined font and color styles. General default styles: dsNormal, when no special highlighting is required. dsKeyword, built-in language keywords. dsFunction, function calls and definitions. dsVariable, if applicable: variable names (e.g. $someVar in PHP/Perl). dsControlFlow, control flow keywords like if, else, switch, break, return, yield, ... dsOperator, operators like + - * / :: < > dsBuiltIn, built-in functions, classes, and objects. dsExtension, common extensions, such as Qt classes and functions/macros in C++ and Python. dsPreprocessor, preprocessor statements or macro definitions. dsAttribute, annotations such as @override and __declspec(...). String-related default styles: dsChar, single characters, such as 'x'. dsSpecialChar, chars with special meaning in strings such as escapes, substitutions, or regex operators. dsString, strings like "hello world". dsVerbatimString, verbatim or raw strings like 'raw \backslash' in Perl, CoffeeScript, and shells, as well as r'\raw' in Python. 
dsSpecialString, SQL, regexes, HERE docs, LaTeX math mode, ... dsImport, import, include, require of modules. Number-related default styles: dsDataType, built-in data types like int, void, u64. dsDecVal, decimal values. dsBaseN, values with a base other than 10. dsFloat, floating point values. dsConstant, built-in and user defined constants like PI. Comment and documentation-related default styles: dsComment, comments. dsDocumentation, /** Documentation comments */ or """docstrings""". dsAnnotation, documentation commands like @param, @brief. dsCommentVar, the variable names used in above commands, like "foobar" in @param foobar. dsRegionMarker, region markers like //BEGIN, //END in comments. Other default styles: dsInformation, notes and tips like @note in doxygen. dsWarning, warnings like @warning in doxygen. dsAlert, special words like TODO, FIXME, XXXX. dsError, error highlighting and wrong syntax. dsOthers, when nothing else fits. Highlight Detection Rules This section describes the syntax detection rules. Each rule can match zero or more characters at the beginning of the string they are tested against. If the rule matches, the matching characters are assigned the style or attribute defined by the rule, and a rule may ask that the current context is switched. A rule looks like this: <RuleName attribute="(identifier)" context="(identifier)" [rule specific attributes] /> The attribute identifies the style to use for matched characters by name, and the context identifies the context to use from here. The context can be identified by: An identifier, which is the name of the other context. An order telling the engine to stay in the current context (#stay), or to pop back to a previous context used in the string (#pop). To go back more steps, the #pop keyword can be repeated: #pop#pop#pop An order followed by an exclamation mark (!) and an identifier, which will make the engine first follow the order and then switch to the other context, e.g. #pop#pop!OtherContext. 
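The three ways of identifying a context can be seen side by side in a small sketch. All context names (Normal, Directive, DirectiveString, Trailing) are hypothetical, and the referenced itemData entries are assumed to be defined elsewhere in the same file.

```xml
<contexts>
  <!-- An identifier: "@" pushes the context named Directive. -->
  <context name="Normal" attribute="Normal Text" lineEndContext="#stay">
    <DetectChar attribute="Keyword" context="Directive" char="@"/>
  </context>
  <context name="Directive" attribute="Keyword" lineEndContext="#pop">
    <!-- An order: #stay keeps the current context. -->
    <DetectSpaces attribute="Keyword" context="#stay"/>
    <DetectChar attribute="String" context="DirectiveString" char="&quot;"/>
  </context>
  <context name="DirectiveString" attribute="String" lineEndContext="#pop#pop">
    <!-- An order plus identifier: leave DirectiveString and Directive,
         then switch to Trailing for the rest of the line. -->
    <DetectChar attribute="String" context="#pop#pop!Trailing" char="&quot;"/>
  </context>
  <!-- No rules: everything up to the line end gets the Comment style. -->
  <context name="Trailing" attribute="Comment" lineEndContext="#pop"/>
</contexts>
```

With this sketch, the closing quote pops two contexts and pushes Trailing, so the end of the line returns the engine to Normal.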
Rule specific attributes vary and are described in the following sections. Common attributes All rules have the following attributes in common; they are available wherever (common attributes) appears. attribute and context are required attributes, all others are optional. attribute: An attribute maps to a defined itemData. context: Specify the context to which the highlighting system switches if the rule matches. beginRegion: Start a code folding block. Default: unset. endRegion: Close a code folding block. Default: unset. lookAhead: If true, the highlighting system will not process the match's length, so the matched text is left for the rules that follow. Default: false. firstNonSpace: Match only if the string is the first non-whitespace in the line. Default: false. column: Match only if the column matches. Default: unset. Dynamic rules Some rules allow the optional attribute dynamic of type boolean that defaults to false. If dynamic is true, a rule can use placeholders representing the text matched by a regular expression rule that switched to the current context in its string or char attributes. In a string, the placeholder %N (where N is a number) will be replaced with the corresponding capture N from the calling regular expression, starting from 1. In a char the placeholder must be a number N and it will be replaced with the first character of the corresponding capture N from the calling regular expression. Whenever a rule allows this attribute, its description will contain a (dynamic). dynamic: may be (true|false). How it works: In the regular expressions of the RegExpr rules, all text within simple parentheses (PATTERN) is captured and remembered. These captures can be used in the context to which it is switched, in the rules with the attribute dynamic true, by %N (in String) or N (in char). It is important to mention that a text captured in a RegExpr rule is only stored for the switched context, specified in its context attribute. 
If the captures will be used neither by dynamic rules nor in the same regular expression, non-capturing groups should be used: (?:PATTERN). The lookahead or lookbehind groups such as (?=PATTERN), (?!PATTERN) or (?<=PATTERN) are not captured. See Regular Expressions for more information. The capture groups can be used within the same regular expression, using \N instead of %N. For more information, see Capturing matching text (back references) in Regular Expressions. Example 1: In this simple example, the text matched by the regular expression =* is captured and inserted into %1 in the dynamic rule. This allows the comment to end with the same number of = as at the beginning. This matches text like: [[ comment ]], [=[ comment ]=] or [=====[ comment ]=====]. In addition, the captures are available only in the switched context Multi-line Comment. <context name="Normal" attribute="Normal Text" lineEndContext="#stay"> <RegExpr context="Multi-line Comment" attribute="Comment" String="\[(=*)\[" beginRegion="RegionComment"/> </context> <context name="Multi-line Comment" attribute="Comment" lineEndContext="#stay"> <StringDetect context="#pop" attribute="Comment" String="]%1]" dynamic="true" endRegion="RegionComment"/> </context> Example 2: In the dynamic rule, %1 corresponds to the capture that matches #+, and %2 to &quot;+. This matches text like: #label""""inside the context""""#. These captures will not be available in other contexts, such as OtherContext, FindEscapes or SomeContext. <context name="SomeContext" attribute="Normal Text" lineEndContext="#stay"> <RegExpr context="#pop!NamedString" attribute="String" String="(#+)(?:[\w-]|[^[:ascii:]])(&quot;+)"/> </context> <context name="NamedString" attribute="String" lineEndContext="#stay"> <RegExpr context="#pop!OtherContext" attribute="String" String="%2(?:%1)?" 
dynamic="true"/> <DetectChar context="FindEscapes" attribute="Escape" char="\"/> </context> Example 3: This matches text like: Class::function<T>( ... ). <context name="Normal" attribute="Normal Text" lineEndContext="#stay"> <RegExpr context="FunctionName" String="\b([a-zA-Z_][\w-]*)(::)([a-zA-Z_][\w-]*)(?:&lt;[\w\-\s]*&gt;)?(\()" lookAhead="true"/> </context> <context name="FunctionName" attribute="Normal Text" lineEndContext="#pop"> <StringDetect context="#stay" attribute="Class" String="%1" dynamic="true"/> <StringDetect context="#stay" attribute="Operator" String="%2" dynamic="true"/> <StringDetect context="#stay" attribute="Function" String="%3" dynamic="true"/> <DetectChar context="#pop" attribute="Normal Text" char="4" dynamic="true"/> </context> The Rules in Detail DetectChar Detect a single specific character. Commonly used for example to find the ends of quoted strings. <DetectChar char="(character)" (common attributes) (dynamic) /> The char attribute defines the character to match. Detect2Chars Detect two specific characters in a defined order. <Detect2Chars char="(character)" char1="(character)" (common attributes) /> The char attribute defines the first character to match, char1 the second. AnyChar Detect one character of a set of specified characters. <AnyChar String="(string)" (common attributes) /> The String attribute defines the set of characters. StringDetect Detect an exact string. <StringDetect String="(string)" [insensitive="true|false"] (common attributes) (dynamic) /> The String attribute defines the string to match. The insensitive attribute defaults to false and is passed to the string comparison function. If the value is true insensitive comparing is used. WordDetect Detect an exact string but additionally require word boundaries such as a dot '.' or a whitespace on the beginning and the end of the word. Think of \b<string>\b in terms of a regular expression, but it is faster than the rule RegExpr. 
<WordDetect String="(string)" [insensitive="true|false"] (common attributes) /> The String attribute defines the string to match. The insensitive attribute defaults to false and is passed to the string comparison function. If the value is true, case-insensitive comparison is used. Since: Kate 3.5 (KDE 4.5) RegExpr Matches against a regular expression. <RegExpr String="(string)" [insensitive="true|false"] [minimal="true|false"] (common attributes) (dynamic) /> The String attribute defines the regular expression. insensitive defaults to false and is passed to the regular expression engine. minimal defaults to false and is passed to the regular expression engine. Because the rules are always matched against the beginning of the current string, a regular expression starting with a caret (^) indicates that the rule should only be matched against the start of a line. See Regular Expressions for more information on those. keyword Detect a keyword from a specified list. <keyword String="(list name)" (common attributes) /> The String attribute identifies the keyword list by name. A list with that name must exist. The highlighting system processes keyword rules in a very optimized way. For this reason, every keyword to be matched must be surrounded by defined delimiters, either implied (the default delimiters), or explicitly specified within the additionalDeliminator property of the keywords tag. If a keyword to be matched contains a delimiter character, that character must be added to the weakDeliminator property of the keywords tag. This character will then lose its delimiter property in all keyword rules. Int Detect an integer number (as the regular expression: \b[0-9]+). <Int (common attributes) /> This rule has no specific attributes. Float Detect a floating point number (as the regular expression: (\b[0-9]+\.[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?). <Float (common attributes) /> This rule has no specific attributes. 
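A minimal sketch of how the keyword, Float and Int rules might be combined in one context follows. The list name controlflow and the itemData names are made up for this illustration. Float is placed before Int on the assumption that rules are tried in order, since otherwise Int would consume the leading digits of a number such as 3.14.

```xml
<highlighting>
  <list name="controlflow">
    <item>if</item>
    <item>else</item>
    <item>while</item>
  </list>
  <contexts>
    <context name="Normal" attribute="Normal Text" lineEndContext="#stay">
      <!-- Words from the list, surrounded by the default delimiters. -->
      <keyword attribute="Control Flow" context="#stay" String="controlflow"/>
      <!-- Try Float first so that "3.14" is not split into "3" and ".14". -->
      <Float attribute="Float" context="#stay"/>
      <Int attribute="Decimal" context="#stay"/>
    </context>
  </contexts>
  <itemDatas>
    <itemData name="Normal Text" defStyleNum="dsNormal"/>
    <itemData name="Control Flow" defStyleNum="dsControlFlow"/>
    <itemData name="Float" defStyleNum="dsFloat"/>
    <itemData name="Decimal" defStyleNum="dsDecVal"/>
  </itemDatas>
</highlighting>
```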
HlCOct Detect an octal number representation (as the regular expression: \b0[0-7]+). <HlCOct (common attributes) /> This rule has no specific attributes. HlCHex Detect a hexadecimal number representation (as a regular expression: \b0[xX][0-9a-fA-F]+). <HlCHex (common attributes) /> This rule has no specific attributes. HlCStringChar Detect an escaped character. <HlCStringChar (common attributes) /> This rule has no specific attributes. It matches literal representations of characters commonly used in program code, for example \n (newline) or \t (TAB). The following characters will match if they follow a backslash (\): abefnrtv"'?\. Additionally, escaped hexadecimal numbers such as \xff and escaped octal numbers such as \033 will match. HlCChar Detect a C character. <HlCChar (common attributes) /> This rule has no specific attributes. It matches C characters enclosed in ticks (Example: 'c'). The character between the ticks may be a simple character or an escaped character. See HlCStringChar for matched escaped character sequences. RangeDetect Detect a string with defined start and end characters. <RangeDetect char="(character)" char1="(character)" (common attributes) /> char defines the character starting the range, char1 the character ending the range. Useful to detect for example small quoted strings and the like, but note that since the highlighting engine works on one line at a time, this will not find strings spanning over a line break. LineContinue Matches a specified char at the end of a line. <LineContinue (common attributes) [char="\"] /> char optional character to match, default is backslash ('\'). New since KDE 4.13. This rule is useful for switching context at end of line. This is needed for example in C/C++ to continue macros or strings. IncludeRules Include rules from another context or language/file. <IncludeRules context="contextlink" [includeAttrib="true|false"] /> The context attribute defines which context to include. 
If it is a simple string it includes all defined rules into the current context, example: <IncludeRules context="anotherContext" /> If the string contains a ## the highlight system will look for a context from another language definition with the given name, for example <IncludeRules context="String##C++" /> would include the context String from the C++ highlighting definition. If the includeAttrib attribute is true, change the destination attribute to the one of the source. This is required to make, for example, commenting work, if text matched by the included context is a different highlight from the host context. DetectSpaces Detect whitespace. <DetectSpaces (common attributes) /> This rule has no specific attributes. Use this rule if you know that there can be several whitespace characters ahead, for example in the beginning of indented lines. This rule will skip all whitespace at once, instead of testing multiple rules and skipping one at a time due to no match. DetectIdentifier Detect identifier strings (as the regular expression: [a-zA-Z_][a-zA-Z0-9_]*). <DetectIdentifier (common attributes) /> This rule has no specific attributes. Use this rule to skip a string of word characters at once, rather than testing with multiple rules and skipping one at a time due to no match. Tips & Tricks Once you have understood how the context switching works it will be easy to write highlight definitions. However, you should carefully check which rule you choose in which situation. Regular expressions are very powerful, but they are slow compared to the other rules. So you may consider the following tips. If you only match two characters use Detect2Chars instead of StringDetect. The same applies to DetectChar. Regular expressions are easy to use but often there is another much faster way to achieve the same result. Suppose you only want to match the character '#' if it is the first non-whitespace character in the line. 
A regular expression based solution would look like this: <RegExpr attribute="Macro" context="macro" String="^\s*#" /> You can achieve the same much faster by using: <DetectChar attribute="Macro" context="macro" char="#" firstNonSpace="true" /> If you want to match the regular expression '^#', you can still use DetectChar with the attribute column="0". The attribute column counts characters, so a tab counts as only one character. In RegExpr rules, use the attribute column="0" if the pattern ^PATTERN will be used to match text at the beginning of a line. This improves performance, as it avoids looking for matches in the rest of the columns. In regular expressions, use non-capturing groups (?:PATTERN) instead of capturing groups (PATTERN) if the captures will not be used in the same regular expression or in dynamic rules. This avoids storing captures unnecessarily. You can switch contexts without processing characters. Assume that you want to switch context when you meet the string */, but need to process that string in the next context. The rule below will match, and the lookAhead attribute will cause the highlighter to keep the matched string for the next context. <Detect2Chars attribute="Comment" context="#pop" char="*" char1="/" lookAhead="true" /> Use DetectSpaces if you know that a lot of whitespace occurs. Use DetectIdentifier instead of the regular expression '[a-zA-Z_]\w*'. Use default styles whenever you can. This way the user will find a familiar environment. Look into other XML files to see how other people implement tricky rules. You can validate every XML file by using the command validatehl.sh language.xsd mySyntax.xml. The files validatehl.sh and language.xsd are available in the Syntax Highlighting repository. If you repeat a complex regular expression very often, you can use entities.
Example: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE language SYSTEM "language.dtd" [ <!ENTITY myref "[A-Za-z_:][\w.:_-]*"> ]> Now you can use &myref; instead of the regular expression. Scripting with JavaScript The &kappname; editor component is easily extensible by writing scripts. The scripting language is ECMAScript (widely known as JavaScript). &kappname; supports two kinds of scripts: indentation and command line scripts. Indentation Scripts Indentation scripts, also referred to as indenters, automatically indent the source code while typing text. As an example, after hitting the return key the indentation level often increases. The following sections describe step by step how to create the skeleton for a simple indenter. As a first step, create a new *.js file called, &eg;, javascript.js in the local home folder $XDG_DATA_HOME/katepart5/script/indentation. Here, the environment variable XDG_DATA_HOME typically expands to either ~/.local or ~/.local/share. On &Windows; these files are located in %USERPROFILE%\AppData\Local\katepart5\indentation. %USERPROFILE% usually expands to C:\Users\user. The Indentation Script Header The header of the file javascript.js is embedded as JSON at the beginning of the document as follows: var katescript = { "name": "JavaScript", "author": "Example Name <example.name@some.address.org>", "license": "BSD License", "revision": 1, "kate-version": "5.1", "required-syntax-style": "javascript", "indent-languages": ["javascript"], "priority": 0 }; // kate-script-header, must be at the start of the file without comments Each entry is explained in detail now: name [required]: This is the indenter name that appears in the menu ToolsIndentation and in the configuration dialog. author [optional]: The author's name and contact information. license [optional]: Short form of the license, such as BSD License or LGPLv3. revision [required]: The revision of the script. This number should be increased whenever the script is modified.
kate-version [required]: Minimum required &kappname; version. required-syntax-style [optional]: The required syntax style, which matches the specified style in syntax highlighting files. This is important for indenters that rely on specific highlight information in the document. If a required syntax style is specified, the indenter is available only when the appropriate highlighter is active. This prevents undefined behavior caused by using the indenter without the expected highlighting schema. For instance, the Ruby indenter makes use of this in the files ruby.js and ruby.xml. indent-languages [optional]: JSON array of syntax styles the indenter can indent correctly, &eg;: ["c++", "java"]. priority [optional]: If several indenters are suited for a certain highlighted file, the priority decides which indenter is chosen as the default indenter. The Indenter Source Code Having specified the header, this section explains how the indentation scripting itself works. The basic skeleton of the body looks like this: // required katepart js libraries, e.g. range.js if you use Range require ("range.js"); triggerCharacters = "{}/:;"; function indent(line, indentWidth, ch) { // called for each newline (ch == '\n') and all characters specified in // the global variable triggerCharacters. When calling ToolsAlign // the variable ch is empty, i.e. ch == ''. // // see also: Scripting API return -2; } The function indent() has three parameters: line: the line that has to be indented indentWidth: the indentation width in number of spaces ch: either a newline character (ch == '\n'), one of the trigger characters specified in triggerCharacters, or empty if the user invoked the action ToolsAlign. The return value of the indent() function specifies how the line will be indented.
If the return value is a simple integer number, it is interpreted as follows: return value -2: do nothing return value -1: keep indentation (searches for previous non-blank line) return value >= 0: the indentation depth in spaces Alternatively, an array of two elements can be returned: return [ indent, align ]; In this case, the first element is the indentation depth as above, with the same meaning of the special values. The second element, however, is an absolute value representing a column for alignment. If this value is higher than the indent value, the difference represents the number of spaces to be added after the indentation given by the first element. Otherwise, the second number is ignored. Using tabs and spaces for indentation is often referred to as mixed mode. Consider the following example: Assume tabs are used to indent, and the tab width is set to 4. Here, <tab> represents a tab and '.' a space: 1: <tab><tab>foobar("hello", 2: <tab><tab>......."world"); When indenting line 2, the indent() function returns [8, 15]. As a result, two tabs are inserted to indent to column 8, and 7 spaces are added to align the second parameter under the first, so that it stays aligned if the file is viewed with a different tab width. A default &kde; installation ships &kappname; with several indenters. The corresponding JavaScript source code can be found in $XDG_DATA_DIRS/katepart5/script/indentation. On &Windows; these files are located in %USERPROFILE%\AppData\Local\katepart5\indentation. %USERPROFILE% usually expands to C:\Users\user. Developing an indenter requires reloading the scripts to see whether the changes behave appropriately. Instead of restarting the application, simply switch to the command line and invoke the command reload-scripts. If you develop useful scripts, please consider contributing to the &kappname; Project by contacting the mailing list.
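The return value rules above can be tried out with a small brace-based indenter. The following is only a sketch under stated assumptions: the document object is a stand-in written for this example so that the code runs outside the editor, it models only the four document calls that are used, and it assumes space-indented text and that prevNonEmptyLine() starts searching at the line it is given. Inside &kappname; the real document object is provided by the editor and the stand-in is not needed.

```javascript
// Stand-in for the katepart `document` object (assumption, for illustration only).
var lines = ["function f() {", "    return 1;", "}"];
var document = {
    line: function (n) { return lines[n] || ""; },
    firstChar: function (n) { return this.line(n).trim().charAt(0); },
    lastChar: function (n) { var t = this.line(n).trim(); return t.charAt(t.length - 1); },
    // Column of the first non-whitespace character, or -1 (spaces only, no tabs).
    firstVirtualColumn: function (n) { return this.line(n).search(/\S/); },
    // Assumption: the search starts at line n itself; -1 means nothing was found.
    prevNonEmptyLine: function (n) {
        for (var i = n; i >= 0; --i) {
            if (this.line(i).trim() !== "") { return i; }
        }
        return -1;
    }
};

var triggerCharacters = "}";

function indent(line, indentWidth, ch) {
    var prev = document.prevNonEmptyLine(line - 1);
    if (prev < 0) {
        return -2; // no earlier text to orient on: do nothing
    }
    // Start from the indentation of the previous non-empty line.
    var depth = document.firstVirtualColumn(prev);
    if (document.lastChar(prev) === "{") {
        depth += indentWidth; // one level deeper after an opening brace
    }
    if (document.firstChar(line) === "}") {
        depth = Math.max(0, depth - indentWidth); // closing brace moves back out
    }
    return depth; // values >= 0 are the indentation depth in spaces
}
```

With the three sample lines, the body line is indented one level below the opening brace and the closing brace returns to column 0. A real indenter would also have to take comments and strings into account, for example through the attribute functions of the Scripting API.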
Command Line Scripts As it is hard to satisfy everyone's needs, &kappname; supports little helper tools for quick text manipulation through the built-in command line. For instance, the command sort is implemented as a script. This section explains how to create *.js files to extend &kappname; with arbitrary helper scripts. Command line scripts are stored next to the indentation scripts, in their own commands folder. So as a first step, create a new *.js file called myutils.js in the local home folder $XDG_DATA_HOME/katepart5/script/commands. Here, the environment variable XDG_DATA_HOME typically expands to either ~/.local or ~/.local/share. On &Windows; these files are located in %USERPROFILE%\AppData\Local\katepart5\commands. %USERPROFILE% usually expands to C:\Users\user. The Command Line Script Header The header of each command line script is embedded as JSON at the beginning of the script as follows: var katescript = { "author": "Example Name <example.name@some.address.org>", "license": "LGPLv2+", "revision": 1, "kate-version": "5.1", "functions": ["sort", "moveLinesDown"], "actions": [ { "function": "sort", "name": "Sort Selected Text", "category": "Editing", "interactive": "false" }, { "function": "moveLinesDown", "name": "Move Lines Down", "category": "Editing", "shortcut": "Ctrl+Shift+Down", "interactive": "false" } ] }; // kate-script-header, must be at the start of the file without comments Each entry is explained in detail now: author [optional]: The author's name and contact information. license [optional]: Short form of the license, such as BSD License or LGPLv2. revision [required]: The revision of the script. This number should be increased whenever the script is modified. kate-version [required]: Minimum required &kappname; version. functions [required]: JSON array of commands in the script. actions [optional]: JSON array of JSON objects that define the actions that appear in the application menu.
Detailed information is provided in the section Binding Shortcuts. Since the value of functions is a JSON array, a single script is able to contain an arbitrary number of command line commands. Each function is available through &kappname;'s built-in command line. The Script Source Code All functions specified in the header have to be implemented in the script. For instance, the script file from the example above needs to implement the two functions sort and moveLinesDown. All functions have the following syntax: // required katepart js libraries, e.g. range.js if you use Range require ("range.js"); function <name>(arg1, arg2, ...) { // ... implementation, see also: Scripting API } Arguments in the command line are passed to the function as arg1, arg2, etc. In order to provide documentation for each command, simply implement the 'help' function as follows: function help(cmd) { if (cmd == "sort") { return i18n("Sort the selected text."); } else if (cmd == "...") { // ... } } Executing help sort in the command line then calls this help function with the argument cmd set to the given command, &ie; cmd == "sort". &kappname; then presents the returned text as documentation to the user. Make sure to translate the strings. Developing a command line script requires reloading the scripts to see whether the changes behave appropriately. Instead of restarting the application, simply switch to the command line and invoke the command reload-scripts. Binding Shortcuts In order to make the scripts accessible in the application menu and assign shortcuts, the script needs to provide an appropriate script header. In the above example, both functions sort and moveLinesDown appear in the menu due to the following part in the script header: var katescript = { ... 
"actions": [ { "function": "sort", "name": "Sort Selected Text", "icon": "", "category": "Editing", "interactive": "false" }, { "function": "moveLinesDown", "name": "Move Lines Down", "icon": "", "category": "Editing", "shortcut": "Ctrl+Shift+Down", "interactive": "false" } ] }; The fields for one action are as follows: function [required]: The function that should appear in the menu Tools Scripts. name [required]: The text appears in the script menu. icon [optional]: The icon appears next to the text in the menu. All &kde; icon names can be used here. category [optional]: If a category is specified, the script appears in a submenu. shortcut [optional]: The shortcut given here is the default shortcut. Example: Ctrl+Alt+t. See the Qt documentation for further details. interactive [optional]: If the script needs user input in the command line, set this to true. If you develop useful scripts please consider contributing to the &kappname; Project by contacting the mailing list. Scripting API The scripting API presented here is available to all scripts, &ie; indentation scripts and command line commands. The Cursor and Range classes are provided by library files in $XDG_DATA_DIRS/katepart5/libraries. If you want to use them in your script, which needs to use some of the Document or View functions, please include the necessary library by using: // required katepart js libraries, e.g. range.js if you use Range require ("range.js"); To extend the standard scripting API with your own functions and prototypes simply create a new file in &kde;'s local configuration folder $XDG_DATA_HOME/katepart5/libraries and include it into your script using: require ("myscriptnamehere.js"); On &Windows; these files are located in %USERPROFILE%\AppData\Local\katepart5\libraries. %USERPROFILE% usually expands to C:\\Users\\user. To extend existing prototypes like Cursor or Range, the recommended way is to not modify the global *.js files. 
Instead, change the Cursor prototype in JavaScript after the cursor.js is included into your script via require. Cursors and Ranges As &kappname; is a text editor, all the scripting API is based on cursors and ranges whenever possible. A Cursor is a simple (line, column) tuple representing a text position in the document. A Range spans text from a starting cursor position to an ending cursor position. The API is explained in detail in the next sections. The Cursor Prototype Cursor(); Constructor. Returns a Cursor at position (0, 0). Example: var cursor = new Cursor(); Cursor(int line, int column); Constructor. Returns a Cursor at position (line, column). Example: var cursor = new Cursor(3, 42); Cursor(Cursor other); Copy constructor. Returns a copy of the cursor other. Example: var copy = new Cursor(other); Cursor Cursor.clone(); Returns a clone of the cursor. Example: var clone = cursor.clone(); Cursor.setPosition(int line, int column); Sets the cursor position to line and column. Since: &kde; 4.11 bool Cursor.isValid(); Check whether the cursor is valid. The cursor is invalid, if line and/or column are set to -1. Example: var valid = cursor.isValid(); Cursor Cursor.invalid(); Returns a new invalid cursor located at (-1, -1). Example: var invalidCursor = cursor.invalid(); int Cursor.compareTo(Cursor other); Compares this cursor to the cursor other. Returns -1, if this cursor is located before the cursor other, 0, if both cursors equal and +1, if this cursor is located after the cursor other. bool Cursor.equals(Cursor other); Returns true, if this cursor and the cursor other are equal, otherwise false. String Cursor.toString(); Returns the cursor as a string of the form Cursor(line, column). The Range Prototype Range(); Constructor. Calling new Range() returns a Range at (0, 0) - (0, 0). Range(Cursor start, Cursor end); Constructor. Calling new Range(start, end) returns the Range (start, end). 
Range(int startLine, int startColumn, int endLine, int endColumn); Constructor. Calling new Range(startLine, startColumn, endLine, endColumn) returns the Range from (startLine, startColumn) to (endLine, endColumn). Range(Range other); Copy constructor. Returns a copy of Range other. Range Range.clone(); Returns a clone of the range. Example: var clone = range.clone(); bool Range.isEmpty(); Returns true, if the start and end cursors are equal. Example: var empty = range.isEmpty(); Since: &kde; 4.11 bool Range.isValid(); Returns true, if both start and end cursor are valid, otherwise false. Example: var valid = range.isValid(); Range Range.invalid(); Returns the Range from (-1, -1) to (-1, -1). bool Range.contains(Cursor cursor); Returns true, if this range contains the cursor position, otherwise false. bool Range.contains(Range other); Returns true, if this range contains the Range other, otherwise false. bool Range.containsColumn(int column); Returns true, if column is in the half open interval [start.column, end.column), otherwise false. bool Range.containsLine(int line); Returns true, if line is in the half open interval [start.line, end.line), otherwise false. bool Range.overlaps(Range other); Returns true, if this range and the range other share a common region, otherwise false. bool Range.overlapsLine(int line); Returns true, if line is in the interval [start.line, end.line], otherwise false. bool Range.overlapsColumn(int column); Returns true, if column is in the interval [start.column, end.column], otherwise false. bool Range.onSingleLine(); Returns true, if the range starts and ends at the same line, &ie; if Range.start.line == Range.end.line. Since: &kde; 4.9 bool Range.equals(Range other); Returns true, if this range and the Range other are equal, otherwise false. String Range.toString(); Returns the range as a string of the form Range(Cursor(line, column), Cursor(line, column)). Global Functions This section lists all global functions. 
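Before the global functions are listed, the difference between the half open and closed interval checks of the Range prototype described above can be illustrated with a short sketch. The Cursor and Range definitions below are minimal stand-ins written for this example from the descriptions in this section, so that the code runs outside the editor. They are not the shipped cursor.js and range.js, which may differ in details; inside &kappname; you would use require ("range.js") instead.

```javascript
// Minimal stand-ins modelled on the documented semantics (assumption, not the shipped library).
function Cursor(line, column) {
    this.line = line;
    this.column = column;
}
// -1 if this cursor is before other, 0 if equal, +1 if after.
Cursor.prototype.compareTo = function (other) {
    if (this.line !== other.line) { return this.line < other.line ? -1 : 1; }
    if (this.column !== other.column) { return this.column < other.column ? -1 : 1; }
    return 0;
};

function Range(startLine, startColumn, endLine, endColumn) {
    this.start = new Cursor(startLine, startColumn);
    this.end = new Cursor(endLine, endColumn);
}
Range.prototype.isEmpty = function () {
    return this.start.compareTo(this.end) === 0;
};
// Half open interval [start.line, end.line): the end line is excluded.
Range.prototype.containsLine = function (line) {
    return this.start.line <= line && line < this.end.line;
};
// Closed interval [start.line, end.line]: the end line is included.
Range.prototype.overlapsLine = function (line) {
    return this.start.line <= line && line <= this.end.line;
};

var range = new Range(2, 0, 5, 10);
var containsLast = range.containsLine(5); // false: line 5 is only partially covered
var overlapsLast = range.overlapsLine(5); // true: the range touches line 5
```

In practice this means that containsLine() is the check to use when a whole line must lie inside the range, while overlapsLine() also accepts the line on which the range ends.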
Reading & Including Files String read(String file); Searches for the given file relative to the katepart/script/files directory and returns its content as a string. void require(String file); Searches for the given file relative to the katepart/script/libraries directory and evaluates it. require is internally guarded against multiple inclusions of the same file. Since: &kde; 4.10 Debugging void debug(String text); Prints text to stdout in the console launching the application. Translation In order to support full localization, there are several functions to translate strings in scripts, namely i18n, i18nc, i18np and i18ncp. These functions behave exactly like &kde;'s translation functions. The translation functions translate the wrapped strings through &kde;'s translation system to the language used in the application. Strings in scripts being developed in the official &kappname; sources are automatically extracted and translatable. In other words, as a &kappname; developer you do not have to bother with message extraction and translation. It should be noted, though, that the translation only works inside the &kde; infrastructure, &ie;, new strings in 3rd-party scripts developed outside of &kde; are not translated. Therefore, please consider contributing your scripts to &kate; such that proper translation is possible. String i18n(String text, arg1, ...); Translates text into the language used by the application and returns the translated string. The arguments arg1, ..., are optional and used to replace the placeholders %1, %2, etc. String i18nc(String context, String text, arg1, ...); Translates text into the language used by the application and returns the translated string. Additionally, the string context is visible to translators so they can provide a better translation. The arguments arg1, ..., are optional and used to replace the placeholders %1, %2, etc. String i18np(String singular, String plural, int number, arg1, ...); Translates either singular or plural into the language used by the application, depending on the given number, and returns the translated string.
The arguments arg1, ..., are optional and used to replace the placeholders %1, %2, etc. String i18ncp(String context, String singular, String plural, int number, arg1, ...); Translates either singular or plural into the language used by the application, depending on the given number, and returns the translated string. Additionally, the string context is visible to translators so they can provide a better translation. The arguments arg1, ..., are optional and used to replace the placeholders %1, %2, etc. The View API Whenever a script is being executed, there is a global variable view representing the current active editor view. The following is a list of all available View functions. Cursor view.cursorPosition(); Returns the current cursor position in the view. void view.setCursorPosition(int line, int column); void view.setCursorPosition(Cursor cursor); Sets the current cursor position to either (line, column) or to the given cursor. Cursor view.virtualCursorPosition(); Returns the virtual cursor position, with each tab counting as the corresponding number of spaces depending on the current tab width. void view.setVirtualCursorPosition(int line, int column); void view.setVirtualCursorPosition(Cursor cursor); Sets the current virtual cursor position to (line, column) or to the given cursor. String view.selectedText(); Returns the selected text. If no text is selected, the returned string is empty. bool view.hasSelection(); Returns true, if the view has selected text, otherwise false. Range view.selection(); Returns the selected text range. The returned range is invalid if there is no selected text. void view.setSelection(Range range); Sets the selected text to the given range. void view.removeSelectedText(); Removes the selected text. If the view does not have any selected text, this does nothing. void view.selectAll(); Selects the entire text in the document. void view.clearSelection(); Clears the text selection without removing the text.
object view.executeCommand(String command, String args, Range range); Executes the command line command command with the optional arguments args and the optional range. The returned object has a boolean property object.ok that indicates whether execution of the command was successful. In case of an error, the string object.status contains an error message. Since: &kde-frameworks; 5.50 The Document API Whenever a script is being executed, there is a global variable document representing the current active document. The following is a list of all available Document functions. String document.fileName(); Returns the document's filename or an empty string for unsaved text buffers. String document.url(); Returns the document's full url or an empty string for unsaved text buffers. String document.mimeType(); Returns the document's mime type or the mime type application/octet-stream if no appropriate mime type could be found. String document.encoding(); Returns the currently used encoding to save the file. String document.highlightingMode(); Returns the global highlighting mode used for the whole document. String document.highlightingModeAt(Cursor pos); Returns the highlighting mode used at the given position in the document. Array document.embeddedHighlightingModes(); Returns an array of highlighting modes embedded in this document. bool document.isModified(); Returns true, if the document has unsaved changes (modified), otherwise false. String document.text(); Returns the entire content of the document in a single text string. Newlines are marked with the newline character \n. String document.text(int fromLine, int fromColumn, int toLine, int toColumn); String document.text(Cursor from, Cursor to); String document.text(Range range); Returns the text in the given range. It is recommended to use the cursor and range based version for better readability of the source code. String document.line(int line); Returns the given text line as string. 
The string is empty if the requested line is out of range. String document.wordAt(int line, int column); String document.wordAt(Cursor cursor); Returns the word at the given cursor position. Range document.wordRangeAt(int line, int column); Range document.wordRangeAt(Cursor cursor); Return the range of the word at the given cursor position. The returned range is invalid (see Range.isValid()), if the text position is after the end of a line. If there is no word at the given cursor, an empty range is returned. Since: &kde; 4.9 String document.charAt(int line, int column); String document.charAt(Cursor cursor); Returns the character at the given cursor position. String document.firstChar(int line); Returns the first character in the given line that is not a whitespace. The first character is at column 0. If the line is empty or only contains whitespace characters, the returned string is empty. String document.lastChar(int line); Returns the last character in the given line that is not a whitespace. If the line is empty or only contains whitespace characters, the returned string is empty. bool document.isSpace(int line, int column); bool document.isSpace(Cursor cursor); Returns true, if the character at the given cursor position is a whitespace, otherwise false. bool document.matchesAt(int line, int column, String text); bool document.matchesAt(Cursor cursor, String text); Returns true, if the given text matches at the corresponding cursor position, otherwise false. bool document.startsWith(int line, String text, bool skipWhiteSpaces); Returns true, if the line starts with text, otherwise false. The argument skipWhiteSpaces controls whether leading whitespaces are ignored. bool document.endsWith(int line, String text, bool skipWhiteSpaces); Returns true, if the line ends with text, otherwise false. The argument skipWhiteSpaces controls whether trailing whitespaces are ignored. bool document.setText(String text); Sets the entire document text. 
bool document.clear(); Removes the entire text in the document. bool document.truncate(int line, int column); bool document.truncate(Cursor cursor); Truncates the given line at the given column or cursor position. Returns true on success, or false if the given line is not part of the document range. bool document.insertText(int line, int column, String text); bool document.insertText(Cursor cursor, String text); Inserts the text at the given cursor position. Returns true on success, or false, if the document is in read-only mode. bool document.removeText(int fromLine, int fromColumn, int toLine, int toColumn); bool document.removeText(Cursor from, Cursor to); bool document.removeText(Range range); Removes the text in the given range. Returns true on success, or false, if the document is in read-only mode. bool document.insertLine(int line, String text); Inserts text in the given line. Returns true on success, or false, if the document is in read-only mode or the line is not in the document range. bool document.removeLine(int line); Removes the given text line. Returns true on success, or false, if the document is in read-only mode or the line is not in the document range. bool document.wrapLine(int line, int column); bool document.wrapLine(Cursor cursor); Wraps the line at the given cursor position. Returns true on success, otherwise false, &eg; if line < 0. Since: &kde; 4.9 void document.joinLines(int startLine, int endLine); Joins the lines from startLine to endLine. Two consecutive text lines are always joined with a single space. int document.lines(); Returns the number of lines in the document. bool document.isLineModified(int line); Returns true, if line currently contains unsaved data. Since: &kde; 5.0 bool document.isLineSaved(int line); Returns true, if line was changed, but the document was saved. Hence, the line currently does not contain any unsaved data.
Since: &kde; 5.0 bool document.isLineTouched(int line); Returns true, if line currently contains unsaved data or was changed before. Since: &kde; 5.0 int document.findTouchedLine(int startLine, bool down); Searches for the next touched line starting at startLine. The search is performed either upwards or downwards depending on the search direction specified in down. Since: &kde; 5.0 int document.length(); Returns the number of characters in the document. int document.lineLength(int line); Returns the line's length. void document.editBegin(); Starts an edit group for undo/redo grouping. Make sure to always call editEnd() as often as you call editBegin(). Calling editBegin() internally uses a reference counter, &ie;, this call can be nested. void document.editEnd(); Ends an edit group. The last call of editEnd() (&ie; the one for the first call of editBegin()) finishes the edit step. int document.firstColumn(int line); Returns the first non-whitespace column in the given line. If there is only whitespace in the line, the return value is -1. int document.lastColumn(int line); Returns the last non-whitespace column in the given line. If there is only whitespace in the line, the return value is -1. int document.prevNonSpaceColumn(int line, int column); int document.prevNonSpaceColumn(Cursor cursor); Returns the column with a non-whitespace character starting at the given cursor position and searching backwards. int document.nextNonSpaceColumn(int line, int column); int document.nextNonSpaceColumn(Cursor cursor); Returns the column with a non-whitespace character starting at the given cursor position and searching forwards. int document.prevNonEmptyLine(int line); Returns the previous non-empty line containing non-whitespace characters, searching backwards. int document.nextNonEmptyLine(int line); Returns the next non-empty line containing non-whitespace characters, searching forwards.
bool document.isInWord(String character, int attribute); Returns true, if the given character with the given attribute can be part of a word, otherwise false. bool document.canBreakAt(String character, int attribute); Returns true, if the given character with the given attribute is suited to wrap a line, otherwise false. bool document.canComment(int startAttribute, int endAttribute); Returns true, if a range starting and ending with the given attributes is suited to be commented out, otherwise false. String document.commentMarker(int attribute); Returns the comment marker for single line comments for a given attribute. String document.commentStart(int attribute); Returns the comment marker for the start of multi-line comments for a given attribute. String document.commentEnd(int attribute); Returns the comment marker for the end of multi-line comments for a given attribute. Range document.documentRange(); Returns a range that encompasses the whole document. Cursor document.documentEnd(); Returns a cursor positioned at the last column of the last line in the document. bool document.isValidTextPosition(int line, int column); bool document.isValidTextPosition(Cursor cursor); Returns true, if the given cursor position is positioned at a valid text position. A text position is valid only if it is located at the start, in the middle, or at the end of a valid line. Further, a text position is invalid if it is located in a Unicode surrogate. Since: &kde; 5.0 int document.attribute(int line, int column); int document.attribute(Cursor cursor); Returns the attribute at the given cursor position. bool document.isAttribute(int line, int column, int attribute); bool document.isAttribute(Cursor cursor, int attribute); Returns true, if the attribute at the given cursor position equals attribute, otherwise false. String document.attributeName(int line, int column); String document.attributeName(Cursor cursor); Returns the attribute name as human readable text.
This is equal to the itemData name in the syntax highlighting files. bool document.isAttributeName(int line, int column, String name); bool document.isAttributeName(Cursor cursor, String name); Returns true, if the attribute name at a certain cursor position matches the given name, otherwise false. String document.variable(String key); Returns the value of the requested document variable key. If the document variable does not exist, the return value is an empty string. void document.setVariable(String key, String value); Sets the value of the requested document variable key. See also: Kate document variables Since: &kde; 4.8 int document.firstVirtualColumn(int line); Returns the virtual column of the first non-whitespace character in the given line or -1, if the line is empty or contains only whitespace characters. int document.lastVirtualColumn(int line); Returns the virtual column of the last non-whitespace character in the given line or -1, if the line is empty or contains only whitespace characters. int document.toVirtualColumn(int line, int column); int document.toVirtualColumn(Cursor cursor); Cursor document.toVirtualCursor(Cursor cursor); Converts the given real cursor position to a virtual cursor position, either returning an int or a Cursor object. int document.fromVirtualColumn(int line, int virtualColumn); int document.fromVirtualColumn(Cursor virtualCursor); Cursor document.fromVirtualCursor(Cursor virtualCursor); Converts the given virtual cursor position to a real cursor position, either returning an int or a Cursor object. Cursor document.anchor(int line, int column, Char character); Cursor document.anchor(Cursor cursor, Char character); Searches backward for the given character starting from the given cursor. As an example, if '(' is passed as character, this function will return the position of the opening '('. This function uses reference counting, &ie;, other balanced '(...)' pairs in between are ignored.
Cursor document.rfind(int line, int column, String text, int attribute = -1); Cursor document.rfind(Cursor cursor, String text, int attribute = -1); Searches backward for the given text with the appropriate attribute. The argument attribute is ignored if it is set to -1. The returned cursor is invalid if the text could not be found. int document.defStyleNum(int line, int column); int document.defStyleNum(Cursor cursor); Returns the default style used at the given cursor position. bool document.isCode(int line, int column); bool document.isCode(Cursor cursor); Returns true if the attribute at the given cursor position is not equal to any of the following styles: dsComment, dsString, dsRegionMarker, dsChar, dsOthers. bool document.isComment(int line, int column); bool document.isComment(Cursor cursor); Returns true if the attribute of the character at the cursor position is dsComment, otherwise false. bool document.isString(int line, int column); bool document.isString(Cursor cursor); Returns true if the attribute of the character at the cursor position is dsString, otherwise false. bool document.isRegionMarker(int line, int column); bool document.isRegionMarker(Cursor cursor); Returns true if the attribute of the character at the cursor position is dsRegionMarker, otherwise false. bool document.isChar(int line, int column); bool document.isChar(Cursor cursor); Returns true if the attribute of the character at the cursor position is dsChar, otherwise false. bool document.isOthers(int line, int column); bool document.isOthers(Cursor cursor); Returns true if the attribute of the character at the cursor position is dsOthers, otherwise false. The Editor API In addition to the document and view API, there is a general editor API that provides functions for general editor scripting functionality. String editor.clipboardText(); Returns the text that is currently in the global clipboard.
Since: &kde-frameworks; 5.50 String editor.clipboardHistory(); The editor holds a clipboard history that contains up to 10 clipboard entries. This function returns all entries that are currently in the clipboard history. Since: &kde-frameworks; 5.50 void editor.setClipboardText(String text); Sets the contents of the clipboard to text. The text will be added to the clipboard history. Since: &kde-frameworks; 5.50 diff --git a/doc/katepart/regular-expressions.docbook b/doc/katepart/regular-expressions.docbook index e15657758..cbe8e6912 100644 --- a/doc/katepart/regular-expressions.docbook +++ b/doc/katepart/regular-expressions.docbook @@ -1,740 +1,740 @@ &Anders.Lund; &Anders.Lund.mail; Regular Expressions This Appendix contains a brief but hopefully sufficient introduction to the world of regular expressions. It documents regular expressions in the form available within &kappname;, which is not compatible with the regular expressions of Perl, nor with those of, for example, grep. Introduction Regular Expressions provide us with a way to describe some possible contents of a text string in a way understood by a small piece of software, so that it can investigate whether a text matches, and, in the case of advanced applications, with the means of saving pieces of the matching text. An example: Say you want to search a text for paragraphs that start with either of the names Henrik or Pernille followed by some form of the verb say. With a normal search, you would start out searching for the first name, Henrik, maybe followed by sa like this: Henrik sa, and while looking for matches, you would have to discard those not being the beginning of a paragraph, as well as those in which the word starting with the letters sa was not says, said or so. And then of course repeat all of that with the next name... With Regular Expressions, that task could be accomplished with a single search, and with a higher degree of precision.
To achieve this, Regular Expressions define rules for expressing in detail a generalization of a string to match. Our example, which we might literally express like this: A line starting with either Henrik or Pernille (possibly following up to 4 blanks or tab characters) followed by a whitespace character followed by sa and then either ys or id could be expressed with the following regular expression: ^[ \t]{0,4}(Henrik|Pernille) sa(ys|id) The above example demonstrates all four major concepts of modern Regular Expressions, namely: Patterns Assertions Quantifiers Back references The caret (^) starting the expression is an assertion, being true only if the following matching string is at the start of a line. The strings [ \t] and (Henrik|Pernille) sa(ys|id) are patterns. The first one is a character class that matches either a blank or a (horizontal) tab character; the other pattern contains first a subpattern matching either Henrik or Pernille, then a piece matching the exact string sa and finally a subpattern matching either ys or id. The string {0,4} is a quantifier saying anywhere from 0 up to 4 of the previous. Because regular expression software supporting the concept of back references saves the entire matching part of the string as well as sub-patterns enclosed in parentheses, given some means of access to those references, we could get our hands on the whole match (when searching a text document in an editor with a regular expression, that is often marked as selected), the name found, or the last part of the verb. All together, the expression will match where we wanted it to, and only there. The following sections will describe in detail how to construct and use patterns, character classes, assertions, quantifiers and back references, and the final section will give a few useful examples. Patterns Patterns consist of literal strings and character classes. Patterns may contain sub-patterns, which are patterns enclosed in parentheses.
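The Henrik/Pernille expression above can be tried out in plain JavaScript. This is a small sketch that assumes JavaScript's own RegExp engine, which happens to accept this particular expression unchanged; whoSaid is a hypothetical helper name, and each input string is treated as one line.

```javascript
// The expression from the text, written as a JavaScript regular expression.
const pattern = /^[ \t]{0,4}(Henrik|Pernille) sa(ys|id)/;

// Hypothetical helper: returns the whole match and the two captured
// sub-patterns (the name and the verb ending), or null if the line
// does not match.
function whoSaid(line) {
    const m = pattern.exec(line);
    if (m === null) {
        return null;
    }
    return { match: m[0], name: m[1], ending: m[2] };
}
```

Here m[0] is the entire matching part of the string, while m[1] and m[2] are the saved sub-patterns that a back reference would refer to.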
Escaping characters In patterns as well as in character classes, some characters have a special meaning. To literally match any of those characters, they must be marked or escaped to let the regular expression software know that it should interpret such characters in their literal meaning. This is done by prepending the character with a backslash (\). The regular expression software will silently ignore escaping a character that does not have any special meaning in the context, so escaping for example a j (\j) is safe. If you are in doubt whether a character could have a special meaning, you can therefore escape it safely. Escaping of course includes the backslash character itself; to match one literally, you would write \\. Character Classes and abbreviations A character class is an expression that matches one of a defined set of characters. In Regular Expressions, character classes are defined by putting the legal characters for the class in square brackets, [], or by using one of the abbreviated classes described below. Simple character classes just contain one or more literal characters, for example [abc] (matching either of the letters a, b or c) or [0123456789] (matching any digit). Because letters and digits have a logical order, you can abbreviate those by specifying ranges of them: [a-c] is equal to [abc] and [0-9] is equal to [0123456789]. Combining these constructs, for example [a-fynot1-38], is completely legal (this one would match, of course, any of a, b, c, d, e, f, y, n, o, t, 1, 2, 3 or 8). As capital letters are different characters from their non-capital equivalents, to create a caseless character class matching a or b, in any case, you need to write it [aAbB]. It is of course possible to create a negative class matching as "anything but". To do so, put a caret (^) at the beginning of the class: [^abc] will match any character but a, b or c.
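The character classes above can be checked one character at a time. The following sketch assumes JavaScript's RegExp engine, which treats these simple classes the same way; matchingChars is a hypothetical helper that keeps only the characters of a string accepted by a class.

```javascript
// Hypothetical helper: returns the characters of `text` that the given
// single-character class accepts, in their original order.
function matchingChars(charClass, text) {
    return Array.from(text).filter((ch) => charClass.test(ch)).join("");
}

const combined = /[a-fynot1-38]/; // ranges a-f and 1-3 plus y, n, o, t, 8
const caseless = /[aAbB]/;        // a or b in either case
const negative = /[^abc]/;        // anything but a, b or c
```

For example, matchingChars(negative, "abcd") keeps only the d, because a, b and c are excluded by the leading caret.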
In addition to literal characters, some abbreviations are defined, making life still a bit easier: \a This matches the ASCII bell character (BEL, 0x07). \f This matches the ASCII form feed character (FF, 0x0C). \n This matches the ASCII line feed character (LF, 0x0A, Unix newline). \r This matches the ASCII carriage return character (CR, 0x0D). \t This matches the ASCII horizontal tab character (HT, 0x09). \v This matches the ASCII vertical tab character (VT, 0x0B). \xhhhh This matches the Unicode character corresponding to the hexadecimal number hhhh (between 0x0000 and 0xFFFF). \0ooo (&ie;, \zero ooo) matches the ASCII/Latin-1 character corresponding to the octal number ooo (between 0 and 0377). . (dot) This matches any character (including newline). \d This matches a digit. Equal to [0-9] \D This matches a non-digit. Equal to [^0-9] or [^\d] \s This matches a whitespace character. Practically equal to [ \t\n\r] \S This matches a non-whitespace character. Practically equal to [^ \t\r\n], and equal to [^\s] \w Matches any word character - in this case any letter, digit or underscore. Equal to [a-zA-Z0-9_] \W Matches any non-word character - anything but letters, numbers or underscore. Equal to [^a-zA-Z0-9_] or [^\w] The POSIX notation of classes, [:<class name>:], is also supported. For example, [:digit:] is equivalent to \d, and [:space:] to \s. See the full list of POSIX character classes here. The abbreviated classes can be put inside a custom class, for example to match a word character, a blank or a dot, you could write [\w \.] Characters with special meanings inside character classes The following characters have a special meaning inside the [] character class construct, and must be escaped to be literally included in a class: ] Ends the character class. Must be escaped unless it is the very first character in the class (may follow an unescaped caret) ^ (caret) Denotes a negative class if it is the first character.
Must be escaped to match literally if it is the first character in the class. - (dash) Denotes a logical range. Must always be escaped within a character class. \ (backslash) The escape character. Must always be escaped. Alternatives: matching <quote>one of</quote> If you want to match one of a set of alternative patterns, you can separate those with | (vertical bar character). For example, to find either John or Harry you would use the expression John|Harry. Sub Patterns Sub patterns are patterns enclosed in parentheses, and they have several uses in the world of regular expressions. Specifying alternatives You may use a sub pattern to group a set of alternatives within a larger pattern. The alternatives are separated by the character | (vertical bar). For example, to match either of the words int, float or double, you could use the pattern int|float|double. If you only want to find one if it is followed by some whitespace and then some letters, put the alternatives inside a subpattern: (int|float|double)\s+\w+. Capturing matching text (back references) If you want to use a back reference, use a sub pattern (PATTERN) to have the desired part of the pattern remembered. To prevent the sub pattern from being remembered, use a non-capturing group (?:PATTERN). For example, if you want to find two occurrences of the same word separated by a comma and possibly some whitespace, you could write (\w+),\s*\1. The sub pattern \w+ would find a chunk of word characters, and the entire expression would match if those were followed by a comma, 0 or more whitespace characters and then an equal chunk of word characters. (The string \1 references the first sub pattern enclosed in parentheses.) To avoid ambiguities with usage of \1 with some digits behind it (&eg; \12 can be the 12th subpattern or just the first subpattern followed by 2) we use \{12} as syntax for multi-digit subpatterns.
Examples: \{12}1 means use subpattern 12, followed by a literal 1; \123 means use capture 1, followed by 23 as normal text. Lookahead Assertions A lookahead assertion is a sub pattern, starting with either ?= or ?!. For example, to match the literal string Bill but only if not followed by Gates, you could use this expression: Bill(?! Gates). (This would find Bill Clinton as well as Billy the kid, but silently ignore the other matches.) Sub patterns used for assertions are not captured. See also Assertions Lookbehind Assertions A lookbehind assertion is a sub pattern, starting with either ?<= or ?<!. Lookbehind has the same effect as the lookahead, but works backwards. For example, to match the literal string fruit but only if not preceded by grape, you could use this expression: (?<!grape)fruit. Sub patterns used for assertions are not captured. See also Assertions Characters with a special meaning inside patterns The following characters have a special meaning inside a pattern, and must be escaped if you want to literally match them: \ (backslash) The escape character. ^ (caret) Asserts the beginning of the string. $ Asserts the end of string. () (left and right parentheses) Denotes sub patterns. {} (left and right curly braces) Denotes numeric quantifiers. [] (left and right square brackets) Denotes character classes. | (vertical bar) logical OR. Separates alternatives. + (plus sign) Quantifier, 1 or more. * (asterisk) Quantifier, 0 or more. ? (question mark) An optional character. Can be interpreted as a quantifier, 0 or 1. Quantifiers Quantifiers allow a regular expression to match a specified number or range of numbers of either a character, character class or sub pattern. Quantifiers are enclosed in curly brackets ({ and }) and have the general form {[minimum-occurrences][,[maximum-occurrences]]} The usage is best explained by example: {1} Exactly 1 occurrence {0,1} Zero or 1 occurrences {,1} The same, with less work;) {5,10} At least 5 but maximum 10 occurrences. {5,} At least 5 occurrences, no maximum.
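The numeric quantifiers listed above can be exercised with a short JavaScript sketch. It assumes JavaScript's RegExp engine and anchors each expression with ^ and $ so that the whole string has to consist of the counted character; the constant names are made up for this illustration. The {,1} shorthand is left out because JavaScript does not treat it as a quantifier.

```javascript
// Each expression must match the entire string, so only the number of
// consecutive "a" characters decides the outcome.
const exactlyOne = /^a{1}$/;    // exactly 1 occurrence
const zeroOrOne  = /^a{0,1}$/;  // zero or 1 occurrences
const fiveToTen  = /^a{5,10}$/; // at least 5, at most 10 occurrences
const fiveOrMore = /^a{5,}$/;   // at least 5 occurrences, no maximum

// Hypothetical helper: builds a string of n "a" characters and tests it.
function accepts(expression, n) {
    return expression.test("a".repeat(n));
}
```

For example, accepts(fiveToTen, 11) is false because the upper bound is exceeded, while accepts(fiveOrMore, 11) is true.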
Additionally, there are some abbreviations: * (asterisk) similar to {0,}, find any number of occurrences. + (plus sign) similar to {1,}, at least 1 occurrence. ? (question mark) similar to {0,1}, zero or 1 occurrence. Greed When using quantifiers with no maximum, regular expressions default to matching as much of the searched string as possible, commonly known as greedy behavior. Modern regular expression software provides the means of turning off greediness, though in a graphical environment it is up to the interface to provide you with access to this feature. For example, a search dialog providing a regular expression search could have a check box labeled Minimal matching, and it ought to indicate whether greediness is the default behavior. In context examples Here are a few examples of using quantifiers: ^\d{4,5}\s Matches the digits in 1234 go and 12345 now, but neither in 567 eleven nor in 223459 somewhere. \s+ Matches one or more whitespace characters. (bla){1,} Matches all of blablabla and the bla in blackbird or tabla. /?> Matches /> in <closeditem/> as well as > in <openitem>. Assertions Assertions allow a regular expression to match only under certain controlled conditions. An assertion does not need a character to match; it rather investigates the surroundings of a possible match before acknowledging it. For example, the word boundary assertion does not try to find a non-word character opposite a word one at its position; instead it makes sure that there is not a word character. This means that the assertion can match where there is no character, &ie; at the ends of a searched string. Some assertions actually do have a pattern to match, but the part of the string matching that will not be a part of the result of the match of the full expression. Regular Expressions as documented here support the following assertions: ^ (caret: beginning of string) Matches the beginning of the searched string. The expression ^Peter will match at Peter in the string Peter, hey!
but not in Hey, Peter! $ (end of string) Matches the end of the searched string. The expression you\?$ will match at the last you in the string You didn't do that, did you? but nowhere in You didn't do that, right? \b (word boundary) Matches if there is a word character at one side and not a word character at the other. This is useful to find word ends, for example both ends to find a whole word. The expression \bin\b will match at the separate in in the string He came in through the window, but not at the in in window. \B (non word boundary) Matches wherever \b does not. That means that it will match for example within words: The expression \Bin\B will match at the in in window but not in integer or I'm in love. (?=PATTERN) (Positive lookahead) A lookahead assertion looks at the part of the string following a possible match. The positive lookahead will prevent the string from matching if the text following the possible match does not match the PATTERN of the assertion, but the text matched by that will not be included in the result. The expression handy(?=\w) will match at handy in handyman but not in That came in handy! (?!PATTERN) (Negative lookahead) The negative lookahead prevents a possible match from being acknowledged if the following part of the searched string does match its PATTERN. The expression const \w+\b(?!\s*&) will match at const char in the string const char* foo while it cannot match const QString in const QString& bar because the & matches the negative lookahead assertion pattern. (?<=PATTERN) (Positive lookbehind) Lookbehind has the same effect as the lookahead, but works backwards. A lookbehind looks at the part of the string preceding a possible match. The positive lookbehind will match a string only if it is preceded by the PATTERN of the assertion, but the text matched by that will not be included in the result.
-The expression (?<cup)cake will match at cake +The expression (?<=cup)cake will match at cake if it is preceded by cup (in cupcake but not in cheesecake or in cake alone). (?<!PATTERN) (Negative lookbehind) The negative lookbehind prevents a possible match from being acknowledged if the previous part of the searched string does match its PATTERN. The expression (?<![\w\.])[0-9]+ will match at 123 in the strings =123 and -123 while it cannot match 123 in .123 or word123. (PATTERN) (Capturing group) The sub pattern within the parentheses is captured and remembered, so that it can be used in back references. For example, the expression (&quot;+)[^&quot;]*\1 matches """"text"""" and "text". See the section Capturing matching text (back references) for more information. (?:PATTERN) (Non-capturing group) The sub pattern within the parentheses is not captured and is not remembered. It is preferable to always use non-capturing groups if the captures will not be used.
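The lookaround and capturing examples above can be reproduced in JavaScript. This is a sketch that assumes a JavaScript engine with lookbehind support (for example a current Node.js); the constant names and the firstMatch helper are made up for this illustration and are not part of &kappname;.

```javascript
const afterCup   = /(?<=cup)cake/;         // positive lookbehind
const bareNumber = /(?<![\w.])[0-9]+/;     // negative lookbehind
const constNoRef = /const \w+\b(?!\s*&)/;  // negative lookahead
const quoted     = /("+)[^"]*\1/;          // capturing group with back reference

// Hypothetical helper: returns the text of the first match, or null.
function firstMatch(expression, text) {
    const m = expression.exec(text);
    return m === null ? null : m[0];
}
```

Note that the text tested by a lookaround is not part of the result: firstMatch(afterCup, "cupcake") yields only cake, not cupcake.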