Add a tool for generating character width tables
Closed, Public

Authored by mglb on Sep 26 2018, 12:55 AM.

Details

Summary

The uni2characterwidth tool converts Unicode Character Database files
into character width lookup tables. It uses a template file to place
the tables in a source code file together with a function for finding
the width of a specified character. It can also generate several forms
of lists with width data for debugging and testing, or for future use
as a replacement for the Unicode files.
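To illustrate the kind of lookup such a table supports, here is a minimal sketch of a range table with a binary-search width function. It is not the code the tool generates: the names `RANGES`, `DEFAULT_WIDTH` and `character_width` are hypothetical, and the two ranges are a hand-written sample rather than tool output.

```python
from bisect import bisect_right

# Hypothetical hand-written sample, NOT output of uni2characterwidth.
# Each entry is (first code point, last code point, width), sorted by
# first code point and non-overlapping.
RANGES = [
    (0x0300, 0x036F, 0),  # combining diacritical marks
    (0x4E00, 0x9FFF, 2),  # CJK unified ideographs block
]
STARTS = [first for first, _last, _width in RANGES]

# Assumption for this sketch: anything not listed has width 1.
DEFAULT_WIDTH = 1


def character_width(code_point):
    """Return the width of code_point using a binary search over RANGES."""
    # Index of the last range whose first code point is <= code_point.
    index = bisect_right(STARTS, code_point) - 1
    if index >= 0:
        first, last, width = RANGES[index]
        if first <= code_point <= last:
            return width
    return DEFAULT_WIDTH
```

Storing ranges rather than one entry per character keeps the table small, and the sorted order is what makes the binary search possible.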

Set the KONSOLE_BUILD_UNI2CHARACTERWIDTH CMake flag to build the tool.
Run it with the --help argument for more detailed usage.

The tool can generate a separate "width" for Ambiguous characters.
This can be used to make the width of those characters configurable
in Konsole settings.

The template.example file contains all possible named tags, plus some
additional tags that show how to use them.

CCBUG: 396435

Depends on D15756

Test Plan

Download the files listed below from the 11.0.0 and emoji/11.0
directories on https://unicode.org/Public/. You can also use the
files' URLs directly.

  • UnicodeData.txt
  • EastAsianWidth.txt
  • emoji-data.txt
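For orientation, data lines in EastAsianWidth.txt have the form `code;property` or `first..last;property`, followed by a `#` comment. The sketch below reads that format into ranges. It is an illustration only, not the tool's parser: the function name `parse_east_asian_width` is hypothetical and `SAMPLE_LINES` is an abridged hand-written sample in the style of the file.

```python
# Abridged sample in the style of EastAsianWidth.txt (not the full file).
SAMPLE_LINES = [
    "# EastAsianWidth sample",
    "",
    "0020;Na          # Zs         SPACE",
    "00A1;A           # Po         INVERTED EXCLAMATION MARK",
    "4E00..9FEF;W     # Lo [20976] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FEF",
]


def parse_east_asian_width(lines):
    """Return a list of (first, last, property) tuples from data lines."""
    ranges = []
    for line in lines:
        # Drop the trailing comment, then skip blank and comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code_field, prop = (part.strip() for part in data.split(";", 1))
        # A single code point has no "..", so its range ends where it starts.
        first, _sep, last = code_field.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), prop))
    return ranges
```

In this sample the property `A` marks an Ambiguous character and `W` a Wide one.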

Generate any available list except compact-ranges (e.g. details):

uni2characterwidth \
    -U UnicodeData.txt  -A EastAsianWidth.txt  -E emoji-data.txt \
    -g details  result.txt

The list should contain ranges for all possible widths
(-2, -1, 0, 1, 2). You can pick some characters whose width you know
and check how they were classified. -2 is a special non-standard width
for ambiguous characters, which can be overridden by adding the -a 1 or
-a 2 parameter. With this flag, all ranges from the -2 group should
disappear and be assigned to the selected width (1 or 2).
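The effect of the override can be pictured with a small sketch. It assumes ranges are held as (first, last, width) tuples and replaces every -2 with the chosen width. The names `AMBIGUOUS`, `SAMPLE_RANGES` and `resolve_ambiguous` are hypothetical, the sample data is hand-written, and this is not how the tool itself is implemented.

```python
AMBIGUOUS = -2  # the special non-standard width described above

# Hypothetical hand-written sample, not actual tool output.
SAMPLE_RANGES = [
    (0x00A1, 0x00A1, AMBIGUOUS),
    (0x0300, 0x036F, 0),
    (0x4E00, 0x9FFF, 2),
]


def resolve_ambiguous(ranges, ambiguous_width):
    """Return a copy of ranges with every ambiguous width replaced."""
    if ambiguous_width not in (1, 2):
        raise ValueError("ambiguous width must be 1 or 2")
    return [
        (first, last, ambiguous_width if width == AMBIGUOUS else width)
        for first, last, width in ranges
    ]
```

After `resolve_ambiguous(SAMPLE_RANGES, 1)` no entry with width -2 remains, which mirrors the check described for the -a flag.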

Generate output using a template:

uni2characterwidth \
    -U UnicodeData.txt  -A EastAsianWidth.txt  -E emoji-data.txt \
    -g code,./template.example  result.txt

Diff Detail

Repository
R319 Konsole
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
mglb requested review of this revision. Sep 26 2018, 12:55 AM
mglb created this revision.
mglb updated this revision to Diff 42338. Sep 26 2018, 12:59 AM

Fix template.example

mglb updated this revision to Diff 42339. Sep 26 2018, 1:03 AM

Add copyrights

mglb updated this revision to Diff 42399. Sep 26 2018, 9:52 PM

Language fix

This doesn't apply cleanly w/ 'arc patch D15757' - not sure if I need to do something special or if you need to rebase.

mglb updated this revision to Diff 42513. Sep 28 2018, 5:55 PM

git rebase master

hindenburg accepted this revision. Sep 30 2018, 4:13 PM

Thanks for taking the time to do this

This revision is now accepted and ready to land. Sep 30 2018, 4:13 PM
This revision was automatically updated to reflect the committed changes.