Use new character width code based on Unicode 11
ClosedPublic

Authored by mglb on Sep 26 2018, 12:56 AM.

Details

Summary

Adds code for getting character width, together with lookup tables (LUTs)
generated by uni2characterwidth from the Unicode 11 lists.
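
As a rough illustration of what a table-driven width lookup can look like, here is a minimal sketch. It is not the code from this revision: the WidthRange struct, the kRanges excerpt, and the charWidth function are hypothetical names, and the table holds only a few hand-picked ranges rather than generated Unicode 11 data. The fallback width of 1 is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// Hypothetical table entry: every code point in [first, last] has `width`.
struct WidthRange {
    std::uint32_t first;
    std::uint32_t last;
    int width;
};

// Tiny hand-written excerpt, sorted by `first`. A generated table would
// contain many more ranges; these are for illustration only.
static const WidthRange kRanges[] = {
    {0x0300, 0x036F, 0},   // combining diacritical marks
    {0x1100, 0x115F, 2},   // Hangul Jamo initial consonants
    {0x4E00, 0x9FFF, 2},   // CJK unified ideographs block
    {0x1F3FB, 0x1F3FF, 2}, // emoji skin tone modifiers
    {0x1F44D, 0x1F44D, 2}, // thumbs up
};

// Binary search requires the ranges to be ordered by their first code point.
static bool rangesSorted() {
    return std::is_sorted(std::begin(kRanges), std::end(kRanges),
                          [](const WidthRange &a, const WidthRange &b) {
                              return a.first < b.first;
                          });
}

// Returns the width in terminal columns of a single code point.
int charWidth(std::uint32_t cp) {
    assert(rangesSorted());
    // Find the first range that starts after cp, then step back one entry.
    const WidthRange *it = std::upper_bound(
        std::begin(kRanges), std::end(kRanges), cp,
        [](std::uint32_t value, const WidthRange &range) {
            return value < range.first;
        });
    if (it != std::begin(kRanges)) {
        --it;
        if (cp <= it->last) {
            return it->width;
        }
    }
    return 1; // assumed default for code points outside the excerpt
}
```

Storing ranges instead of one entry per code point keeps the table small, and the binary search keeps each lookup logarithmic in the number of ranges.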

Emoji sequences that use skin tone, flag, gender, or other modifiers are
not joined (you will see, e.g., a skin tone square followed by the generic
yellow emoji). I think joining them would cause problems in most editors,
command-line prompts, and other programs that use character width data,
because the characters would behave as combining characters or as emoji
depending on context (like ligatures).

Examples:

  • light thumb up: 👍🏻
  • dark thumb up: 👍🏿
  • Polish flag: 🇵🇱
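
To make "not joined" concrete, the sketch below sums per-code-point widths for the light thumb up example. The charWidth and stringWidth functions are hypothetical stand-ins, not Konsole's API, and the width of 2 for both the thumbs up and the skin tone modifier is an assumption that matches the "skin tone square + generic yellow emoji" description above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a table lookup; widths here are assumptions.
int charWidth(char32_t cp) {
    assert(cp <= 0x10FFFF); // must be a valid Unicode code point
    if (cp >= 0x1F3FB && cp <= 0x1F3FF) {
        return 2; // emoji skin tone modifiers
    }
    if (cp == 0x1F44D) {
        return 2; // thumbs up
    }
    return 1;
}

// Without sequence support, a string's width is just the sum of the
// widths of its code points, so base and modifier each take two columns.
int stringWidth(const std::u32string &text) {
    int total = 0;
    for (char32_t cp : text) {
        total += charWidth(cp);
    }
    return total;
}
```

Under these assumptions, stringWidth(U"\U0001F44D\U0001F3FB") is 4: two columns for the thumbs up and two for the modifier, which is drawn as a separate square.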

This behavior is allowed:

It is possible to add support for sequences, but they would work only in
functions that compute the width of a whole string.
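
The reason sequence support is limited to string functions is that a per-character lookup cannot see the neighbouring code points. A minimal sketch of a sequence-aware string width follows. It is purely hypothetical: the function names are invented for illustration, only the emoji modifier case is handled, and isModifierBase is deliberately simplified to a single code point.

```cpp
#include <cstddef>
#include <string>

// Emoji skin tone modifiers occupy U+1F3FB..U+1F3FF.
bool isEmojiModifier(char32_t cp) {
    return cp >= 0x1F3FB && cp <= 0x1F3FF;
}

// Simplified for illustration: only thumbs up counts as a modifier base.
bool isModifierBase(char32_t cp) {
    return cp == 0x1F44D;
}

// Hypothetical per-code-point width; stands in for a real table lookup.
int charWidth(char32_t cp) {
    return (isModifierBase(cp) || isEmojiModifier(cp)) ? 2 : 1;
}

// Width of a whole string where a modifier that directly follows a base
// is merged into the base's cell and contributes no extra columns.
int sequenceAwareWidth(const std::u32string &text) {
    int total = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const bool joined = i > 0 && isEmojiModifier(text[i])
                            && isModifierBase(text[i - 1]);
        if (!joined) {
            total += charWidth(text[i]);
        }
    }
    return total;
}
```

Note that the same modifier contributes 0 columns after a base and 2 columns on its own, which is exactly the context-dependent behaviour that a single-character width function cannot express.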

Some characters which can be presented as emoji are narrow (e.g. ✖️, ©️).
Those characters are listed without "presentation" mode, which means
they should be rendered as text by default (real presentation depends on
renderer and/or font). Noto Sans Color Emoji renders them as wide,
DejaVu Sans as narrow. Vim, bash and zsh treat them as narrow, so I made
them narrow.

https://unicode.org/reports/tr51/#Presentation_Style

BUG: 396435
BUG: 378124
BUG: 392171
BUG: 339439

FIXED-IN: 18.12

Depends on D15757

Test Plan
  • Look at emoji_test.txt - emojis should look "normal" (two characters
    wide).
  • Look at GLASS.txt - character widths should look correct.
  • CharacterWidthTest should pass.
  • perl -XCSDL -e 'print map{chr($_), " "} 1..0xffff'

Diff Detail

Repository
R319 Konsole
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
mglb requested review of this revision.Sep 26 2018, 12:56 AM
mglb created this revision.

You won't get any VDG objection from something as cool as this!

broulik added inline comments.
src/CharacterWidth.cpp
24

This is a generated file, or This file is generated.

src/CharacterWidth.src.cpp
5

What if someone else re-generates the file/updates it?

mglb added inline comments.Sep 26 2018, 9:46 PM
src/CharacterWidth.src.cpp
5

Regeneration using other source files will change some numbers in the arrays. This is the same as changing constants or something like that in C++ code, so the same policy applies.

mglb updated this revision to Diff 42398.Sep 26 2018, 9:47 PM

Language fix

mglb marked an inline comment as done.Sep 26 2018, 9:48 PM
mglb updated this revision to Diff 42514.Sep 28 2018, 5:57 PM

git rebase arc/396435/Add-a-tool-for-generating-character-width-tables

This needs a rebase as well

mglb updated this revision to Diff 42674.Oct 1 2018, 3:33 PM

git rebase master

mglb updated this revision to Diff 42676.Oct 1 2018, 3:39 PM

Set upstream to master

Thanks, I don't see anything obviously wrong; let me test it a bit more and we'll get it into master for more testing.

hindenburg edited the summary of this revision. (Show Details)Oct 3 2018, 3:03 PM
hindenburg edited the test plan for this revision. (Show Details)Oct 3 2018, 3:05 PM
hindenburg accepted this revision.Oct 3 2018, 3:11 PM
This revision is now accepted and ready to land.Oct 3 2018, 3:11 PM
This revision was automatically updated to reflect the committed changes.