Change internal character type size to 32 bit
ClosedPublic

Authored by mglb on Apr 15 2018, 11:12 PM.

Details

Summary

Currently Unicode uses 0x10FFFF code points. It is possible to represent
all of them with up to two 16 bit values (UTF-16), but this makes it
harder to e.g. check for their width.

Some test cases were changed. Originally they were added with an
assumption that the code point will be truncated to 16 bit value and
as a result changed to another code point.

Test Plan
  • All code points <= 0xFFFF should work as before
    • Start the same tmux session in two Konsoles
    • Change background to fully transparent in one of them and put it in front of the other one, so that all text will overlap
    • Generate characters with (you can pipe it to fold -s | less -r):
perl -XCSDL -e 'print map{chr($_), " "} 1..0xffff'
  • Compare output visually.
  • Code points > 0xFFFF should not be truncated to 16 bits
    • "๐€" and "ํ€" should be different characters
  • Some code points > 0xFFFF should have single width
    • Vertical lines below should align:
|๐€ |
|๐ €‹|

Diff Detail

Repository
R319 Konsole
Branch
arc/Change-internal-character-type-size-to-32-bit
Lint
No Linters Available
Unit
No Unit Test Coverage
mglb created this revision.Apr 15 2018, 11:12 PM
Restricted Application added a project: Konsole. ยท View Herald TranscriptApr 15 2018, 11:12 PM
mglb requested review of this revision.Apr 15 2018, 11:12 PM
mglb edited the test plan for this revision. (Show Details)Apr 15 2018, 11:25 PM
mglb updated this revision to Diff 32342.Apr 16 2018, 11:09 PM
mglb edited the test plan for this revision. (Show Details)

Fix 32 bit support in history

OK, I'm looking at this - a lot of changes

Do you have the ability to compile using ASAN? The below is from TerminalInterfaceTest

10952==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62f00001cf20 at pc 0x7effe8d706ef bp 0x7ffd443897f0 sp 0x7ffd443897e8

READ of size 4 at 0x62f00001cf20 thread T0

#0 0x7effe8d706ee in Konsole::TerminalDisplay::updateCursor() /home/kurthindenburg/Devel/KDE/src/kde/applications/konsole/src/TerminalDisplay.cpp:1874:60
#1 0x7effe8d3d1d4 in Konsole::TerminalDisplay::blinkCursorEvent() /home/kurthindenburg/Devel/KDE/src/kde/applications/konsole/src/TerminalDisplay.cpp:1868:
mglb added a comment.Apr 19 2018, 10:59 PM

I did that, all tests pass without significant errors (only memory leaks). I am using semi-clean environment (empty directory as $HOME and bash as $SHELL). For this test I've used default configuration + blinking cursor.

$ cmake -DCMAKE_BUILD_TYPE=Debug -DECM_ENABLE_SANITIZERS=address -DBUILD_TESTING=ON ../ && make
...
$ SHELL=/bin/bash HOME=fakehome ./src/autotests/TerminalInterfaceTest

********* Start testing of Konsole::TerminalInterfaceTest *********
Config: Using QtTest library 5.9.1, Qt 5.9.1 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 7.2.0)
PASS   : Konsole::TerminalInterfaceTest::initTestCase()
PASS   : Konsole::TerminalInterfaceTest::testTerminalInterfaceNoShell()
PASS   : Konsole::TerminalInterfaceTest::testTerminalInterface()
PASS   : Konsole::TerminalInterfaceTest::cleanupTestCase()
Totals: 4 passed, 0 failed, 0 skipped, 0 blacklisted, 8237ms
********* Finished testing of Konsole::TerminalInterfaceTest *********

=================================================================
==22050==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 256 byte(s) in 1 object(s) allocated from:
    #0 0x7fd04c2bdb50 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb50)
    #1 0x7fd0391c5099  (/usr/lib/x86_64-linux-gnu/libfontconfig.so.1+0x1d099)

...

Can you test it with clean config + blinking? If it will work without errors, can you share the profile used by the test?

OK, more testing on TerminalInterfaceTest using zsh

  1. no crash on any non-blinking cursor
  2. crash on all 3 blinking cursor

If you can't reproduce, I'll test more

mglb added a comment.Apr 20 2018, 11:45 PM

Clang, GCC, empty profile, my usual profile, adding qDebugs to track cursor position and array size, etc - always works.

I've tried to trigger an overflow in updateCursor() using demo_konsolepart, and I found a way - slowly decrease window height. At some point below one terminal line height ASAN reports either buffer overflow or use after free. Sometimes (short prompt and nothing else in terminal) I am able to decrease height to 0, but then generating longer lines does the job. Do not test this with long text typed into zsh - it is smart and shortens command line so it won't wrap when there is too few lines available.

The problem is... the bug works on master. So I guess this is another one. I'll try to fix it anyway.

hindenburg added a comment.EditedApr 21 2018, 12:48 AM

OK on a fresh konsole, TerminalInterfaceTest does not crash. It appears it is zsh+powerline in the default profile that causes it.

Add '. /usr/share/powerline/bindings/zsh/powerline.zsh' to the end of .zshrc and have zsh has profile command.

I'll track it down to exactly what crashes.

mglb added a comment.Apr 26 2018, 10:02 PM

This didn't crashed too. Maybe (hidden) TerminalDisplay had too small size somehow? This is the only way I was able to trigger the bug in the test.
Anyway, please check:
https://phabricator.kde.org/D12551
This will fix the problem.

mglb updated this revision to Diff 34163.May 14 2018, 10:52 PM

Update to master

Restricted Application added a subscriber: konsole-devel. ยท View Herald TranscriptMay 14 2018, 10:52 PM

OK I don't see anything obviously wrong; I think we should put this in master as soon as possible to get feedback. I assume this will increase memory/scrollback file size.

The GLASS.utf file in tests shows the Gothic (4) line is a lot different

mglb updated this revision to Diff 34233.May 15 2018, 10:00 PM

Use characters which work with konsole_wcwidth in a test case.

mglb edited the test plan for this revision. (Show Details)May 15 2018, 10:01 PM
mglb added a comment.EditedMay 15 2018, 10:09 PM

I assume this will increase memory/scrollback file size.

It shouldn't - Character class size was 14 bytes (aligned padded to 16 bytes). Now it has 16 bytes.

The GLASS.utf file in tests shows the Gothic (4) line is a lot different

It was assumed before that everything above 0xffff is wide. This is true for most common characters in this range (Chinese, emoji). Gothic always should be narrow, and now it is.
However, konsole_wcwidth is not the best thing in those ranges - the second patch replaces it with a libc's wcwidth.

hindenburg accepted this revision.May 16 2018, 2:03 PM

OK, it is worth testing this alone or should the change to use wcwidth be done at the same time?

This revision is now accepted and ready to land.May 16 2018, 2:03 PM
This revision was automatically updated to reflect the committed changes.
mglb added a comment.May 16 2018, 11:14 PM

Two patches together would be better. I think there might be some bugreports about a character width being wrong, which most likely would be caused by konsole_wcwidth (e.g. emoji).