add Korean Hangul jamo code point ranges
Needs Review · Public

Authored by daehyuns on Jun 3 2019, 4:41 AM.

Details

Summary

Add the Korean Hangul jamo code point ranges:

U+1100 - U+11FF: Hangul Jamo
U+3130 - U+318F: Hangul Compatibility Jamo
U+A960 - U+A97F: Hangul Jamo Extended-A
U+D7B0 - U+D7FF: Hangul Jamo Extended-B

This is related to https://bugs.kde.org/show_bug.cgi?id=408231

All Korean code point ranges:

U+1100 - U+11FF: Hangul Jamo
U+A960 - U+A97F: Hangul Jamo Extended-A
U+D7B0 - U+D7FF: Hangul Jamo Extended-B
U+3130 - U+318F: Hangul Compatibility Jamo
U+AC00 - U+D7AF: Hangul Syllables

Below are the Unicode Consortium code charts for the Hangul Jamo and Hangul Syllables code point ranges:

Hangul Jamo (Range: U+1100 - U+11FF)
http://www.unicode.org/charts/PDF/U1100.pdf
Hangul Jamo Extended-A (Range: U+A960 - U+A97F)
http://www.unicode.org/charts/PDF/UA960.pdf
Hangul Jamo Extended-B (Range: U+D7B0 - U+D7FF)
http://www.unicode.org/charts/PDF/UD7B0.pdf
Hangul Compatibility Jamo (Range: U+3130 - U+318F)
http://www.unicode.org/charts/PDF/U3130.pdf
Hangul Syllables (Range: U+AC00 - U+D7AF)
http://www.unicode.org/charts/PDF/UAC00.pdf
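
The five blocks listed above can be sketched as a simple range lookup. This is only an illustration, not the code in the diff: `HangulBlock`, `hangulBlockOf` and `isHangul` are hypothetical names, and the sketch works on `char32_t` code points with no Qt dependency (C++14 or later). Since all five blocks lie in the Basic Multilingual Plane, the same comparisons would apply to the value of a single `QChar`.

```cpp
// Hypothetical names for illustration; these are not identifiers from the patch.
enum class HangulBlock {
    None,
    Jamo,              // U+1100 - U+11FF
    CompatibilityJamo, // U+3130 - U+318F
    JamoExtendedA,     // U+A960 - U+A97F
    Syllables,         // U+AC00 - U+D7AF
    JamoExtendedB      // U+D7B0 - U+D7FF
};

// Map a code point to the Hangul block that contains it, if any.
constexpr HangulBlock hangulBlockOf(char32_t cp)
{
    if (cp >= 0x1100 && cp <= 0x11FF) return HangulBlock::Jamo;
    if (cp >= 0x3130 && cp <= 0x318F) return HangulBlock::CompatibilityJamo;
    if (cp >= 0xA960 && cp <= 0xA97F) return HangulBlock::JamoExtendedA;
    if (cp >= 0xAC00 && cp <= 0xD7AF) return HangulBlock::Syllables;
    if (cp >= 0xD7B0 && cp <= 0xD7FF) return HangulBlock::JamoExtendedB;
    return HangulBlock::None;
}

// True for any code point inside one of the five Hangul blocks.
constexpr bool isHangul(char32_t cp)
{
    return hangulBlockOf(cp) != HangulBlock::None;
}
```

Note that these are block ranges, so they also cover unassigned code points inside each block (for example U+D7A4 - U+D7AF at the end of Hangul Syllables).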

Diff Detail

Repository
R8 Calligra
Lint
Lint Skipped
Unit
Unit Tests Skipped
daehyuns requested review of this revision. Jun 3 2019, 4:41 AM
daehyuns created this revision.

First of all, thanks for the work. A few things come to mind, especially regarding classical Hangul and half-completed Hangul characters.

  1. U+AC00 .. U+D7AF - no problem mapping a single QChar to one Hangul character
  2. U+3130 .. U+318F - same as the Hangul Syllables block (a single QChar maps to one Hangul character), as the characters in this range are non-combining
  3. Hangul Jamo, Hangul Jamo Extended-A and Extended-B (U+1100 .. U+11FF, U+A960 .. U+A97F, U+D7B0 .. U+D7FF) - here is the tricky part, as what users see as a single "Hangul character" is not always a single "QChar".

Let's take '나랏말ᄊᆞ미' as an example. The 'ᄊᆞ' part may be seen as a single character if the rendering font combines U+110A and U+119E. This and other classical Hangul characters can't be "normalized" into a single Unicode code point/QChar, and neither can half-completed characters (cho+jong, jung+jong). If the underlying font does not combine the two (e.g. the font does not support classical Hangul), users will perceive them as two separate characters; otherwise they will see a single character. If we can get the font information here, the statistics could follow how the font renders these characters (as two or as one). If not, KS X 1026-1 [1] could be used as a guideline for determining the boundary of a single character.
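
To make the boundary question concrete, here is a rough sketch of counting user-visible characters without consulting the font. It is an assumption on my side rather than a proposal for the patch: it applies the Hangul rules GB6-GB8 of Unicode's grapheme cluster segmentation (UAX #29), not KS X 1026-1 itself, and the names `HangulType`, `hangulTypeOf`, `joinsWithPrevious` and `countHangulClusters` are hypothetical. It works on `char32_t` code points and treats every non-jamo code point as its own cluster, which ignores combining marks and the like.

```cpp
#include <cstddef>
#include <string>

// Hangul_Syllable_Type classes used by UAX #29 rules GB6-GB8.
enum class HangulType { Other, L, V, T, LV, LVT };

HangulType hangulTypeOf(char32_t cp)
{
    if ((cp >= 0x1100 && cp <= 0x115F) || (cp >= 0xA960 && cp <= 0xA97C))
        return HangulType::L;   // leading consonant (choseong)
    if ((cp >= 0x1160 && cp <= 0x11A7) || (cp >= 0xD7B0 && cp <= 0xD7C6))
        return HangulType::V;   // vowel (jungseong)
    if ((cp >= 0x11A8 && cp <= 0x11FF) || (cp >= 0xD7CB && cp <= 0xD7FB))
        return HangulType::T;   // trailing consonant (jongseong)
    if (cp >= 0xAC00 && cp <= 0xD7A3)   // precomposed syllables
        return ((cp - 0xAC00) % 28 == 0) ? HangulType::LV : HangulType::LVT;
    return HangulType::Other;   // includes Hangul Compatibility Jamo
}

// True when no cluster boundary falls between prev and next.
bool joinsWithPrevious(HangulType prev, HangulType next)
{
    switch (prev) {
    case HangulType::L:     // GB6: L x (L | V | LV | LVT)
        return next == HangulType::L || next == HangulType::V
            || next == HangulType::LV || next == HangulType::LVT;
    case HangulType::LV:
    case HangulType::V:     // GB7: (LV | V) x (V | T)
        return next == HangulType::V || next == HangulType::T;
    case HangulType::LVT:
    case HangulType::T:     // GB8: (LVT | T) x T
        return next == HangulType::T;
    default:
        return false;
    }
}

// Count clusters, starting a new one wherever GB6-GB8 do not join.
std::size_t countHangulClusters(const std::u32string &text)
{
    std::size_t count = 0;
    HangulType prev = HangulType::Other;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const HangulType cur = hangulTypeOf(text[i]);
        if (i == 0 || !joinsWithPrevious(prev, cur))
            ++count;
        prev = cur;
    }
    return count;
}
```

For '나랏말ᄊᆞ미', written as `U"\uB098\uB78F\uB9D0\u110A\u119E\uBBF8"`, the string holds six code points but `countHangulClusters` returns 5, because U+110A (L) and U+119E (V) are joined by GB6. Under these rules a half-completed cho+jong sequence (L followed directly by T) is not joined and counts as two, so how such sequences should be counted remains an open question.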

Have you checked how other word processors handle this issue? We could also build some test cases around it.

[1] http://www.unicode.org/L2/L2008/08225-n3422.pdf