Sonnet: Add locales that are missing in QLocale
AbandonedPublic

Authored by waqar on Nov 25 2019, 4:46 PM.

Details

Reviewers
None
Summary

QLocale is missing some locales and as a result the file name of the dictionaries of these locales are used in menus instead of proper language name and/or country name which looks very ugly. An example is the dictionary with file name: "an_ES". If you use this dictionary, every time you see the menu, you will see 'an_ES' instead of 'aragonés (España)' the proper name of the language.

Diff Detail

Repository
R246 Sonnet
Branch
add-missing-locales (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 19156
Build 19174: arc lint + arc unit
waqar created this revision.Nov 25 2019, 4:46 PM
Restricted Application added a project: Frameworks. · View Herald TranscriptNov 25 2019, 4:46 PM
Restricted Application added a subscriber: kde-frameworks-devel. · View Herald Transcript
waqar requested review of this revision.Nov 25 2019, 4:46 PM
yurchor added inline comments.
src/core/loader.cpp
255

This should be "Россия" (without the accent).

261

Same here

273

Same here

waqar updated this revision to Diff 70310.Nov 25 2019, 6:06 PM

Remove accent from Росси́я

aacid added a subscriber: aacid.Nov 25 2019, 8:59 PM

Any reason we particularly need this here instead of improving Qt?

waqar marked 3 inline comments as done.EditedNov 26 2019, 4:31 AM

Any reason we particularly need this here instead of improving Qt?

Improving Qt is definitely the better and correct way. But in the meanwhile this can be temporarily fixed here and removed later when QLocale has all the ISO codes.

There are a LOT more codes that aren't present in QLocale yet. I have just added the ones whose dictionaries I was able to find.

aspotashev added inline comments.
src/core/loader.cpp
255

Is there any rule behind your decision to spell some language names capitalized and some others not capitalized?

Examples:

  • Чӑвашла - first letter is capital
  • коми - all letters in lowercase

I didn't find any indication that these languages might have a rule to capitalize language names like English has.

waqar added inline comments.Nov 26 2019, 6:59 AM
src/core/loader.cpp
255

I used wikipedia and the description files provided with the dictionaries for this information. It may not be correct however. If you know the language or the rules, please feel free to correct me and I will amend the commit accordingly

aspotashev added inline comments.Nov 26 2019, 1:32 PM
src/core/loader.cpp
255

There may be different consistent approaches to capitalization:

  1. Use the spelling like if the language name comes in the middle of a sentence. This implies all-lowercase spelling for many Cyrillic-based languages, e.g. "русский" for Russian.
  2. Use the spelling like if the language name comes in the beginning of a sentence. In many languages that means that first letter is in uppercase, e.g. "Русский" for Russian.

I don't know which of these approaches is taken by Qt, but we should follow the pattern. Until then, I can't say how to fix this patch.


After having a look on the Qt source code (qlocale_unix.cpp, qlocale_win.cpp) I'm not sure where do these native language names from on Unix/Linux. QSystemLocale::query() as defined in qlocale_unix.cpp seems to returns an empty QVariant.

I think it would be really helpful if we add unit tests for this Loader::languageNameForCode() method.


I don't know these languages, however I can read Russian and at least detect capital letters in languages that mostly reuse the Russian alphabet.

Now so sure about Chuvash language anymore because English Wikipedia says "Чӑвашла" while Russian
Wikipedia and many other resources in Russian say "чӑвашла", e.g. https://en.wiktionary.org/wiki/чӑваш_чӗлхи

waqar added inline comments.Nov 26 2019, 3:37 PM
src/core/loader.cpp
255
QLocale::nativeCountryName()
QLocale::nativeLanguageName()

Examples for cyrillic based using these two methods:

  • "українська" "Україна" for uk_UA (Ukrainian)
  • "русский" "Россия" for ru_RU (Russian)
  • српски" "Србија" for sr_SR (Serbian)

The language names are coming from qgetenv() I believe. It returns empty in case there was no language name found. In that case, Sonnet just returns the 'isoCode' e.g cv_RU.

Great idea about the languageNameForCode() test. I will go ahead and write it. What should be the name for this test?

aacid added a comment.Nov 26 2019, 9:17 PM

Any reason we particularly need this here instead of improving Qt?

Improving Qt is definitely the better and correct way. But in the meanwhile this can be temporarily fixed here and removed later when QLocale has all the ISO codes.

There are a LOT more codes that aren't present in QLocale yet. I have just added the ones whose dictionaries I was able to find.

Sorry but i don't think this is an acceptable solution, there's lots of other places in KDE code we use QLocale to get the name of a language, if you do this here you'll have to do the same in like 5 places.

Instead just fix it in the proper place

waqar added a comment.Nov 27 2019, 5:06 AM

Sorry but i don't think this is an acceptable solution,

With all due respect, the end user doesn't give a crap whether it was a problem in QLocale, or whether it needed to be done in many places in KDE code and that nobody did it. The end user opens the menu, sees 'American English (United States)' and below that 'cv_RU' and he is just going to think the developers didn't fix it.

there's lots of other places in KDE code we use QLocale to get the name of a language, if you do this here you'll have to do the same in like 5 places.

Alright lets do it in all the places. I will do it.

Instead just fix it in the proper place

What would be the proper place? Qt Code? I am not sure about that. Maybe this doesn't happen on other operating systems. Maybe it's an OS issue.(I don't have other operating systems to test on) Even if it is in Qt Code, improving Qt is great and beneficial for everyone, but its gonna take some time, 6 months? maybe more. In the meantime, this may not be the cleanest and the best solution but it is something and something is better than nothing.

waqar updated this revision to Diff 70399.Nov 27 2019, 8:32 AM

Add test for iso code to language names

aacid added a comment.Nov 27 2019, 6:35 PM

the end user doesn't give a crap whether it was a problem in QLocale, or whether it needed to be done in many places in KDE code and that nobody did it. The end user opens the menu, sees 'American English (United States)' and below that 'cv_RU' and he is just going to think the developers didn't fix it.

With no respect for your lack of being subtle, we need to maintain this for 20 years in the future, you making a quick hack here doesn't make this any better, the user will still see cv_RU in random places and will be all annoyed anyway.

Now go and make the fix in Qt, if it gets rejected, we can speak about doing dirty fixes.

waqar added a comment.EditedNov 28 2019, 7:25 AM

With no respect for your lack of being subtle

I meant no disrespect to anyone. If you felt offended I apologize

we need to maintain this for 20 years in the future, you making a quick hack here doesn't make this any better, the user will still see cv_RU in random places and will be all annoyed anyway.

Can you give examples of places where an iso_code (cv_RU) could pop up?

Spellchecker is a frequently accessed tool. It can have more languages than any other tool. The user can just place a dictionary in the directory and it is the job of spellchecker to load those dictionaries and their filenames correctly.

Now go and make the fix in Qt

I looked deeper into this to figure out what's the issue. The relevant data is stored in qlocale_data_p.h, it does have most of the iso_codes (even the ones that are being displayed incorrectly). What it doesn't have is native language names.

So, if I do,

QLocale locale("cv");
QLocale::Language l = locale.language();
QString localizedLangName = locale.languageToString(l);

It shows Chuvash as the language name.

Now, qlocale_data_p.h is a generated file. You can't edit it manually. Qt takes the data from CLDR. And CLDR doesn't have the native names for these languages. CLDR is a huge database of locale stuff, and it is not very easy to contribute data if you aren't a native speaker.

This patch is a fallback mechanism for when QLocale fails to get the native language name.

aacid added a comment.Nov 28 2019, 9:30 PM

Can you give examples of places where an iso_code (cv_RU) could pop up?

KLanguageButton does iso code to lang name translation
https://api.kde.org/frameworks/kconfigwidgets/html/classKLanguageButton.html

Any other place in KDE code using QLocale::nativeLanguageName https://lxr.kde.org/search?_filestring=&_string=nativeLanguageName or languageToString https://lxr.kde.org/search?_filestring=&_string=languageToString

I am with Albert, that we better improve what Qt has instead of replicating this here.

waqar added a comment.Nov 29 2019, 5:14 PM

Yes, I think so too.

@aacid thank you for your patience and guidance.

Should I withdraw this or wait for further review?

aacid added a comment.Nov 30 2019, 9:57 AM

I honestly would prefer if you abandon this, but maybe let it open for a few days in case someone else disagrees.

Thanks for your understanding :)

Please shout if you need help with improving QLocale

This comment was removed by waqar.
waqar abandoned this revision.Dec 16 2019, 6:30 PM