KLocale porting
Open, Needs Triage, Public

Description

There seem to be quite a few users of KLocale functions that have no equivalent outside KDELibs4Support, e.g. countryCodeToName (https://api.kde.org/frameworks/kdelibs4support/html/classKLocale.html#a0ecb21950b7e94a3cb7f1ab256019de1), which gets Country X in Language Y (Qt only offers Country X in Language X or Country X in English).

The iso-codes project seems to have the necessary data in JSON format. KContacts already uses some of it (it contains a generator that reads the JSON and produces a .cpp file with the data, which is pre-committed to the repository). But that's not very scalable...

See also: https://api.kde.org/frameworks/kconfigwidgets/html/classKLanguageName.html

stikonas created this task. Dec 27 2019, 1:10 AM

It might make sense to collect all those iso-codes or CLDR consuming functions in e.g. KI18n and share them there? Whether that's then implemented by runtime reading iso-codes data, or compiling in the subset we need is secondary IMHO. KContacts is also only a rather accidental place for this I think.

stikonas added a subscriber: adridg. Edited Dec 27 2019, 11:53 AM

Yes, I agree that KI18n is a natural place for this.

I'm not against compiling in the subset that we need as long as it's done in a single place and does not get too much out of date. But indeed this is a secondary question.

I guess the first thing is to decide what data we need.

  • Country names
  • Language names? (KLanguageName is part of KConfigWidgets but maybe it is better to keep everything in one place)
  • Timezone names? (@adridg was interested in this, possibly needed by plasma timezone selector too)

Anything else?

KContacts (and KItinerary, which heavily uses this) needs a mapping of ISO 3166-1 alpha-2 codes to localized country names and, more difficult, the opposite: a mapping from localized country names (possibly not written in perfect Unicode but in some transliteration) to ISO 3166-1 alpha-2 codes. The first can be done at runtime from iso-codes, the latter needs CLDR. In both cases performance matters a lot (this is used e.g. in list views), so we need to be a bit more clever than parsing JSON/XML at every call.
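One way to avoid parsing at every call is a compiled-in, sorted table searched with a binary search. The following is only a minimal sketch of that idea, not the KContacts or KI18n implementation: the names `CountryEntry`, `packAlpha2`, `countryTable` and `countryName` are hypothetical, the table holds just a handful of sample rows, and it assumes the returned English name would afterwards be looked up in the iso-codes translation catalog.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical table row: an ISO 3166-1 alpha-2 code packed into 16 bits,
// plus the untranslated (English) country name.
struct CountryEntry {
    std::uint16_t key;
    std::string_view name;
};

constexpr char toUpperAscii(char c)
{
    return (c >= 'a' && c <= 'z') ? char(c - 'a' + 'A') : c;
}

// Pack two ASCII letters into one integer, ignoring case.
// The caller must make sure the code has at least two characters.
constexpr std::uint16_t packAlpha2(std::string_view code)
{
    return std::uint16_t((std::uint8_t(toUpperAscii(code[0])) << 8)
                         | std::uint8_t(toUpperAscii(code[1])));
}

// Sample data only; a generator would emit the full list, sorted by key.
constexpr std::array<CountryEntry, 5> countryTable{{
    {packAlpha2("AT"), "Austria"},
    {packAlpha2("DE"), "Germany"},
    {packAlpha2("FR"), "France"},
    {packAlpha2("LT"), "Lithuania"},
    {packAlpha2("US"), "United States"},
}};

// Binary search over the sorted table: no allocation, no parsing per call.
std::optional<std::string_view> countryName(std::string_view alpha2)
{
    if (alpha2.size() != 2) {
        return std::nullopt;
    }
    const auto key = packAlpha2(alpha2);
    const auto it = std::lower_bound(countryTable.begin(), countryTable.end(), key,
        [](const CountryEntry &entry, std::uint16_t k) { return entry.key < k; });
    if (it == countryTable.end() || it->key != key) {
        return std::nullopt;
    }
    return it->name;
}
```

Calling `countryName("de")` yields the "Germany" entry, while an unknown or malformed code yields an empty optional. The reverse direction (name to code) would need normalization of the input on top of a similar table and is not covered by this sketch.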

For KItinerary ISO 3166-2 region code mapping in both ways is also of interest eventually.

Based on input from here, previous discussions and what I could find in existing code, here's a first draft for an API proposal for KI18n: https://invent.kde.org/-/snippets/1525

The data sources for this would be iso-codes, CLDR, timezone and ISO 3166-1/2 border extracts from OSM, and QLocale. The only thing not covered by those, I think, is translated timezone names. What exactly those should look like is an open question anyway; we have multiple variations of those right now.

What this doesn't cover are translated city names and location to city mappings as used e.g. by Koko or KStars.

Regarding an actual implementation, most of this already exists in various forms elsewhere. Parts of this will involve some form of generated data tables that will need occasional updates, such as the location-based lookups (see kitinerary) and the name to country code mapping (see kcontacts), likely increasing the library size by a few MB of read-only/shareable data. The iso-codes translation catalogs would become a runtime dependency.

What do you think?

aacid added a subscriber: aacid. Feb 17 2021, 5:05 PM

Isn't KLanguageName what you want? https://api.kde.org/frameworks/kconfigwidgets/html/klanguagename_8h_source.html

Any idea what's wrong with the documentation format? How do I get stuff to show up at https://api.kde.org/frameworks/kconfigwidgets/html/classKLanguageName.html ?

aacid added a comment. Feb 17 2021, 5:06 PM

Ah i see that @stikonas had already mentioned it, sorry for the noise ^_^

Right, for language names the situation is already much better than for much of the rest. This would just mean moving that functionality to ki18n/tier1 and switching to the iso-codes translation catalogs instead of having to maintain our own.

ervin moved this task from Needs Input to In Discussion on the KF6 board. Mar 28 2021, 9:12 AM
ervin moved this task from In Discussion to Backlog on the KF6 board. Mar 28 2021, 10:07 AM

Meeting notes from the KF6 sprint (thanks to Luigi):

Problem: locale/country/currency information is duplicated. Volker started collecting all this information and its use cases.

Current proposal: new API to be added to ki18n (still not set in stone). Need feedback on how complete this is, and where to put it.

Another goal: avoid translating everything ourselves, so use iso-codes (country, country subdivision).

Some discussions (on general questions and technical details):

  • Open question about country: is it possible to get translated country names for the non-current locale?
  • Same for languages? Right now we have: get language name in current language, get language name in its own language
  • Both are not critical, but they can be added.

Timezones:

  • how to represent them? Separately translate continent/country? (still open)
  • the geocoordinate db to find the timezone automatically requires ~2MB more
  • what to use in the constructor? Probably QTimeZone
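To illustrate the "translate the parts separately" option from the timezone list above, here is a rough sketch that splits an IANA timezone ID into a region and a location so each could be translated on its own. This is just one possible answer to the still open representation question. `TimezoneLabel` and `splitTimezoneId` are hypothetical names, and a real API would probably take a QTimeZone, as noted above, rather than a plain string.

```cpp
#include <algorithm>
#include <string>
#include <string_view>

// Hypothetical result type: the two pieces that could be translated separately.
struct TimezoneLabel {
    std::string region;   // e.g. "Europe"; empty for IDs without a slash, such as "UTC"
    std::string location; // e.g. "Berlin", with underscores turned into spaces
};

TimezoneLabel splitTimezoneId(std::string_view ianaId)
{
    TimezoneLabel label;
    const auto firstSlash = ianaId.find('/');
    if (firstSlash == std::string_view::npos) {
        label.location = std::string(ianaId);
    } else {
        label.region = std::string(ianaId.substr(0, firstSlash));
        // Use the last component only:
        // "America/Argentina/Buenos_Aires" gives "Buenos_Aires".
        label.location = std::string(ianaId.substr(ianaId.rfind('/') + 1));
    }
    std::replace(label.location.begin(), label.location.end(), '_', ' ');
    return label;
}
```

Note that the ID contains a region and a city, not a country, so a continent/country presentation would additionally need the timezone to country lookup.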

Why to put it in ki18n?

  • It needs ki18n in the implementation
  • if it's not in ki18n it bumps the tier of other frameworks (mainly KContacts). KContacts would need to be adapted, for example by using i18n directly without ki18n (but that would introduce duplication), or just bumped to a higher tier (possible "advertising" issue)?
  • On the other hand, the scope of ki18n can be changed
  • some of it is related to translations anyway, and to the general "internationalization" scope
  • -> mail to Chusslove, ask for feedback

Could the runtime data increase, apart from those geocoding data?

  • other current requirement: iso-codes (but they are already a dependency)
  • city-based lookup which is missing (koko, kweather, kstars) and it may increase the size. Maybe it would make sense to get those city names from wikidata

Is iso-codes complete?

  • it is already a dependency for Plasma and Frameworks
  • in some countries something is lagging behind, but nothing major
  • KDE translators are already contributing

Two big questions:

  1. What is the chance this gets implemented & released in KF5, before KF6 is ready?
  2. Would the languages/locales' dataset be sufficient to replace the list of KDE languages currently hardcoded in KF5::KConfigWidgets? See thread for more background, please consider the case of en_GB.

The parts I have implemented are those dealing with country and country subdivision translations, as well as country/country subdivision and timezone lookups. It's still in a branch, but given feedback/review it should be mergeable for KF5.
What's still missing is:
(1) Language/locale translations.
(2) Human readable timezone names, and translations of those.
(3) Country/country subdivision to language(s) mapping.

For (1) and (2) I'm stuck on uncertainties regarding the requirements/desired outcome; (3) depends on implementation details of (1).

The main problem with (1), and you mentioned that in the referenced thread as well, is that we mix several concepts under the term "language":

  • ISO 639: a language regardless of regional or script variants, "en"/"English" or "sr"/"Serbian"
  • locale: a combination of language, script and country, "en_US"/"American English" or "sr@latin"/"Serbian (Latin script)"
  • the subset of locales we actually have available translation catalogs for

The last one is actually the most commonly used, e.g. for UI that lets the user select a language.

For the human readable form of the locales, there are two possible ways:

  • string puzzle them from the individual translations of language, script and country ("English (American)"). This could be done based on existing translations in iso-codes.
  • translate the combined name in order to be able to support things like "American English". I'm not aware of an existing dataset for this, so we'd need to keep maintaining our own.
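The first option, puzzling the name together, could look roughly like the sketch below. It assumes the individual pieces have already been translated (for instance via iso-codes) and only shows the parsing and composition steps. `LocaleParts`, `parseLocale` and `composeDisplayName` are hypothetical names, and an encoding suffix such as ".UTF-8" is not handled.

```cpp
#include <string>
#include <string_view>

// Hypothetical split of a "language[_COUNTRY][@modifier]" locale name.
struct LocaleParts {
    std::string language; // "en", "sr"
    std::string country;  // "US"; may be empty
    std::string modifier; // "latin"; may be empty
};

LocaleParts parseLocale(std::string_view name)
{
    LocaleParts parts;
    const auto at = name.find('@');
    if (at != std::string_view::npos) {
        parts.modifier = std::string(name.substr(at + 1));
        name = name.substr(0, at);
    }
    const auto underscore = name.find('_');
    if (underscore != std::string_view::npos) {
        parts.country = std::string(name.substr(underscore + 1));
        name = name.substr(0, underscore);
    }
    parts.language = std::string(name);
    return parts;
}

// Compose "Language (Country, Script)" from already translated pieces.
// Empty pieces are skipped.
std::string composeDisplayName(const std::string &language,
                               const std::string &country,
                               const std::string &script)
{
    std::string qualifiers = country;
    if (!script.empty()) {
        if (!qualifiers.empty()) {
            qualifiers += ", ";
        }
        qualifiers += script;
    }
    return qualifiers.empty() ? language : language + " (" + qualifiers + ")";
}
```

With this, "en_US" would come out as "English (United States)" once the pieces are translated, but never as "American English", which is exactly the limitation of this approach. In a real implementation the "%1 (%2)" pattern itself would presumably need to be translatable too.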

Just throwing one thing out there: sr@latin seems to be a KDE-ism, and Qt either doesn't represent it, or uses sr@latn as code-name. There are also complications with zh, apparently, where I get (outside of KDE) contributions that have a country / region attached (e.g. zh_HK) but where the meaning is script (Simplified vs Traditional Chinese), not country. So while puzzling things together might usually work, there's a handful of relevant edge cases.
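Edge cases like these could be handled by a small override step that runs before any generic composition. The sketch below is purely illustrative: `impliedScript` is a hypothetical helper, the modifier spellings and the region list are assumptions rather than an exhaustive or authoritative mapping, and it returns ISO 15924 script codes.

```cpp
#include <string>
#include <string_view>

// Returns an ISO 15924 script code implied by the locale pieces,
// or an empty string if nothing can be inferred.
// The rules below are examples only (assumed, not taken from an existing dataset).
std::string impliedScript(std::string_view language,
                          std::string_view country,
                          std::string_view modifier)
{
    // An explicit modifier wins, e.g. "sr@latin".
    if (modifier == "latin" || modifier == "latn") {
        return "Latn";
    }
    if (modifier == "cyrillic") {
        return "Cyrl";
    }
    // For Chinese the region often stands in for the script.
    if (language == "zh") {
        if (country == "TW" || country == "HK" || country == "MO") {
            return "Hant"; // Traditional
        }
        if (country == "CN" || country == "SG") {
            return "Hans"; // Simplified
        }
    }
    return {};
}
```

A display name for zh_HK could then be built from the script ("Traditional Chinese") instead of, or in addition to, the region.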

bam added a subscriber: bam. Nov 7 2021, 1:00 PM

@vkrause can we consider this Done?

vkrause added a comment. Edited Feb 19 2023, 10:11 AM

Country names are done I think, which was the original report. Things that were mentioned during the discussion (language names, timezone names, country to language mappings) are not.

Not a blocker for KF6 though, this would be new API anyway.

vkrause moved this task from Backlog to Optional/Low Priority on the KF6 board. Feb 19 2023, 3:50 PM