Add a new ISO 3166 code to country name mapping table
ClosedPublic

Authored by vkrause on Aug 30 2018, 6:07 PM.

Details

Summary

This finally replaces the last aspect of the outdated and extremely slow
to parse countrytransl.map, and now uses the strings from the iso-codes
package that we also use for country name translations.

Diff Detail

Repository
R174 KContacts
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
vkrause created this revision.Aug 30 2018, 6:07 PM
Restricted Application added a project: KDE PIM. · View Herald TranscriptAug 30 2018, 6:07 PM
Restricted Application added a subscriber: kde-pim. · View Herald Transcript
vkrause requested review of this revision.Aug 30 2018, 6:07 PM

Looks good to me. I haven't tested that the tool and the generated output matches, but I trust volker in that.

mlaurent accepted this revision.Aug 31 2018, 6:17 AM
mlaurent added a subscriber: mlaurent.

I trust you too :)
+2

This revision is now accepted and ready to land.Aug 31 2018, 6:17 AM
pino added a subscriber: pino.Aug 31 2018, 6:36 AM

Please lookup the strings at runtime, instead of embedding them in the library.
Otherwise, an update to iso-codes (like updating the name of a country) will break the mapping, since the installed translations will not reflect the old text embedded.

pino requested changes to this revision.Aug 31 2018, 6:36 AM
This revision now requires changes to proceed.Aug 31 2018, 6:36 AM
In D15173#317976, @pino wrote:

Please lookup the strings at runtime, instead of embedding them in the library.
Otherwise, an update to iso-codes (like updating the name of a country) will break the mapping, since the installed translations will not reflect the old text embedded.

Yes, there are downsides with this approach. The reason it's done that way is that the old runtime lookup was way too slow (it's the reason I started looking into this code in the first place, it was dominating runtime cost when scrolling the KDE Itinerary timeline and when benchmarking data extraction, and that includes heavy PDF parsing...). The alternative would be a single load and keep all of the data in per-process memory to mitigate most of that, compared of this now being in shared read-only data. This function is potentially called in models (the worst case might be the locale-aware sorted country name combo box in KDE Itinerary, opening the page containing this takes a few seconds on my phone), so performance matters here.

IMHO this approach is preferable, as country name change intervals tend to be larger than our release cycles, and therefore don't justify the runtime performance and memory trade-offs.

pino added a comment.Aug 31 2018, 7:30 AM

I don't see how loading it once per process would be a big issue, compared to embedding data in the library.
Also, the proposed approach here creates another kind of static data in the sources (isotocountrymap_data.cpp); considering it is not automatically generated, this means that it will be hardly updated by anyone, and happily rot in the sources. Just look at the existing countrytransl.map, for example.

So all it matters here is performance? Sigh...

In D15173#318021, @pino wrote:

I don't see how loading it once per process would be a big issue, compared to embedding data in the library.
Also, the proposed approach here creates another kind of static data in the sources (isotocountrymap_data.cpp); considering it is not automatically generated, this means that it will be hardly updated by anyone, and happily rot in the sources. Just look at the existing countrytransl.map, for example.

Ok, that's something we can change probably, I followed the reverse direction based on the giant CLDR dataset here too closely maybe. We could make iso-codes a hard build-time requirement and generate this file every time, to address this concern. The CI has it on all platforms but Windows already it seems. Would that be acceptable?

pino resigned from this revision.Aug 31 2018, 8:03 AM

We could make iso-codes a hard build-time requirement and generate this file every time, to address this concern. The CI has it on all platforms but Windows already it seems. Would that be acceptable?

Considering what I said earlier, it should be easy to imagine what my reply is ("no").

Also, D14163: Fix translation of country names was committed even if I had concerns on it; hence, I'm not sure it is even worth to argue here...

This revision is now accepted and ready to land.Aug 31 2018, 8:03 AM
In D15173#318027, @pino wrote:

We could make iso-codes a hard build-time requirement and generate this file every time, to address this concern. The CI has it on all platforms but Windows already it seems. Would that be acceptable?

Considering what I said earlier, it should be easy to imagine what my reply is ("no").

The price for runtime loading is:

  • ~36kb of JSON we need to (partly) keep in non-shared heap memory vs. ~2kb of shared read-only data
  • some one time parsing cost which is probably irrelevant on desktop, and probably cost us ~100ms on mobile
  • addtional deployment of the iso-codes data file, again irrelevant on Linux desktops, probably also on Windows, but needing extra care on Android for example (or qrc bundling). Failures in deployment result in no country names being displayed at all.

The price for compiled-in data is (assuming we change this to be updated on each build):

  • we might be out of sync with translations for country name changes within a release cycle, country names would be shown in the old form and without translations in that case.

Obviously we disagree on how to value those trade-offs, so how to we get to a decision here?

Also, D14163: Fix translation of country names was committed even if I had concerns on it; hence, I'm not sure it is even worth to argue here...

For the record, D14163 was committed after 4+ weeks of inactivity from you in the discussion, with 2 people (besides me) speaking out in favor of the patch and it blocking D14563 (which addressed the other direction of this, country to iso). Additionally this patch here is attempting to address the remaining issues you pointed out in D14163 by moving to the iso-codes dataset (even if we still disagree on how well it does that).

In all of this, let's also not forget the 18.08 baseline, where we have no working country name translations and are ~15 years out of touch with reality, all of which at a user-noticeable performance cost. I think all these patches improved that situation considerably, but there might of course still be some room for perfection ;-)

This revision was automatically updated to reflect the committed changes.