This makes the country detection a bit more robust and shrinks the
database by about 50 kB (roughly 10%), without increasing the number of
ambiguities.
While stripping non-letters and diacritic marks in Latin script is
fairly straightforward and predictable, the results were less helpful in
e.g. Hangul, hence the fairly fine-grained approach.
This requires the country name lookup table to be regenerated, which is
omitted in this review for clarity (it's a 95k line diff).
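
For readers without the diff at hand, the idea of script-dependent
normalization can be sketched roughly as follows. This is only an
illustration in Python, not the code under review: the function names
and the exact rules (aggressive stripping when all letters are Latin,
conservative handling otherwise) are assumptions made for the example.

```python
import unicodedata


def is_latin_only(name: str) -> bool:
    """True if every letter in `name` belongs to the Latin script.

    Hypothetical helper: the script is guessed from the Unicode
    character name, which is good enough for a sketch.
    """
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in name
        if ch.isalpha()
    )


def normalize_country_name(name: str) -> str:
    """Reduce a country name to a lookup key (illustrative rules only)."""
    if is_latin_only(name):
        # Latin: decompose, then keep letters only. This drops the
        # combining diacritic marks, spaces, and punctuation.
        decomposed = unicodedata.normalize("NFD", name)
        letters = (
            ch for ch in decomposed
            if unicodedata.category(ch).startswith("L")
        )
        return "".join(letters).casefold()
    # Other scripts (e.g. Hangul): stay conservative and only
    # canonicalize the encoding and trim surrounding whitespace.
    return unicodedata.normalize("NFC", name).strip().casefold()


print(normalize_country_name("Côte d'Ivoire"))
print(normalize_country_name(" 대한민국 "))
```

With rules like these, spelling variants such as "Côte d'Ivoire" and
"COTE D IVOIRE" collapse to the same key, while non-Latin names are left
essentially untouched.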