Details

Reviewers

Group Reviewers

GCompris
Localization

Summary

This diff aims to add a PO file containing all the words to be translated in the lang activity (https://gcompris.net/incoming/lang/words.json), to ease the translators' work.

Multiple questions:

Is it useful for translators to have it in a po format (in KDE translation repository) instead of a json file (which is only in GCompris repository)?
Which information is needed in the translator comments? The images given in the comments should help to understand the context; I'm not sure whether more is needed.
I'm really not used to awk, so there may be a cleaner way to write what I did, if someone is interested :).

Test Plan

The first idea is to fill the existing PO files with the words from the locales already available (those in https://cgit.kde.org/gcompris.git/tree/src/activities/lang/resource), so as not to lose the existing work or duplicate the translators' work.

Then, once the PO files are OK, synchronize the PO files with the JSON files whenever there are updates (so translators would only need to update the PO files and would no longer update the JSON files manually).

jjazeix created this revision. Dec 22 2018, 5:25 PM

Restricted Application added a project: KDE Edu. Dec 22 2018, 5:25 PM

Restricted Application added a subscriber: kde-edu.

jjazeix requested review of this revision. Dec 22 2018, 5:25 PM

This will be committed next week if there are no complaints; the scripts to convert from JSON to PO and back are done.

I think you should be using StaticMessages.sh

This allows you to get the script that merges back to your files to be run by scripty every day so you don't forget it.

We don't have much documentation but maybe reading the kdeconnect file is enough to understand how to do it?

https://cgit.kde.org/kdeconnect-android.git/tree/StaticMessages.sh

Thanks, I'll take a look. Another similar script that can help with understanding: https://cgit.kde.org/plasma-browser-integration.git/tree/StaticMessages.sh

Script that processes the messages: https://websvn.kde.org/trunk/l10n-kf5/scripts/process-static-messages.sh?view=markup

Yes, it is very useful to have it in the PO format instead of the JSON format.

The information seems sufficient. But could you split it over two lines instead, and add space after colon? I recommend:

#. Description: "alphabet"
#. Image: https://www.gcompris.net/incoming/lang/lang/words/alphabet.png

(Lokalize currently displays the two lines as a single-line string, but that’s a bug (and it’s not a big problem here): https://bugs.kde.org/show_bug.cgi?id=403142)

It’s not clear what the strings will be used for. Will the text be used even if there isn’t an audio file in the given language?

And if so, shouldn’t it be taken from content-en.json (if not empty) instead of from the file name (e.g. "seven" instead of U0037)? And if the English string in ‘content-en.json’ is missing, at least the underscore in the filename should be converted to a space (e.g. ‘air_horn’ → ‘air horn’).
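The fallback suggested above could be sketched roughly as follows. This is only an illustration: the function name `english_label` and the sample dictionary are hypothetical, and it assumes that content-en.json maps sound file names to English descriptions.

```python
import json
from pathlib import PurePosixPath


def english_label(sound_file: str, content_en: dict) -> str:
    """Return the English description for a sound file.

    Prefer the entry from content-en.json; if it is missing or empty,
    derive a readable label from the file name by dropping the extension
    and turning underscores into spaces.
    """
    label = content_en.get(sound_file, "")
    if label:
        return label
    return PurePosixPath(sound_file).stem.replace("_", " ")


# Hypothetical excerpt of content-en.json (keys are sound file names).
content_en = json.loads('{"U0037.ogg": "seven", "air_horn.ogg": ""}')
```

With this sample data, `english_label("U0037.ogg", content_en)` yields the description from the JSON file, while `english_label("air_horn.ogg", content_en)` falls back to the cleaned-up file name.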

This revision now requires changes to proceed. Jan 12 2019, 2:27 PM

In D17739#391965, @huftis wrote:

The information seems sufficient. But could you split it over two lines instead, and add space after colon? I recommend:

#. Description: "alphabet"
#. Image: https://www.gcompris.net/incoming/lang/lang/words/alphabet.png

I can change it to:
#. otherChapter / number / 10.ogg
#. https://gcompris.net/incoming/lang/words.html#ten
#: https://gcompris.net/incoming/lang/words.html#ten
msgid "ten"
msgstr ""

This way, the text to translate is really the description, not the sound file?

It’s not clear what the strings will be used for. Will the text be used even if there isn’t an audio file in the given language?

Yes, the text will be used even if there is no audio. If a word is not translated, the English word is not used at all in the category to learn. For example, in the colors category, if red is not translated, we will display all the words except that one.
The plan is to use StaticMessages.sh to fill the content-$lang.json files back in. There is an internal API to convert from the audio file back to the description; we don't display the ogg file.

And if so, shouldn’t it be taken from content-en.json (if not empty) instead of from the file name (e.g. "seven" instead of U0037)? And if the English string in ‘content-en.json’ is missing, at least the underscore in the filename should be converted to a space (e.g. ‘air_horn’ → ‘air horn’).

Generates po with lines looking like:
#. otherChapter / number / U0039.ogg
#. https://gcompris.net/incoming/lang/words.html#nine
#: https://gcompris.net/incoming/lang/words.html#nine
msgid "nine"
msgstr ""

#. otherChapter / action / scratch.ogg
#. https://gcompris.net/incoming/lang/words.html#scratch
#: https://gcompris.net/incoming/lang/words.html#scratch
msgid "to scratch"
msgstr ""

I'll push it in a few hours if there are no complaints :). Note that there is a script to fill in the PO files for languages that already have translations in the JSON files.

The link in the comment section of your example doesn’t work (and why is it duplicated, BTW?):

https://gcompris.net/incoming/lang/words.html#scratch

It looks like the web site uses the description ‘to scratch’, not the ID ‘scratch’ in the links.

The following link seems to work:

https://gcompris.net/incoming/lang/words.html#to%20scratch

(Though it really shouldn’t, since spaces aren’t allowed in ‘id’ or ‘name’ attributes in HTML.)

In StaticMessages.sh, the call to the Python script in import_po_files is commented out. Is this intentional?

src/StaticMessages.sh
22	The actual Python call is commented out. Is this intentional?

This revision now requires changes to proceed. Jan 19 2019, 12:27 PM

Also, since (AFAICS) the words are organized by section in the PO file (that’s good), perhaps you should link to

https://gcompris.net/incoming/lang/words_by_section.html
instead of to
https://gcompris.net/incoming/lang/words.html

Then the images on the Web page would be in the same order as in the PO file, which makes things easier for the translators.

A few suggested changes in the POT header.

src/activities/lang/resource/datasetToPo.py
69	This should be: https://bugs.kde.org/enter_bug.cgi?product=gcompris
72	This is typically set to Last-Translator: FULL NAME <EMAIL@ADDRESS>\n in POT files.
73	This is typically set to: Language-Team: LANGUAGE <kde-i18n-doc@kde.org>\n in KDE POT files.

Remove the duplicated comment containing the image link. Update the image link to the correct one when there are spaces (we'll look into removing the spaces later).
Fix the header.
Use the words_by_section.html page instead of words.html.

src/StaticMessages.sh
22	yes, I first want to fill back the existing po from the json files before applying it (and to be sure it saves the files well in the good place)

missing \n

One final, minor change in the URLs is needed to make them clickable.

src/activities/lang/resource/datasetToPo.py
84	Any spaces in the URL need to be URL-encoded for the URL to be clickable. Basically, just replace all spaces with `%20`.

This revision now requires changes to proceed. Jan 20 2019, 2:06 PM

Replace " " with "%20" in urls

pino added a subscriber: pino. Jan 20 2019, 2:13 PM

pino added inline comments.

src/activities/lang/resource/datasetToPo.py
84	Not only spaces, but any special character. Please use the urllib module to do the escaping properly.

Replace "%20" with " " when recreating the json file

pino added inline comments. Jan 20 2019, 2:22 PM

src/activities/lang/resource/datasetToPo.py
84	Please use the urllib module to do the escaping properly, instead of a manual replace.
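A minimal sketch of what escaping with urllib could look like, using `urllib.parse.quote` and `unquote` from the Python standard library. The helper names `word_url` and `description_from_url` and the `BASE_URL` constant are made up for illustration and are not taken from datasetToPo.py.

```python
from urllib.parse import quote, unquote

# Page suggested earlier in the review; used here only as an example base URL.
BASE_URL = "https://gcompris.net/incoming/lang/words_by_section.html"


def word_url(description: str) -> str:
    """Build a clickable link whose fragment is the percent-encoded description."""
    return f"{BASE_URL}#{quote(description)}"


def description_from_url(url: str) -> str:
    """Recover the original description from the URL fragment."""
    _, _, fragment = url.partition("#")
    return unquote(fragment)
```

Unlike a manual replace of spaces, `quote` also percent-encodes other special characters (for example non-ASCII letters), and `unquote` reverses the encoding when the JSON file is regenerated.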

In poToDataset.py, only translated strings should be included. Currently, if a translator translates ‘foo’ to ‘bar’, waits until the JSON file is regenerated, changes their mind and deletes or fuzzies the translation (“I don’t think ‘bar’ is the correct translation for ‘foo’ after all, but I’m not sure what is the correct translation yet”), the JSON file is stuck with ‘bar’ as the translation.

src/activities/lang/resource/poToDataset.py
50	This doesn’t delete old, untranslated/fuzzy entries. It should.

This revision now requires changes to proceed. Jan 20 2019, 2:26 PM

Use urllib for both encoding and decoding.
Remove translation if fuzzy or update it if updated.

Since the filename, e.g. alarmclock.ogg, is used as the key in the JSON file, I think it would be cleaner to use it as a ‘msgctxt’ in the PO file. That way, you don’t have to try to parse the comments to extract the keys when regenerating the JSON files. And it makes it possible to have more than one image with the same ‘msgid’ (homographs with different meanings, e.g. a verb and a noun). (I don’t think there are currently any such strings, but there may be in the future.)

src/activities/lang/resource/datasetToPo.py
88	Consider using the ‘msgctxt’ field for storing the JSON keys.
src/activities/lang/resource/poToDataset.py
43	Consider using the ‘msgctxt’ field for storing the JSON keys.

In D17739#396924, @huftis wrote:

Since the filename, e.g. alarmclock.ogg, is used as the key in the JSON file, I think it would be cleaner to use it as a ‘msgctxt’ in the PO file. That way, you don’t have to try to parse the comments to extract the keys when regenerating the JSON files. And it makes it possible to have more than one image with the same ‘msgid’ (homographs with different meanings, e.g. a verb and a noun). (I don’t think there are currently any such strings, but there may be in the future.)

I thought about it at first but was afraid that translators would keep the .ogg extension in the translation. If it's safe, it will be easier. I think there is orange as both a color and a fruit, but orange-color.ogg is used for the color.

The poToDataset.py script only seems to work if 1) there already *is* a JSON file and 2) the file contains an entry for the strings in the PO file. So someone needs to manually add the JSON file and keep the entries updated to reflect the original English JSON file. Wouldn’t it be easier to just write the JSON files based on the PO file? They should contain all the information needed to generate JSON files.

This revision now requires changes to proceed. Jan 20 2019, 2:49 PM

In D17739#396941, @jjazeix wrote:

In D17739#396924, @huftis wrote:

Since the filename, e.g. alarmclock.ogg, is used as the key in the JSON file, I think it would be cleaner to use it as a ‘msgctxt’ in the PO file. That way, you don’t have to try to parse the comments to extract the keys when regenerating the JSON files. And it makes it possible to have more than one image with the same ‘msgid’ (homographs with different meanings, e.g. a verb and a noun). (I don’t think there are currently any such strings, but there may be in the future.)

I thought about it at first but was afraid that translators would keep the .ogg extension in the translation. If it's safe, it will be easier. I think there is orange as both a color and a fruit, but orange-color.ogg is used for the color.

If you put the ID in the msgctxt field, there won’t be a problem. It will look something like this:

msgctxt "orange-fruit.ogg"
msgid "orange"
msgstr "appelsin"

msgctxt "orange-colour.ogg"
msgid "orange"
msgstr "oransje"

In the PO editors, the translators will see the ‘msgctxt’ in a different pane than the one containing the original strings (msgids). There is no risk that they will translate the filename.
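To make the layout concrete, here is a rough sketch of how a generator script might emit such entries. It uses only the standard library; `po_escape` and `format_entry` are hypothetical helpers, not functions from datasetToPo.py.

```python
def po_escape(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_entry(key: str, english: str, translation: str = "", comment: str = "") -> str:
    """Format one PO entry, storing the JSON key (the sound file name) in msgctxt."""
    lines = []
    if comment:
        # Extracted comment shown to translators, e.g. a link to the image.
        lines.append(f"#. {comment}")
    lines.append(f'msgctxt "{po_escape(key)}"')
    lines.append(f'msgid "{po_escape(english)}"')
    lines.append(f'msgstr "{po_escape(translation)}"')
    return "\n".join(lines) + "\n"


# Two entries with the same msgid stay distinct because their msgctxt differs.
print(format_entry("orange-fruit.ogg", "orange", "appelsin"))
print(format_entry("orange-colour.ogg", "orange", "oransje"))
```

Because the key travels in ‘msgctxt’, the script that regenerates the JSON file can read it directly instead of parsing comments.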

Use msgctxt to store the key of the JSON file.
Write a new JSON file instead of starting from an existing one.

I have tested the scripts and found one bug in poToDataset.py. It also converts obsolete entries in the PO file into JSON entries.

But I have a question. Is it necessary to also output the empty (untranslated/fuzzy) entries? If not, you can fix the bug and simplify the JSON files at the same time by just using:

for entry in poFile.translated_entries():
    word = entry.msgctxt
    data[word] = entry.msgstr

Only set the translated values in the json file
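The filtering behind this change can be illustrated with a self-contained sketch. The `Entry` class below is a hypothetical stand-in for a parsed PO entry (it is not the PO library used by poToDataset.py), and `is_translated` spells out the rule discussed above: an entry counts only if it has a non-empty msgstr and is neither fuzzy nor obsolete.

```python
import json
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for a parsed PO entry."""
    msgctxt: str
    msgstr: str
    fuzzy: bool = False
    obsolete: bool = False


def is_translated(entry: Entry) -> bool:
    """True only for non-empty, non-fuzzy, non-obsolete translations."""
    return bool(entry.msgstr) and not entry.fuzzy and not entry.obsolete


def entries_to_json(entries) -> str:
    """Build the JSON content from scratch, keyed by msgctxt."""
    data = {e.msgctxt: e.msgstr for e in entries if is_translated(e)}
    return json.dumps(data, ensure_ascii=False, indent=4, sort_keys=True)


entries = [
    Entry("orange-fruit.ogg", "appelsin"),
    Entry("orange-colour.ogg", "oransje", fuzzy=True),  # translator is unsure
    Entry("alarmclock.ogg", ""),                        # not translated yet
    Entry("old-word.ogg", "gammel", obsolete=True),     # removed from the template
]
print(entries_to_json(entries))
```

Since the JSON file is rebuilt from the PO entries each time, a translation that is later deleted or marked fuzzy simply drops out of the next regenerated file.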

I don’t know how StaticMessages.sh stuff works, so I’m not qualified to test that part. But I’ve tested the POT and JSON generator scripts, and they seem to work perfectly.

Thanks for your work on this! It makes the translators’ work much easier.

This revision is now accepted and ready to land. Jan 20 2019, 3:28 PM

In D17739#396953, @huftis wrote:
I have tested the scripts and found one bug in poToDataset.py. It also converts obsolete entries in the PO file into JSON entries.

But I have a question. Is it necessary to also output the empty (untranslated/fuzzy) entries? If not, you can fix the bug and simplify the JSON files at the same time by just using:
for entry in poFile.translated_entries():
    word = entry.msgctxt
    data[word] = entry.msgstr

It will work (I tried by removing all numbers of the json file excepting 2) and

In D17739#396971, @huftis wrote:

I don’t know how StaticMessages.sh stuff works, so I’m not qualified to test that part. But I’ve tested the POT and JSON generator scripts, and they seem to work perfectly.

Thanks for your work on this! It makes the translators’ work much easier.

Thanks a lot for all the remarks! (Albert and Pino too :)). I'm pushing it now; I'll check the logs tomorrow and merge the existing translations. If all is good, I'll uncomment the import.

Pushed in https://cgit.kde.org/gcompris.git/commit/?id=e5fdcc7aa210d26f5154dd48fd6b1a26573a7cd1

		Path
A	M	src/StaticMessages.sh (24 lines)
M		src/activities/lang/resource/datasetToPo.py (71 lines)
A	M	src/activities/lang/resource/poToDataset.py (42 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	48019		Dec 22 2018, 5:07 PM	★	★
Diff 2	49555		Jan 15 2019, 6:41 PM	★	★
Diff 3	49919		Jan 20 2019, 9:58 AM	★	★
Diff 4	49922		Jan 20 2019, 10:08 AM	★	★
Diff 5	49933		Jan 20 2019, 2:13 PM	★	★
Diff 6	49934		Jan 20 2019, 2:14 PM	★	★
Diff 7	49936		Jan 20 2019, 2:37 PM	★	★
Diff 8	49939		Jan 20 2019, 3:00 PM	★	★
Diff 9	49941		Jan 20 2019, 3:20 PM	★	★

Add a po file for the list of words in GCompris
ClosedPublic