Evaluate the addition of a web-based translation system
Open, Needs Triage, Public

Description

This task is for the evaluation of a web-based translation system as discussed and proposed on several threads in the mailing list and a Phabricator ticket.

This task is intended as a central point for gathering information and discussing a web-based translation system, so that the currently fragmented discussions can be summarized in one place.

Please feel free to edit this description with more relevant information and requirements. Ideally, if there are different views on a point, try to reach consensus in the comments and then add the result to the description.

Why?

Having a web-based translation system makes contributing more streamlined (fewer tools) and easier for new contributors. Ideally, it leads to a larger base of translators for KDE software.

Objections

  • Might not show the expected impact on contributors (1, 2)
  • Might be problematic with a large project (1, 2, 3)
  • Possible decrease of quality or more effort to ensure quality (1, 2, 3)

Requirements

  • Compatibility with an offline workflow (1, 2)
  • Support for multiple branches (either in the tool itself or through another mechanism such as PO Summit) (1)

Possible solutions

This is a list of possible solutions. The order should reflect their suitability, with the most suitable ones at the top of the list. Note also that open source solutions are preferred (and probably required).

  • Different approach, but has been suggested several times: Damned Lies (1, 2, 3)

Reference to an old summary of a previous approach

clel updated the task description. Jun 22 2020, 7:09 PM
clel updated the task description. Jun 22 2020, 7:25 PM
clel updated the task description. Jun 22 2020, 7:31 PM
clel updated the task description. Jun 22 2020, 7:48 PM
aacid added a subscriber: aacid. Jun 22 2020, 9:01 PM
subins2000 added a subscriber: subins2000. Edited Aug 19 2020, 7:45 AM

I would like to mention the Malayalam team's use of Weblate for KDE localization, which worked out well (though there was an intermediary git repo): https://phabricator.kde.org/T13514#237642

I came into the KDE Community last year as part of localizing KDE apps to Malayalam. As a beginner, it was difficult to start. I had previous experience with GNOME localization, where the process was to lock a PO file, download it from Damned Lies, and upload it again. As a beginner, that process was a little difficult because the team was practically dead and I had to contact the maintainer to get a review. The existing maintainers are no longer college students and are busy with their day jobs, so review was difficult.

Coming to KDE, the process was more difficult. It was SVN, something that I hadn't played with much. I asked the usual question: why are they not using git?! On the advice of the previous maintainer, I pinged the KDE mailing list and got commit access. I wanted to make it easier, because sharing PO files with people who wanted to localize was difficult, and teaching them how to use Lokalize had other difficulties (I was not comfortable with Lokalize myself). I can see how the current setup can keep potential new contributors away.

So I decided to find a way to fix this, and found Pootle, Pontoon, and Weblate. Thanks to SMC, I got a server to play with. Pontoon needed a lot of memory when importing files from SVN, more than my server could handle, and Pontoon is so tightly integrated with Mozilla that adapting it to new needs is difficult (it would need a hard fork). Then I tried Pootle; it worked better and we did localization with it for a while. But then I realized Pootle is still on Python 2 and is not going to get any more updates.

And at last, I tried Weblate (I regret that I didn't try it first). TL;DR: It worked well! Instead of importing directly from SVN, I made an intermediary git repo where the needed files are stored and from which Weblate imports them. More on this here: https://github.com/subins2000/kde-weblate

With this Weblate setup, one could still commit translations directly to SVN, and Weblate would exist side by side with it.

From this experience, I'd say keep SVN, but also have Weblate or other online tooling alongside it. I really love how, in SVN, one can download just the needed file and commit only that. With git, you have to download the entire repo, which would be a drag. No, I don't think moving to git would solve the problem of "newcomers can't easily do KDE localization"; IMO Weblate or a similar online tool is the best way to solve that.

Right now, Hindi localization in India is being done with SVN, Assamese localization is about to be set up completely with Weblate, and Malayalam localization exists on Weblate at https://kde.smc.org.in. I haven't been able to put more time into localization at present, but it is possible for Weblate and SVN to coexist, perhaps even removing the intermediary git repo and doing a straight pull/push from SVN.

Also, Weblate has a feature for voting on suggestions: if a translation suggestion gets X votes, it is accepted automatically. This would solve the problem of maintainers being too inactive to review, provided there are enough localizers to vote.
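
To make the voting idea concrete, here is a minimal sketch of threshold-based acceptance. It is an illustrative model only, not Weblate's implementation; the `Suggestion` class, the `vote` function and the user names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    """A proposed translation for one source string (hypothetical model)."""
    text: str
    voters: set[str] = field(default_factory=set)
    accepted: bool = False


def vote(suggestion: Suggestion, user: str, threshold: int) -> bool:
    """Record at most one vote per user; accept once the threshold is reached."""
    suggestion.voters.add(user)  # a set ignores repeat votes from the same user
    if len(suggestion.voters) >= threshold:
        suggestion.accepted = True
    return suggestion.accepted


s = Suggestion("placeholder translation")
for name in ["user_a", "user_b", "user_a"]:
    vote(s, name, threshold=3)
print(s.accepted)  # False: only two distinct voters, below the threshold of 3
```

The threshold plays the role of "X" above: the smaller the team, the lower it has to be for suggestions to ever get accepted without a maintainer.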

I've written a doc on setting up KDE localization with Weblate here: https://github.com/subins2000/kde-weblate

Why I used the intermediary git repo:

  1. Weblate works well with git, and I'm more familiar with git.
  2. I wanted to double-check everything before committing to SVN. I was scared of giving Weblate direct push access to SVN.
  3. The server I had was weak, and importing the entire SVN repo needed more resources. So only the files that are really needed are in the intermediary git repo.

I believe it's also possible to remove the intermediary repo and use SVN directly. This would need an experiment to know for sure.
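
As an illustration of point 3 above, the following sketch copies only a chosen set of PO files from an SVN checkout into a git work tree. It is not taken from the kde-weblate repository; the function name and the example paths are assumptions, and committing to git or SVN is deliberately left out.

```python
import shutil
from pathlib import Path


def sync_po_files(svn_checkout: Path, git_worktree: Path, wanted: list[str]) -> list[Path]:
    """Copy the selected PO files (relative paths) from an SVN checkout into a git work tree.

    Returns the destination paths that were written; missing sources are skipped.
    """
    copied = []
    for rel in wanted:
        src = svn_checkout / rel
        if not src.is_file():
            continue  # the file may not exist for this language yet
        dst = git_worktree / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied


# Hypothetical usage; both directories and the file list are placeholders:
# sync_po_files(Path("svn/l10n-kf5"), Path("git/kde-weblate"), ["ml/messages/exampleapp.po"])
```

Keeping the list of wanted files explicit is what keeps the intermediary repo small; the reverse direction (git back to SVN) would be the same copy with the arguments swapped, followed by a manual review.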

clel renamed this task from Evaluating the addition of a web-based translation system to Evaluate the addition of a web-based translation system. Aug 19 2020, 1:55 PM
clel updated the task description.
clel added a project: Localization.
clel added a comment. Aug 19 2020, 2:12 PM

Thanks @subins2000 for adding a report of your experience and also some documentation. I added this information to the task description.

I noticed that you are the GSoC 2019 student who wanted to evaluate and set up a web localization service for KDE. Your old proposal from back then is also linked at the bottom of this task's description. Although I don't know whether you were accepted for GSoC back then, I am glad to see that you at least managed to come up with a nice solution for the Malayalam team. I wonder whether this solution, or some refined version of it, can be used for more of, or even the entire, KDE localization process. This might need more powerful resources (which I assume would be available from the KDE organization), manpower to make the change, coordination with the admins, and finally some consensus among contributors to accept this change. However, as I read it, the way you set it up is pretty conservative and not really messing with existing workflows. I am not sure though, since probably there are no classical workflows existing for your team. So there might be conflicts with other languages where translations happen both on SVN directly by uploading files and through Weblate (possible conflict?). Maybe you can shed some light on this.

If this solution works with the current workflow (and possibly with a workflow after migrating from SVN to Git, but this seems to be still in discussion), I think this would be a pretty nice thing, offering a better workflow for new contributors without impacting the workflow of existing ones.

That GSoC proposal didn't go through. My initial suggestion was to replace SVN localization with an online tool, which would mean direct commits to SVN would no longer be allowed. But with further communication, learning and setup, I realized that this has drawbacks too. So I changed it from a complete replacement to a side-by-side system.

However, as I read it, the way you set it up is pretty conservative and not really messing with existing workflows. I am not sure though, since probably there are no classical workflows existing for your team. So there might be conflicts with other languages where translations happen both on SVN directly by uploading files and through Weblate (possible conflict?). Maybe you can shed some light on this.

Yes, I was scared of giving Weblate direct commit access and accidentally messing up the SVN repo. Yes, there might be conflicts when both direct SVN commits and Weblate localization are used. I resolved these in the intermediary git repo. If it were straight SVN, the conflicts would have to be resolved on the Weblate server, I suppose. The conflicts I got most of the time were in these lines:

"POT-Creation-Date: 2020-05-15 02:44+0200\n"
"PO-Revision-Date: 2020-03-29 09:50+0000\n"
msgid "Your names" and its value

scripty changes them on every run, which causes conflicts when committing via Weblate. I think this can be fixed. It might be something I'm doing wrong 😅
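
One way to tell such harmless conflicts apart from real ones is to compare two versions of a PO file while ignoring the two date fields. The sketch below is an assumption about how that check could look, not something scripty or Weblate provides; the `msgid "Your names"` case is not covered here, and the second date in `THEIRS` is made up for the example.

```python
import re

# Header fields that change whenever templates are regenerated or a file is saved
VOLATILE_HEADER = re.compile(r'^"(POT-Creation-Date|PO-Revision-Date): .*\\n"$')


def strip_volatile_headers(po_text: str) -> list[str]:
    """Return the lines of a PO file without the two date header lines."""
    return [line for line in po_text.splitlines() if not VOLATILE_HEADER.match(line)]


def differs_only_in_dates(ours: str, theirs: str) -> bool:
    """True if the two versions differ, but only in the volatile date headers."""
    return ours != theirs and strip_volatile_headers(ours) == strip_volatile_headers(theirs)


OURS = r'''msgid ""
msgstr ""
"POT-Creation-Date: 2020-05-15 02:44+0200\n"
"PO-Revision-Date: 2020-03-29 09:50+0000\n"
'''
# Same file with a newer (hypothetical) template date
THEIRS = OURS.replace("2020-05-15 02:44+0200", "2020-05-16 02:44+0200")

print(differs_only_in_dates(OURS, THEIRS))  # True
```

A conflict flagged this way can be resolved by taking either side, since no translated string is affected.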

In Weblate, there's Translation Propagation: identical strings in different components are given the same translation. This messes up the Your names and Your emails strings, which should be different for every component. It happened once and it was a pain to fix the POs afterwards. I later solved this by disabling translation propagation.
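
Disabling propagation is the fix described above. As a complementary check, the sketch below reports when the credit strings of a component changed between two versions of its PO file, so an unintended overwrite can be spotted before committing. The function names are hypothetical, and the regular expression is a simplification that only handles entries whose `msgid` and `msgstr` each fit on one line.

```python
import re

# KDE's translator credit strings, which are specific to each component
CREDIT_MSGIDS = {"Your names", "Your emails"}

# Simplification: only matches entries with a single-line msgid and msgstr
ENTRY = re.compile(r'^msgid "(?P<id>.*)"\nmsgstr "(?P<str>.*)"$', re.MULTILINE)


def credit_values(po_text: str) -> dict[str, str]:
    """Extract the translations of the credit strings from one PO file."""
    return {m["id"]: m["str"] for m in ENTRY.finditer(po_text) if m["id"] in CREDIT_MSGIDS}


def changed_credits(before: str, after: str) -> dict[str, tuple[str, str]]:
    """Map each credit msgid to (old, new) if its translation changed."""
    old, new = credit_values(before), credit_values(after)
    return {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}


BEFORE = 'msgctxt "NAME OF TRANSLATORS"\nmsgid "Your names"\nmsgstr "Alice"\n\nmsgid "Open"\nmsgstr "Offen"\n'
AFTER = BEFORE.replace('msgstr "Alice"', 'msgstr "Bob"')
print(changed_credits(BEFORE, AFTER))  # {'Your names': ('Alice', 'Bob')}
```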

The best way to avoid conflicts is to allow only one path: either Weblate or direct SVN commits. Let there be a Weblate instance and:

  1. Teams interested in only Weblate use it
  2. Prevent direct commits to the localization of specific apps in SVN and use only Weblate for them, like Krita
clel added a comment. Aug 20 2020, 6:58 PM

Thanks for the insight. I also think there would be conflicts if somebody does a translation of a .po file, uploads it directly through SVN and at the same time translations for the same lines come from Weblate, correct?

Since you suggest using only either Weblate or direct commits, and ideally we want a web-based translation system, that would mean no more direct commits. Is there an option to download a .po file, translate and later import it into Weblate? Also that might break other important workflows (I don't know).

At the risk of this being off-topic, but are those conflicts also a big problem because of the way SVN handles merges, or would git be equally bad at it?

Shall we discuss how the system is going to handle multiple translation branches (e.g. trunk/l10n-kf5, branches/stable/l10n-kf5, ...) ?

Here are the obvious options:

  1. Use a web-based translation system that natively supports multiple "branches" for .po files. Does anyone know which of the listed systems have such support?
  2. Enable PO Summit for all teams, then wire the web system up to the .po files gathered (unified/merged) from all branches.
  3. ...anything else?
subins2000 added a comment. Edited Aug 21 2020, 5:12 AM

I also think there would be conflicts if somebody does a translation of a .po file, uploads it directly through SVN and at the same time translations for the same lines come from Weblate, correct?

Yes, it would

Is there an option to download a .po file, translate and later import it into Weblate? Also that might break other important workflows (I don't know).

Yes, Weblate allows downloading PO files (or other formats) and uploading them again. Upload is possible on individual components; it doesn't have bulk import now, but maybe in the future: https://github.com/WeblateOrg/weblate/issues/4283

You can try this out here : https://kde.smc.org.in

KDEConnect Indicator is hosted on Weblate too : https://hosted.weblate.org/projects/indicator-kde-connect

but are those conflicts also a big problem because of the way SVN handles merges, or would git be equally bad at it?

@woltherav I believe git will still have the same problem.

how the system is going to handle multiple translation branches (e.g. trunk/l10n-kf5, branches/stable/l10n-kf5, ...) ?

@aspotashev Doesn't using Summit mean there will be only one branch to maintain, and scripty will take care of merging it into the other branches? If so, Weblate will be fine.

I don't think Weblate has branching support; the workflow there is Project -> Component -> Languages.
To get different branches, I guess different projects would have to be made: KDE trunk-l10n-kf5, stable-l10n-kf5 and so on...

clel updated the task description. Aug 21 2020, 4:10 PM
clel added a subscriber: ltoscano. Aug 21 2020, 4:16 PM

Shall we discuss how the system is going to handle multiple translation branches (e.g. trunk/l10n-kf5, branches/stable/l10n-kf5, ...) ?

Here are the obvious options:

  1. Use a web-based translation system that natively supports multiple "branches" for .po files. Does anyone know which of the listed systems have such support?
  2. Enable PO Summit for all teams, then wire the web system up to the .po files gathered (unified/merged) from all branches.
  3. ...anything else?

Good point. I had a look and Weblate seems to claim to support different branches. Unfortunately I do not fully understand the technical side behind their solution and behind KDE's requirements. I added the requirement to the description and also the link to the Weblate docs about branches. Maybe you can have a look.


I also think there would be conflicts if somebody does a translation of a .po file, uploads it directly through SVN and at the same time translations for the same lines come from Weblate, correct?

Yes, it would

Is there an option to download a .po file, translate and later import it into Weblate? Also that might break other important workflows (I don't know).

Yes, Weblate allows downloading PO files (or other formats) and uploading them again. Upload is possible on individual components; it doesn't have bulk import now, but maybe in the future: https://github.com/WeblateOrg/weblate/issues/4283

Hm, so since Weblate seems to be compatible with an offline workflow, maybe there could be a new workflow that uses only Weblate to avoid merge conflicts, with direct editing access to the repo(s) blocked. Still, there might be other workflows or automation that would be broken by doing this. Ping @ltoscano

pino added a subscriber: pino. Aug 21 2020, 4:33 PM

Shall we discuss how the system is going to handle multiple translation branches (e.g. trunk/l10n-kf5, branches/stable/l10n-kf5, ...) ?

Here are the obvious options:

  1. Use a web-based translation system that natively supports multiple "branches" for .po files. Does anyone know which of the listed systems have such support?
  2. Enable PO Summit for all teams, then wire the web system up to the .po files gathered (unified/merged) from all branches.
  3. ...anything else?

1/2/3 branches are not a problem for Weblate; it's a simple matter of adding more components, one per branch.
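
A rough sketch of what "one component per branch" amounts to: every template gets one component for each branch. The branch names come from this discussion; the template names, the dictionary keys and the file layout in the mask are assumptions for illustration and are not checked against Weblate's actual configuration or KDE's repository layout.

```python
from itertools import product

# Branch names come from the discussion; the template names are hypothetical.
BRANCHES = ["trunk/l10n-kf5", "branches/stable/l10n-kf5"]
TEMPLATES = ["exampleapp", "examplelib"]


def components_for(branches: list[str], templates: list[str]) -> list[dict[str, str]]:
    """Build one component description per (branch, template) pair."""
    return [
        {
            "name": f"{template} ({branch})",
            "branch": branch,
            # Assumed layout: <branch>/<language>/messages/<template>.po
            "filemask": f"{branch}/*/messages/{template}.po",
        }
        for branch, template in product(branches, templates)
    ]


for component in components_for(BRANCHES, TEMPLATES):
    print(component["name"], "->", component["filemask"])
```

The number of components is the number of branches times the number of templates, so the setup would need to be generated by a script rather than entered by hand.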

The thing is: Weblate is designed to "own" translations, meaning it expects no changes to the translations themselves in the upstream repository. Conflicts can be resolved, but it is far from an easy job:
https://docs.weblate.org/en/latest/faq.html#how-to-fix-merge-conflicts-in-translations
Considering we have more than 2000 templates for messages alone, this hardly scales.

A possible idea I had is:

  • have scripty split into different phases, and first do only the message extraction, committing the templates
  • Weblate will merge all the translations
  • have Weblate (only if it is a self-hosted instance) commit directly and push all the merged translations, similarly to what scripty does
  • then have the second phase of scripty proceed with merging translations into desktop/appdata/etc. files

Of course, this requires a) a full switch of the l10n repositories to it, b) a huge refactoring of our tooling, c) something else I'm surely missing ...
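
The ordering in the idea above can be sketched as a small pipeline in which each phase runs only if the previous one succeeded. This is a sketch of the control flow only: the phase actions are placeholders, and nothing here calls scripty or Weblate.

```python
from typing import Callable

Phase = tuple[str, Callable[[], bool]]


def run_phases(phases: list[Phase]) -> list[str]:
    """Run phases in order and stop at the first one that reports failure.

    Returns the names of the phases that completed successfully.
    """
    completed = []
    for name, action in phases:
        if not action():
            break  # later phases depend on earlier ones, so do not continue
        completed.append(name)
    return completed


# Placeholder actions; each would wrap real tooling in practice.
PHASES: list[Phase] = [
    ("scripty: extract messages and commit templates", lambda: True),
    ("Weblate: merge translations with the new templates", lambda: True),
    ("Weblate: commit and push the merged translations", lambda: True),
    ("scripty: merge translations into desktop/appdata files", lambda: True),
]

print(run_phases(PHASES))
```

Stopping at the first failure matters here because the second scripty phase must not run against translations that Weblate has not yet pushed.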

yaron added a subscriber: yaron. Aug 26 2020, 11:32 AM
cblack added a subscriber: cblack. Feb 18 2022, 9:18 PM

Another data point: the toki pona team uses a Weblate instance working directly against the SVN. It works fine, apart from the fiddling with Weblate's internal checkout that was needed to keep its "scanning" phase from dying on the number of files we have.

yaron added a comment. Feb 20 2022, 7:24 PM

@cblack What will it take to simply join them?

I've done some tests with Fedora's Weblate for the Hebrew team. The SVN was too slow, so we got a timeout message; because of that we had to set up an intermediate git repository (on Pagure) to support this operation.
Needless to say, it required too much effort on my end, so I really didn't like this solution.
With SVN, I'm afraid the integration will be less tight than what the Weblate git integration can offer.

@cblack What will it take to simply join them?

Sorry, I don't understand what you're trying to say here.

yaron added a comment. Feb 21 2022, 6:40 AM

Sorry, I don't understand what you're trying to say here.

Can I add the Hebrew translation the same way Toki Pona was provisioned?

You want to use the Weblate instance set up for the Toki Pona team for the Hebrew localisation?

yaron added a comment. Feb 21 2022, 8:05 AM

@cblack Yes, I'd love that.

Could we chat on a more real-time platform to hash out the details of that? My Matrix is @pontaoski:kde.org and my Telegram is https://t.me/pontaoski

aacid added a comment. Feb 21 2022, 8:03 PM

I'm going to warn you against that, you can't create a system to automatically commit things from other people into KDE servers without human interaction, that's basically giving out your user for anyone else to [ab]use, and that's obviously not allowed.

yaron added a comment. Feb 21 2022, 8:16 PM

@aacid That may be relevant to Pootle; in Weblate there's an ACL and a string approval system.

Weblate can also create branches for the translation without merging to main/master, I'm not sure how it's managed in SVN.

I'm going to warn you against that, you can't create a system to automatically commit things from other people into KDE servers without human interaction, that's basically giving out your user for anyone else to [ab]use, and that's obviously not allowed.

As it is currently, the Weblate instance requires manual intervention from me before sending anything off to KDE servers, and I have an overview of all outgoing changes to look at before I push the button.

aacid added a comment. Feb 21 2022, 9:40 PM

That's probably acceptable but I am going to maybe wrongly assume you don't also speak Hebrew?

Well, this tangent is kinda moot now: after some chatting, it's probably not going to work out to have the Hebrew team work on the instance I set up for the Toki Pona team.

yaron added a comment. Feb 23 2022, 7:17 PM

@cblack Thank you so much, it was a kind offer but I don't want to increase the amount of workload on you. Thank you!
@aacid We can try and make some automation for approved strings but I can't estimate the effort required for SVN.

yaron awarded a token. Feb 16 2023, 8:59 PM
emohr added a subscriber: emohr. Nov 14 2023, 4:42 PM