Migrate KDE translations to Git
Open, Needs Triage, Public

Assigned To
None
Authored By
aspotashev, Aug 17 2020

Description

Starting this task item in order to collect requirements and come up with a possible roadmap for translations' SVN-to-Git migration.

Background

SVN is currently used in KDE almost exclusively for translations into human languages; the most recently updated directories are l10n-kf5 and l10n-support.

Compared with Git, SVN has several drawbacks:

  1. SVN is harder to use for new contributors. SVN is less popular than Git (see e.g. this poll at a Russian online forum: https://www.linux.org.ru/polls/polls/15179371), so prospective contributors typically need to learn it from scratch, while Git is often already familiar to them.
  2. SVN requires additional Sysadmin manpower to support (1) the SVN server in addition to GitLab [invent.kde.org], (2) websvn.kde.org, (3) pre-commit hooks, etc.
  3. Committing offline is not possible in SVN.
  4. Git should facilitate integration with external translation file trees maintained by Linux distributions (e.g. BaseALT).
  5. ...

Proposal Overview

I suggest that we migrate the translations into a set of Git repositories, one per language or per language variant.

Roadmap

Here is a possible sequence of steps to migrate translations to Git:

  1. Migrate scripty code from SVN to Git - https://phabricator.kde.org/T4803
  2. Refactor/rewrite scripty so that future changes are easier and safer (testable). Make scripty easy to integrate with both SVN and Git in an SCM-agnostic way - this will be necessary for the transitional period when some translation teams are beta testing Git support.
  3. Modify scripty to also generate a Git repository l10n/templates.git with the same contents as https://websvn.kde.org/trunk/l10n-kf5/templates/ (and sister directories - branches/stable/l10n-kf5/templates, etc)
  4. Modify scripty to also generate a Git repository l10n/x-test.git with the same contents as https://websvn.kde.org/trunk/l10n-kf5/x-test/ (and sister directories also named "x-test")
  5. Adopt or implement tooling to facilitate mass changes in all Git translations repositories. The tooling must work with both SVN and Git; the Git integration can be tested against x-test.git. I'm not sure about the requirements; @ltoscano probably knows the typical usage scenarios.
  6. Implement full Git integration in scripty. This means scripty should keep translations synchronized daily for any language that is stored in Git. Can be tested against x-test.git.
  7. Update release scripts (mainly https://invent.kde.org/sdk/releaseme) to support translations in Git as well as in SVN.
  8. [In the meantime,] work on PO Summit Git integration.
  9. Run a survey and pick 1-3 translation teams for beta onboarding; these should be teams already familiar with Git. If PO Summit integration is not ready at this point, only onboard teams that don't use PO Summit. Disable the GitLab merge requests feature for l10n/[language].git repositories unless we have explicit approval from the particular language team's coordinators.
  10. Onboard into beta testing 1-3 translation teams using PO Summit.
  11. Update/create documentation for translators.
  12. Update l10n.kde.org backend to display statistics for translation teams using Git.
  13. Gradually migrate translations for all language teams (or do it in one go if it feels safe).
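The SCM-agnostic sync in step 6 could be sketched along these lines (a rough illustration only, not the actual scripty code; the function name and directory layout are assumptions):

```shell
#!/bin/sh
# Hypothetical sketch of the SCM-agnostic daily sync from step 6: scripty
# only needs one operation, "record changes and publish them", whichever
# backend the checkout uses. Names and layout are made up for illustration.
scm_commit() {
    dir=$1; msg=$2
    if [ -d "$dir/.git" ]; then
        git -C "$dir" add -A
        # commit only when the daily merge actually changed something
        git -C "$dir" diff --cached --quiet || git -C "$dir" commit -q -m "$msg"
        # git -C "$dir" push        # network step, omitted in this sketch
    else
        svn commit "$dir" -m "$msg" # SVN commits and publishes in one step
    fi
}
```

The rest of scripty (template extraction, merging) would call this without knowing which SCM backs a given language.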

Alternative Roadmap

A different roadmap where we would do the full migration in one shot:

  1. Migrate scripty code from SVN to Git - https://phabricator.kde.org/T4803
  2. Implement the automation needed to reduce the manual renames and moves of files (the underlying logic is mostly SCM-agnostic).
  3. Automate PO Summit (to be called from scripty) to reduce the operations that need to be executed daily by each translation team.
  4. Adapt the non-extracting part of scripty to also check out from and commit to Git.
  5. Adopt or implement tooling to facilitate mass changes in all translations repositories, to be used as a last resort if the automatic detection doesn't work. The tooling must work with both SVN and Git.
  6. Implement a web-based translation system for teams who need a proper review system, which could hopefully serve as a replacement for l10n.kde.org.
  7. Update release scripts (releaseme and sysadmin/release-scripts) to support translations in Git.
  8. Freeze translations in SVN and migrate translations for all language teams into Git. At this point all the scripts should be able to work with Git as well.
  9. Update any remaining system (for example l10n.kde.org if it has not been replaced yet).
ltoscano updated the task description. Aug 17 2020, 10:56 PM

I'm not sure what you mean here, can you please elaborate? The tooling to simplify moving of translations between directories is part of step #4 of the roadmap. Something like super-duper-tool mv kscreenshot/kscreenshot.po spectacle/spectacle.po would create a commit in each of the Git translations repositories and git push all of them, possibly retrying this operation in case of a merge conflict.
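Per repository, such a tool could do roughly the following (a sketch only; the tool's name, interface and the push/retry loop are assumptions, the latter indicated as a comment):

```shell
#!/bin/sh
# Hypothetical sketch of the mass-move tool described above: replay the
# same rename in every per-language checkout, committing each one. The
# push/retry-on-conflict step is indicated but commented out.
mass_mv() {
    old=$1; new=$2; shift 2
    for repo in "$@"; do
        [ -e "$repo/$old" ] || continue    # language may not have this file
        mkdir -p "$repo/$(dirname "$new")" # git mv needs the target dir
        git -C "$repo" mv "$old" "$new"
        git -C "$repo" commit -q -m "Move $old to $new"
        # until git -C "$repo" push; do git -C "$repo" pull --rebase; done
    done
}
```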

I don't want a tool that pushes to all repositories without first reducing the number of occasions when this needs to happen. Also, summit should get that support first.

I still don't get what you mean exactly. Could you please give an example where a naively implemented tooling may do the wrong thing?

We can actually create such a tool for mass moving just for SVN and let you pilot it for a while, to see if it covers most use cases :-)

  • things may break in the middle of the process, potentially causing bad experience for developers

Keeping two systems like that is going to be a big pain with the current code. Unless we add "rewrite scripty" (which is something that should happen anyway) as a requirement for this.

Added "rewrite scripty" into the roadmap, see the updated description :-)
I've also added an alternative roadmap for the one-shot approach.

  • we won't get intermediate feedback from the translation teams. We may miss some of the requirements until someone points them out after the migration is done. This is where beta testing should help - we can listen to the teams who start using Git first and make changes to our plan.

Does it mean we can roll back to SVN if some team says so?

Generally, no. However:

  • If things get very broken for a long time (weeks) and the team is blocked being unable to translate, we can roll them back to SVN temporarily to buy some more time to fix stuff,
  • The beta testing is supposed to help us find gaps in our integrations (e.g. imagine the migration might introduce some stealthy bug only affecting documentation translations), other feedback may be helpful to adjust the way we work with Git (e.g. multiple Git branches vs directories l10n-kf5, ... in one branch).

Teams will use git just like svn: checkout, change things, commit, push.
Teams who want to use review will have a web interface like Weblate, which should go up before this happens.

OTOH Weblate can probably integrate with both SVN and Git, so it shouldn't be a hard requirement to get Weblate integrated earlier than Git migration or the other way around. Or is it a hard requirement?

I strongly believe that people who need easy review need something like Weblate. All the others don't care whether it's direct Git pushing or SVN pushing.

Well, the main problem (in Krita) that newcomers are frightened away from doing translations because of too complicated process. Any solution that would make it simpler would work.

So we are basically in agreement, but not on having a Git-based translation system directly exposed to newcomers. That would just make things more complicated.

PS
Though SVN was actually the reason why my students didn't want to do the translations. It is too outdated and no one knows how to use it anymore.

That's sad, but I have to say that it's not that no one knows how to use it: it is documented, and the basic commands map directly to Git commands (clone/checkout, commit+push/commit - even one fewer).
This does *not* mean I want to keep it, but I feel there is a bit too much bad advertisement.

Yes, a typical user point of view: I don't like it, hence obviously no one likes it, and it's the most important thing in the world to fix.

If your excuse for not doing something is that you don't want to copy and paste 2 svn commands, SVN is not the problem, it's just that you didn't want to do it.

I strongly believe that people who need easy review need something like Weblate. All the others don't care whether it's direct Git pushing or SVN pushing.
I also think that the web solution has a higher priority than moving the translations to Git.

I agree. One note about migrating to Git, however: kdesvn is already pretty awesome and doesn't even require those 2 svn commands. With GitLab, the only comparably easy-to-use local tool for committing is git-cola.

nalvarez added a comment. (Edited) Aug 18 2020, 2:56 AM

FYI, I tried to convert Spanish translations to Git, only including /trunk/l10n-kde4/es/messages/ and /trunk/l10n-kf5/es/messages/ (no docs, no docmessages, no /branches/stable/, etc) and the result was 150MB.

(initially svn2git made an unoptimized 6GB monster; after a 'git gc --aggressive' that ate 9GB of RAM, it reduced the repo to 150MB)

Let me know if you want me to test something else (include docs? try another language?).

pino added a comment.Aug 18 2020, 7:30 AM

Please keep the eventual integration of Weblate (or any other web-based translation system) out of this "migrate to git" roadmap. Adding a web-based translation system brings additional complexity and even workflow changes that would make this conversion an even bigger and riskier task than it already is.

And no, I'm not against a web-based translation system.

pino added a comment.Aug 18 2020, 7:35 AM

Git should facilitate integration with external translation file trees maintained by Linux distributions (e.g. BaseALT)

@aspotashev can you please explain why these distros care about the layout of our translations? When packaging upstream releases they ought to use translations as available in release tarballs, not pick random files from upstream translation trees.

rempt added a subscriber: rempt. (Edited) Aug 18 2020, 7:52 AM

As I've said before, and will keep saying, if the way translations are handled in KDE changes, the change should be to include the po files in the source repositories, for the following reasons:

  • Developers will always build with translations
  • Since we're using GitLab now, making a release would be as simple as creating a tag, i.e., we would be using GitLab as it is meant to be used
  • No chance of accidentally packaging the unstable translations with a release from the stable branch
  • nightly builds will include translations
In T13514#237629, @pino wrote:

Git should facilitate integration with external translation file trees maintained by Linux distributions (e.g. BaseALT)

@aspotashev can you please explain why these distros care about the layout of our translations? When packaging upstream releases they ought to use translations as available in release tarballs, not pick random files from upstream translation trees.

Our [upstream] translation team has limited capacity and we are unable to review new translations submitted by the distro in a timely manner. Therefore they want to use their own versions of translations when they are not yet upstreamed. They do it by injecting their custom translation files via distro-specific packaging scripts.

Git should make it easier for them to merge upstreamed translations back into their custom tree once both repos are in Git.

huftis added a subscriber: huftis.Aug 18 2020, 5:35 PM

As I've said before, and will keep saying, if the way translations are handled in KDE changes, the change should be to include the po files in the source repositories, for the following reasons:

  • Developers will always build with translations
  • Since we're using GitLab now, making a release would be as simple as creating a tag, i.e., we would be using GitLab as it is meant to be used
  • No chance of accidentally packaging the unstable translations with a release from the stable branch
  • nightly builds will include translations

This will make things easier for developers, packagers and users who like to compile applications themselves (instead of using packages, or just for testing a new feature or a bug fix).

But having a non-central location of the PO files will make things much harder for the translators (for various reasons that I won’t get into now).

May we have the best of both worlds? That is, have a central repository for translations, for use by the translators, and automatic copying of the translations into each application’s repository by scripty. Basically the same thing that happens with translations in .desktop files. They are translated centrally, but any updates to the .po files are merged into the .desktop files in each application’s repository by scripty each night. Having a similar thing be done with the .po files would be nice. Plain .po files would be copied directly, while other formats (e.g., .ts files used by some Qt applications) would be converted from the .po files.
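The nightly copy described here could look roughly like this (the directory layout `<lang>/messages/<catalog>.po` and all names are assumptions for illustration, not the actual scripty code):

```shell
#!/bin/sh
# Hypothetical sketch of the nightly copy described above: mirror each
# language's catalog from a central translations checkout into the
# application repository's po/ directory. Layout and names are assumed.
sync_po() {
    l10n_checkout=$1; app_repo=$2; catalog=$3
    for lang_dir in "$l10n_checkout"/*/; do
        lang=$(basename "$lang_dir")
        src="$lang_dir/messages/$catalog.po"
        [ -f "$src" ] || continue          # language has no translation yet
        mkdir -p "$app_repo/po/$lang"
        cp "$src" "$app_repo/po/$lang/$catalog.po"
    done
}
```

Non-PO formats (e.g. .ts files) would get a conversion step instead of the plain copy.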

And to avoid anyone manually editing the .po files in an application's repository, perhaps a pre-commit hook could be added so that only scripty is allowed to commit changes to the files.
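The core of such a hook could be a small check along these lines (a sketch only; the service-account name, the po/ layout, and how the hook learns the pusher's identity are all assumptions):

```shell
#!/bin/sh
# Hypothetical check for the hook suggested above: reject any change
# under po/ unless the pusher is the scripty service account. The
# account name and po/ layout are assumptions for illustration.
allow_push() {
    pusher=$1; shift
    for path in "$@"; do
        case $path in
            po/*) [ "$pusher" = "scripty" ] || return 1 ;;
        esac
    done
    return 0
}
```

A real server-side hook would derive the pusher and changed paths from the push being received rather than from arguments.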

FYI, I tried to convert Spanish translations to Git, only including /trunk/l10n-kde4/es/messages/ and /trunk/l10n-kf5/es/messages/ (no docs, no docmessages, no /branches/stable/, etc) and the result was 150MB.

(initially svn2git made an unoptimized 6GB monster; after a 'git gc --aggressive' that ate 9GB of RAM, it reduced the repo to 150MB)

Well, 150 MB is quite (and for me surprisingly) small, so that shouldn’t be much of a problem for the translators. But a giant Git repository with all the languages will be. So please consider using submodules, one for each language.

BTW, LibreOffice uses a ‘giant Git repository’ for their (read-only version of the) translations. And it’s painfully slow to run git pull on it, especially on a non-SSD disk on older hardware. Try it yourself: git clone https://github.com/LibreOffice/translations.git

Additional comment (by me) from the mailing list:

I think moving all teams to summit is a nice idea. It’s easier for teams to deal with one branch than with several different ones (l10n-kf5, l10n-kf5-plasma-lts, l10n-kde4), all containing different versions of basically the same translation files.

Speaking as the coordinator for three teams that use the summit workflow, one thing I like is getting to decide when to merge the translation files with the templates. Since I do the merge when no one is doing any translations, this obviates the need to handle merge conflicts (which are much harder to handle with Git than with SVN, BTW).

The teams are also using some custom merge settings (https://websvn.kde.org/trunk/l10n-support/scripts/summit_helpers_NO.py?view=log). It would be a shame if such team-specific customisations are lost.

I came into the KDE Community last year as part of localizing KDE apps into Malayalam. As a beginner, it was difficult to start. I had previous experience with GNOME localization, where the process was to lock, download the PO file from Damned Lies, and upload it. As a beginner, that process was a little difficult because the team was practically dead and I had to contact the maintainer to review my work. The existing maintainers are no longer college students and are busy with their daily jobs, so review was difficult.

Coming to KDE, the process was more difficult. It was SVN, something that I hadn't played with much. I asked the same question: why are they not using Git?! On the advice of the previous maintainer, I pinged the KDE mailing list and got commit access. I wanted to make it easier, because sharing PO files with people who wanted to localize was difficult, and teaching them how to use Lokalize had its own difficulties (I was not comfortable with Lokalize myself). I can see how the current setup can keep away potential new contributors.

So I decided to find a way to fix this. I found that there were Pootle, Pontoon, and Weblate. Thanks to SMC, I got a server to play with. Pontoon needed a lot of memory when importing files from SVN, and the server I had couldn't handle it; plus, Pontoon is so tightly integrated with Mozilla that it's difficult to adapt to new needs (it would need a hard fork). Then I tried Pootle; it worked better and we did localization with it for a while. But then I realized Pootle is still Python 2 and isn't going to get any more updates.

And at last, I tried Weblate (I regret that I didn't try it first). TL;DR: it worked well! Instead of a direct import from SVN, I made an intermediary Git repo where the needed files are stored and imported by Weblate. More on this here: https://github.com/subins2000/kde-weblate

With this Weblate setup, one could commit translations directly to SVN and Weblate would exist side-by-side.

From this experience, I'd say keep SVN, but also have Weblate or another online tool alongside it. I really love how one can download just the needed file and commit only it in SVN. With Git, you have to download the entire repo, which would be a drag. No, I don't think moving to Git would solve the problem of "newcomers can't easily do KDE localization"; IMO Weblate or a similar online tool is the best way to solve that.

Right now, Hindi localization in India is being done with SVN, Assamese localization is about to be set up completely with Weblate, and Malayalam localization exists on Weblate at https://kde.smc.org.in. I haven't been able to put more time into localization at present, but it is possible for Weblate and SVN to coexist, perhaps even removing the intermediary Git repo and doing straight pull/push from SVN.

Also, Weblate has a feature of voting on suggestions: if a translation suggestion gets X votes, it'll be auto-accepted. This would solve the problem of maintainer inactivity in reviewing, provided there are enough localizers to vote.

pino added a comment.Aug 18 2020, 7:15 PM
In T13514#237629, @pino wrote:

Git should facilitate integration with external translation file trees maintained by Linux distributions (e.g. BaseALT)

@aspotashev can you please explain why these distros care about the layout of our translations? When packaging upstream releases they ought to use translations as available in release tarballs, not pick random files from upstream translation trees.

Our [upstream] translation team has limited capacity and we are unable to review new translations submitted by the distro on a timely manner. Therefore they want to use their own versions of translations when they are not upstreamed yet. They do it by injecting their custom translation files by the distro-specific packaging scripts.

Git should make it easier for them to merge upstreamed translations back into their custom tree when both repos will be in Git.

This is not a priority, definitely. Especially when there is no public VCS to see what the workflow is...

pino added a comment.Aug 18 2020, 7:18 PM

But a giant Git repository with all the languages will be. So please consider using submodules, one for each language.

No, just a single repository for each language.

A "l10n" repository with submodules will not really work: Git submodules are not like SVN externals; they point to a specific revision, not to a branch. Considering how often the files in each language's repository will change (at least once a day), it would mean constantly updating the submodules in the "l10n" repository... no, please.

pino added a comment.Aug 18 2020, 7:19 PM

@subins2000 this ticket is NOT about a web-based translation tool, there is a separate ticket for that. Please keep the topic of this focused ONLY on the svn -> git migration.

FYI, I tried to convert Spanish translations to Git, only including /trunk/l10n-kde4/es/messages/ and /trunk/l10n-kf5/es/messages/ (no docs, no docmessages, no /branches/stable/, etc) and the result was 150MB.

I now tried /trunk/l10n-kde4/es/ and /trunk/l10n-kf5/es/ (so it also includes data, docmessages, docs): 245MB.

subins2000 added a comment. (Edited) Aug 19 2020, 5:37 AM

@pino The goal of doing the git migration is to solve "SVN is harder to use for new contributors". I meant this problem will be better solved by an online tool than a migration to git.

From reading svnvsgit.com, I see SVN is better for large-scale repos. A localization repo will only keep getting bigger. Having to download a large Git repo to do localization will still be a hindrance for new contributors, and then there's the prerequisite of knowing Git. I was of the same opinion (why is KDE using SVN?) when I started out, but I understand now; it's for the right reasons.

pino added a comment.Aug 19 2020, 5:57 AM

@pino The goal of doing the git migration is to solve "SVN is harder to use for new contributors". I meant this problem will be better solved by an online tool than a migration to git.

Yes, and this requirement is mostly marketing. The real goal of this ticket is "switch from svn to git", that's it. Again, there is a different ticket for web-based translation tools, which will need a totally separate discussion, as they bring different requirements, different challenges, and different decisions to be made.

Erm, maybe it is worth it to actually link to this mythical web-based translation task? Because it is not inside the localization team project.

The mythical web-based translation task is T11070, plus its subtasks (especially T13311). Because it was proposed as a community goal, it initially wasn't tied to the Localization project, and later on we forgot to change that.

woltherav added a comment. (Edited) Aug 19 2020, 7:37 AM

Thanks! Maybe we should make this a subtask of that as well?

EDIT: especially because we're literally having duplicate conversations in both tasks :D

[...] to include the po files in the source repositories, for the following reasons:

This will make things easier for developers, packagers and users who like to compile applications themselves (instead of using packages, or just for testing a new feature or a bug fix).
But having a non-central location of the PO files will make things much harder for the translators (for various reasons that I won’t get into now).

Can it be solved with git-submodules? It looks like submodules were invented exactly for this purpose, weren't they?

aacid added a comment.Aug 19 2020, 7:49 AM

TBH I'm mostly ignoring this task, but i'm at least going to answer your initial points

"SVN is harder to use for new contributors"
debatable, I'd go with false.

"SVN requires additional Sysadmin manpower to support "
true

"Committing offline is not possible in SVN."
I honestly don't see how this matters, you can't push with git either.

"Git should facilitate integration with external translation file trees maintained by Linux distributions (e.g. BaseALT)"
Let me rephrase this for you: "We're doing things wrong, can you please change your upstream behaviour so it's less painful for us to do things wrong?"

aacid added a comment.Aug 19 2020, 7:51 AM

[...] to include the po files in the source repositories, for the following reasons:

This will make things easier for developers, packagers and users who like to compile applications themselves (instead of using packages, or just for testing a new feature or a bug fix).
But having a non-central location of the PO files will make things much harder for the translators (for various reasons that I won’t get into now).

Can it be solved with git-submodules? It looks like submodules were invented exactly for this purpose, weren't they?

No, they were not invented exactly for this purpose

In T13514#237645, @pino wrote:

A "l10n" repository with submodules will not really work: git submodules are not like svn externals, and they point to a specific revision and not to a branch. Considering how often files in the repository of each language will change (at least once every day), it would mean constantly updating the submodules in the "l10n" repository... no please.

As far as I can tell, newer Git versions can track entire branches (though I haven't tested that myself): https://www.activestate.com/blog/getting-git-submodule-track-branch/
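For reference, Git (since 1.8.2) can record a branch for a submodule to follow via `submodule add -b` and move it to that branch's tip with `submodule update --remote`; these are real Git options. A self-contained local demonstration, with made-up repository names:

```shell
#!/bin/sh
# Local demonstration of branch-tracking submodules. The repo names
# (l10n-es, app) are made up; the Git options are real.
set -e
tmp=$(mktemp -d)

git init -q "$tmp/l10n-es"
git -C "$tmp/l10n-es" symbolic-ref HEAD refs/heads/master
git -C "$tmp/l10n-es" -c user.email=d@example.org -c user.name=d \
    commit -q --allow-empty -m "initial translations"

git init -q "$tmp/app"
# record that the po submodule should follow the master branch
git -C "$tmp/app" -c protocol.file.allow=always \
    submodule --quiet add -b master "$tmp/l10n-es" po

git -C "$tmp/l10n-es" -c user.email=d@example.org -c user.name=d \
    commit -q --allow-empty -m "new translations"
# move the submodule to the current tip of its tracked branch
git -C "$tmp/app" -c protocol.file.allow=always \
    submodule --quiet update --remote po
git -C "$tmp/app/po" log -1 --format=%s   # prints "new translations"
```

Note this doesn't change pino's point: each `update --remote` still has to be followed by a commit in the superproject to record the new submodule revision.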

This will make things easier for developers, packagers and users who like to compile applications themselves (instead of using packages, or just for testing a new feature or a bug fix).

And it follows the most common practice across the free software world, so there's familiarity for contributors, too.

But having a non-central location of the PO files will make things much harder for the translators (for various reasons that I won’t get into now).

May we have the best of both worlds? That is, have a central repository for translations, for use by the translators, and automatic copying of the translations into each application’s repository by scripty. Basically the same thing that happens with translations in .desktop files. They are translated centrally, but any updates to the .po files are merged into the .desktop files in each application’s repository by scripty each night. Having a similar thing be done with the .po files would be nice. Plain .po files would be copied directly, while other formats (e.g., .ts files used by some Qt applications) would be converted from the .po files.

And to avoid anyone manually editing the ‘.po’ files in a an applications repository, perhaps a pre-commit hook could be added so that only scripty is allowed to commit changes to the files.

Well, that would be a huge improvement for sure.

clel added a subscriber: clel.Aug 19 2020, 1:46 PM

Thanks! Maybe we should make this a subtask of that as well?

EDIT: especially because we're literally having duplicate conversations in both tasks :D

Done. Plus I added the Localization project to that task. If anybody objects, feel free to undo these operations.

aacid added a comment.Aug 19 2020, 3:25 PM

This will make things easier for developers, packagers and users who like to compile applications themselves (instead of using packages, or just for testing a new feature or a bug fix).

And it follows the most common practice across the free software world, too, so there's familiarity for contributors, too.

But having a non-central location of the PO files will make things much harder for the translators (for various reasons that I won’t get into now).

May we have the best of both worlds? That is, have a central repository for translations, for use by the translators, and automatic copying of the translations into each application’s repository by scripty. Basically the same thing that happens with translations in .desktop files. They are translated centrally, but any updates to the .po files are merged into the .desktop files in each application’s repository by scripty each night. Having a similar thing be done with the .po files would be nice. Plain .po files would be copied directly, while other formats (e.g., .ts files used by some Qt applications) would be converted from the .po files.

And to avoid anyone manually editing the ‘.po’ files in a an applications repository, perhaps a pre-commit hook could be added so that only scripty is allowed to commit changes to the files.

Well, that would be a huge improvement for sure.

Just to show the other side of the argument, I've had lots of developers asking for scripty not to commit the .desktop translations back to the repos, since it creates merge conflicts for them.

rempt added a comment.Aug 19 2020, 3:30 PM

Just to show the other side of the argument, I've had lots of developers asking for scripty not to commit the .desktop translations back to the repos, since it creates merge conflicts for them.

That is not "another side of the argument". It's irrelevant, because .desktop files are edited by developers, po files would only be copied in.

aacid added a comment.Aug 19 2020, 3:39 PM

Just to show the other side of the argument, I've had lots of developers asking for scripty not to commit the .desktop translations back to the repos, since it creates merge conflicts for them.

That is not "another side of the argument". It's irrelevant, because .desktop files are edited by developers, po files would only be copied in.

No, it's not irrelevant. It doesn't matter that the .po files are only copied: if the translation in the stable branch and the one in the development branch diverge (which they eventually will), you'll get merge conflicts (when merging stable into development).

rempt added a comment.Aug 19 2020, 4:33 PM

No, it's not irrelevant. It doesn't matter that the .po files are only copied: if the translation in the stable branch and the one in the development branch diverge (which they eventually will), you'll get merge conflicts (when merging stable into development).

Yes, it is irrelevant. I've never seen anyone merge an entire stable branch in one merge commit into the unstable branch; normally, people cherry-pick patches. Just don't cherry-pick the po files.

In any case, the advantages of having a po folder in the project Git repo outweigh all of that to me, so if you are so intent on blocking this with irrelevant arguments, well, I'll do it myself.

No, it's not irrelevant. It doesn't matter that the .po files are only copied: if the translation in the stable branch and the one in the development branch diverge (which they eventually will), you'll get merge conflicts (when merging stable into development).

Yes, it is irrelevant. I've never seen anyone merge an entire stable branch in one merge commit into the unstable branch; normally, people cherry-pick patches. Just don't cherry-pick the po files.

In any case, the advantages of having a po folder in the project Git repo outweigh all of that to me, so if you are so intent on blocking this with irrelevant arguments, well, I'll do it myself.

Boud, this is not the proper way to move this forward. Nothing in what Albert wrote is about not doing this.

The other issue brought up (not touching desktop/json/appdata files) is relevant, but a possible solution is to disable the injection, copy all the po files (including the ones for desktop/json/appdata) and inject them at build time. This is something that can be done, but first we need to work on the infrastructure for it.

rempt added a comment.Aug 19 2020, 5:16 PM

Boud, this is not the proper way to move this forward. Nothing in what Albert wrote is about not doing this.

Um, yes, it is? Why else come up with "the other side of the argument"? In any case, problems with a practice we already support can hardly be relevant in discussing a practice we don't support yet.

The other issue brought up (not touch desktop/json/appdata files) is relevant but a possible solution is to disable the injection, copy all po files (including the ones for desktop/json/appdata) and inject them at build time. This is something that can be done, but we need first to work on the infrastructure to do this.

We've been there: there is cmake support to download the translations at build time, isn't there?

But that just isn't good enough.

We've moved to gitlab; we should be making releases the gitlab way. Make a tag, have a release.

In any case, this is a bit outside the original proposal, which I think is misguided, because I think our infrastructure should work like the rest of the free software world.

But I am okay with getting the translations pushed to my repo every night (or doing that myself -- yes, I guess that will be another way krita is slightly different from the frameworks community or the plasma community, but KDE is a community of communities these days.)

I'm not okay with the status quo, though. It's something that needs fixing ASAP because not being able to use tags to make releases the gitlab way, not having translations in nightly builds, not making it possible for developers to check translations while developing -- it's just not good enough.

If it's in principle okay to offer KDE projects the option to opt in to having their git repo's po folder updated by a nightly push, with the po files locked against pushes by anything other than the pushing script (it doesn't have to be scripty), I would be more than happy to fund that work.
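As a rough illustration of the locking idea above, a server-side hook could gate every pushed path through a small policy function like the one below. This is only a sketch: the function name and the "scripty" account name are assumptions, not an agreed design.

```sh
#!/bin/sh
# Hypothetical policy for a server-side (pre-receive) hook:
# paths under po/ may only be modified by the translation-sync
# account ("scripty" is an assumed account name); all other
# paths are open to everyone.
allowed_push() {
  user="$1"   # account performing the push
  path="$2"   # one path modified by the push
  case "$path" in
    po/*)
      if [ "$user" = "scripty" ]; then echo allow; else echo deny; fi
      ;;
    *)
      echo allow
      ;;
  esac
}
```

A real pre-receive hook would run something like `git diff --name-only <old> <new>` for each pushed ref and reject the push if any changed path comes back as `deny`.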

dkazakov added a comment.EditedAug 19 2020, 6:27 PM

I don't really understand why we cannot use submodules for that. It looks like a submodule can track external branches. Why can't we just add a submodule to each KDE project that fetches translations from an external repo by tracking a specific branch?

It would solve all the listed problems:

  1. Developers who need translations would be able to fetch them with a single git command
  2. Developers who don't need them would just skip the step of initializing submodules
  3. Translators will have their own superprojects, which would include all the necessary translations as submodules.
  4. When doing a release, the submodule would be frozen to a specific commit in the translations repository, so the "just do a tag for a release" approach will work.

Why can't this solution work?
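For reference, the branch-tracking submodule described above would boil down to a `.gitmodules` entry along these lines (the repository URL, the `po` path, and the branch name are hypothetical placeholders, not existing repositories):

```ini
[submodule "po"]
	path = po
	url = https://invent.kde.org/l10n/<language>.git
	branch = master
```

Developers who want translations would run `git submodule update --init --remote po` to fetch the tracked branch; a release would simply ship whatever submodule commit is recorded in the superproject.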

No, they were not invented exactly for this purpose

@aacid, I'm sorry, but your comment is just useless. Could you give a bit more objective insight into why we couldn't use git submodules for this purpose?

pino added a comment.Aug 19 2020, 6:57 PM

I don't really understand why we cannot use submodules for that. It looks like a submodule can track external branches. Why can't we just add a submodule to each KDE project that fetches translations from an external repo by tracking a specific branch?

Because there will not be a single git repository with all the languages, i.e. equivalent to the current /trunk/l10n-kf5 and the like. See the comments made by Nicholas about the size of a couple of languages; each is hundreds of MB. A single repository would be at least 1/2 GB (if not more), which is definitely not acceptable for the majority of users (who are translators).

Even if you add externals for all the repositories, the result for you does not change: you will download all of them as part of your repository, just to get the one or two translation files you are interested in.

So, no.

@rempt while I understand your point of view, this ticket is not about changing the workflow, which is way more than "move translations to the repositories". We have translated documentation to handle, we have translation scripts, we have data files. Sure, krita does not use any of them, but other applications do, and in order to switch the workflow for everybody the workflow must work for everybody.

Everybody: this ticket is still about "switching from svn to git", because we cannot keep the SVN server forever. This change is already big as it is, and alone it will require various changes, and potentially even rewriting part of the tooling used. Please, please, please, do not mix different issues here. If you have ideas/issues/etc for different parts of the translation toolchain, please let's track those in separate tickets. Thanks.

In T13514#237770, @pino wrote:

Even if you add externals for all the repositories, the result for you does not change: you will download all of them as part of your repository, just to get the one or two translation files you are interested in.

Well, we can discuss the layout. I don't think that "one repo - one application" will add much download volume for the application translators. We can also split it in a "one repo - one language for one application" manner. It might be too granular, but it might work and save time for people who work on all applications at the same time.

And no one forbids the users from using --depth 1. @aacid insists that users are capable of "copy and paste 2 svn commands", so I don't think they would fail to add --depth 1 if it is written in the official manual.

pino added a comment.Aug 19 2020, 7:41 PM

Well, we can discuss the layout. I don't think that "one repo - one application" will add much download volume for the application translators. We can also split it in a "one repo - one language for one application" manner. It might be too granular, but it might work and save time for people who work on all applications at the same time.

Sorry no, almost 600 repositories per language is not an option. Considering that we have 100+ languages (even if not all of them actively worked on), this would mean up to 60000 repositories. This also would mean creating 100 new repositories every time a new repository with code/messages is added. No, no, really, no.

Instead of suggesting solutions, please describe your requirements, or in general what you would like to see.

In T13514#237774, @pino wrote:

Instead of suggesting solutions, please describe your requirements, or in general what you would like to see.

Well, there are two requirements:

  1. When releasing, setting a tag in the git repository should be enough to make a release. The tarball should be created automatically by gitlab's "releases" feature. Right now the scripts for making tarballs out of SVN break regularly, so every release we have to fix them in one way or another. And these scripts are not fool-proof. We have generated and published incorrect tarballs several times.
  2. The developers should have an easy way to build/install translations. Preferably, these translations should be synced with the current branch/commit (I often switch between master and krita/4.3 branches).

I've also added a workflow requirement into a different task as you asked, but got a weird reply to it: https://phabricator.kde.org/T11070#237775
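To make requirement 1 concrete, a tag-triggered release in GitLab CI could look roughly like this. This is a sketch only: the job name, the release-cli image, and the description text are assumptions, not an agreed KDE setup.

```yaml
# Hypothetical job: runs only when a tag is pushed and creates a
# GitLab Release for it (GitLab attaches source tarballs to tags
# automatically).
create-release:
  image: registry.gitlab.com/gitlab-org/release-cli:latest
  rules:
    - if: $CI_COMMIT_TAG
  script:
    - echo "Creating release for $CI_COMMIT_TAG"
  release:
    tag_name: $CI_COMMIT_TAG
    description: "Release $CI_COMMIT_TAG"
```

With something like this in place, "make a tag, have a release" would be the whole release procedure, assuming the translations are already present in the repository at tag time.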

ltoscano added a comment.EditedAug 19 2020, 7:58 PM
In T13514#237774, @pino wrote:

Instead of suggesting solutions, please describe your requirements, or in general what you would like to see.

Well, there are two requirements:

  1. When releasing, setting a tag in the git repository should be enough to make a release. The tarball should be created automatically by gitlab's "releases" feature. Right now the scripts for making tarballs out of SVN break regularly, so every release we have to fix them in one way or another. And these scripts are not fool-proof. We have generated and published incorrect tarballs several times.
  2. The developers should have an easy way to build/install translations. Preferably, these translations should be synced with the current branch/commit (I often switch between master and krita/4.3 branches).

    I've also added a workflow requirement into a different task as you asked, but got a weird reply to it: https://phabricator.kde.org/T11070#237775

I know it may seem weird that I say this again, but this point is really out of scope for this specific task.
It is tracked by T13519

I know it may seem weird that I say this again, but this point is really out of scope for this specific task.
It is tracked by T13519

Well, this task talks about "injection" as a decided fact. But I tried to understand why git and gitlab themselves cannot be used for that.

I know it may seem weird that I say this again, but this point is really out of scope for this specific task.
It is tracked by T13519

Well, this task talks about "injection" as a decided fact. But I tried to understand why git and gitlab themselves cannot be used for that.

Your own requirement talks about injections/syncing ("these translations should be synced with the current branch/commit"), and that task is about that. It also makes it easier to release, which is the other requirement you wrote above.

zerg added a subscriber: zerg.Aug 20 2020, 1:29 PM
clel added a comment.Aug 20 2020, 7:29 PM

Um, yes, it is? Why else come up with "an other side of the argument"? In any case, problems with a practice we already support can hardly be relevant in discussing a practice we don't support yet?

From reading the conversation, I don't think he just "came up" with this. Also, please don't assume bad reasons behind people's actions, as this is basically the opposite of what the community guidelines suggest (assume good faith).

That is not "another side of the argument". It's irrelevant, because .desktop files are edited by developers, while po files would only be copied in.

The discussion he quoted was:

May we have the best of both worlds? That is, have a central repository for translations, for use by the translators, and automatic copying of the translations into each application’s repository by scripty. Basically the same thing that happens with translations in .desktop files. They are translated centrally, but any updates to the .po files are merged into the .desktop files in each application’s repository by scripty each night. Having a similar thing be done with the .po files would be nice. Plain .po files would be copied directly, while other formats (e.g., .ts files used by some Qt applications) would be converted from the .po files.

And to avoid anyone manually editing the ‘.po’ files in an application's repository, perhaps a pre-commit hook could be added so that only scripty is allowed to commit changes to the files.

Well, that would be a huge improvement for sure.

Just to show the other side of the argument, I've had lots of developers asking for scripty not to commit the .desktop translations back to the repos, since it creates merge conflicts for them.

There it says that "any updates to the .po files are merged into the .desktop files in each application’s repository by scripty each night". So I think it is reasonable that this might lead to merge conflicts. Also, po files would not "only be copied in", but in fact merged into the .desktop files (not sure if you meant that in your comment above). Still, you would need to explain further why you think that argument is "irrelevant". At least I don't get your argument here. But I am not enough into this to judge from a technical perspective.


In T13514#237774, @pino wrote:

Well, we can discuss the layout. I don't think that "one repo - one application" will add much download volume for the application translators. We can also split it in a "one repo - one language for one application" manner. It might be too granular, but it might work and save time for people who work on all applications at the same time.

Sorry no, almost 600 repositories per language is not an option. Considering that we have 100+ languages (even if not all of them actively worked on), this would mean up to 60000 repositories. This also would mean creating 100 new repositories every time a new repository with code/messages is added. No, no, really, no.

Instead of suggesting solutions, please describe your requirements, or in general what you would like to see.

This has already been suggested in some way, I guess (see below), but if it is already considered to have .po files directly in each project's repos (and as far as I understand that is pretty useful actually), why do we have one central repository for all translations in the first place, from which everything needs to be copied to the projects' repos?

As I've said before, and will keep saying, if the way translations are handled in KDE changes, the change should be to include the po files in the source repositories, for the following reasons:

  • Developers will always build with translations
  • Since we're using gitlab now, that would make creating a release as simple as creating a tag, i.e., we would be using gitlab as it is meant to be used
  • No chance of accidentally packaging the unstable translations with a release from the stable branch
  • nightly builds will include translations

This will make things easier for developers, packagers and users who like to compile applications themselves (instead of using packages, or just for testing a new feature or a bug fix).

But having a non-central location of the PO files will make things much harder for the translators (for various reasons that I won’t get into now).

Is there some other reference where it is described, why this will make things "much harder for the translators"? Basically you would in fact have one central location for all files, but per project. Ideally those could get summarized at a different place like Weblate maybe, so there is still a fast overview of all projects and languages needing attention for example. Not sure whether that is supported by Weblate, though.

Is there some other reference where it is described, why this will make things "much harder for the translators"? Basically you would in fact have one central location for all files, but per project. Ideally those could get summarized at a different place like Weblate maybe, so there is still a fast overview of all projects and languages needing attention for example. Not sure whether that is supported by Weblate, though.

Not everyone will use weblate, and we would have tons of repositories. It is not a viable path.

tl;dr a central source of truth for translations is a requirement, requested by translators.

clel added a comment.Aug 21 2020, 4:32 PM

Is there some other reference where it is described, why this will make things "much harder for the translators"? Basically you would in fact have one central location for all files, but per project. Ideally those could get summarized at a different place like Weblate maybe, so there is still a fast overview of all projects and languages needing attention for example. Not sure whether that is supported by Weblate, though.

Not everyone will use weblate, and we would have tons of repositories. It is not a viable path.

The number of repositories would not grow. The translations would just be incorporated and managed in the existing repositories for each project. The number is also the same as the already existing number of projects. Can you elaborate on what would speak against using Weblate to manage those repositories? Maybe you can do that specifically in the task about evaluating an online translation tool, to avoid duplication.

tl;dr a central source of truth for translations is a requirement, requested by translators.

The source of truth would be there for every project individually with the option to have it all in one place on a platform like Weblate (I just had a look and they seem to somewhat support that, at least you can add several different projects/components with independent repositories: https://docs.weblate.org/en/latest/admin/projects.html) for example. Can you also elaborate why translators need such central source with all projects basically in one repository? I think it is pretty important to understand the requirements. Feel free to link to already existing comments or resources about this.

In T13514#237894, @clel wrote:

Is there some other reference where it is described, why this will make things "much harder for the translators"? Basically you would in fact have one central location for all files, but per project. Ideally those could get summarized at a different place like Weblate maybe, so there is still a fast overview of all projects and languages needing attention for example. Not sure whether that is supported by Weblate, though.

Not everyone will use weblate, and we would have tons of repositories. It is not a viable path.

The number of repositories would not grow. The translations would just be incorporated and managed in the existing repositories for each project. The number is also the same as the already existing number of projects. Can you elaborate on what would speak against using Weblate to manage those repositories? Maybe you can do that specifically in the task about evaluating an online translation tool, to avoid duplication.

Weblate (like any online translation system) is still deadly slow compared to offline tools. On Hosted Weblate and Fedora's Weblate I witnessed constant git merge conflicts every week, which had to be resolved manually. This is a total nightmare at the scale of KDE's number of repositories. Actually, I would have to prioritize some KDE translations, because it is unrealistic to work with all of them through a web interface (~15 minutes a day just to upload the translations, to say nothing of translating).

Weblate, Pootle, Wordbee, Transifex, Rosetta, etc. are all tailored to contributing a few translations a week, or to *huge* teams.

tl;dr a central source of truth for translations is a requirement, requested by translators.

The source of truth would be there for every project individually with the option to have it all in one place on a platform like Weblate (I just had a look and they seem to somewhat support that, at least you can add several different projects/components with independent repositories: https://docs.weblate.org/en/latest/admin/projects.html) for example. Can you also elaborate why translators need such central source with all projects basically in one repository? I think it is pretty important to understand the requirements. Feel free to link to already existing comments or resources about this.

clel added a comment.Aug 21 2020, 5:03 PM

Weblate (like any online translation system) is still deadly slow compared to offline tools. On Hosted Weblate and Fedora's Weblate I witnessed constant git merge conflicts every week, which had to be resolved manually. This is a total nightmare at the scale of KDE's number of repositories. Actually, I would have to prioritize some KDE translations, because it is unrealistic to work with all of them through a web interface (~15 minutes a day just to upload the translations, to say nothing of translating).

Weblate, Pootle, Wordbee, Transifex, Rosetta, etc. are all tailored to contributing a few translations a week, or to *huge* teams.

Thanks for the insight. Maybe we should continue that on T11070 or T13311 to not hijack this task too much. I don't know the reasons behind those merge conflicts, so I cannot really judge the tools on that. Are you aware that you can download PO files from Weblate, use them in your offline workflow and upload them again?

In T13514#237900, @clel wrote:

Weblate (like any online translation system) is still deadly slow compared to offline tools. On Hosted Weblate and Fedora's Weblate I witnessed constant git merge conflicts every week, which had to be resolved manually. This is a total nightmare at the scale of KDE's number of repositories. Actually, I would have to prioritize some KDE translations, because it is unrealistic to work with all of them through a web interface (~15 minutes a day just to upload the translations, to say nothing of translating).

Weblate, Pootle, Wordbee, Transifex, Rosetta, etc. are all tailored to contributing a few translations a week, or to *huge* teams.

Thanks for the insight. Maybe we should continue that on T11070 or T13311 to not hijack this task too much. I don't know the reasons behind those merge conflicts, so I cannot really judge the tools on that. Are you aware that you can download PO files from Weblate, use them in your offline workflow and upload them again?

Sure. I'm Fedora's Weblate admin.

clel added a comment.Aug 21 2020, 6:01 PM

Alright. Then I don't really understand what problems you have with Weblate. The things you wrote are too general for me to understand what the concrete problems are that you experience.

In T13514#237909, @clel wrote:

Alright. Then I don't really understand what problems you have with Weblate. The things you wrote are too general for me to understand what the concrete problems are that you experience.

Sure. Just one thing that I do not understand is that people who do not really understand how the translation system works eagerly want to change that system.

Because they want to change it into something they might have an actual chance of understanding?

(you might say, they are trying to 'translate' the system into something they understand. :D *hides*)

clel added a comment.Aug 22 2020, 3:01 PM
In T13514#237909, @clel wrote:

Alright. Then I don't really understand what problems you have with Weblate. The things you wrote are too general for me to understand what the concrete problems are that you experience.

Sure. Just one thing that I do not understand is that people who do not really understand how the translation system works eagerly want to change that system.

When you write "Sure", I expect some more insights :) You wrote about problems you had but did not really give much detail about them. You talk about Weblate being much slower than offline tools while not mentioning which parts of the workflow you are talking about (admin stuff, translation itself, downloading and uploading PO files?). What leads to the merge conflicts? Uploading PO files while the same lines have been changed through the online interface? You wrote you would have to work with KDE translations "through a web interface". What tasks are you referring to when you say "work"? Translation itself apparently not, since you say you are aware that you can download and upload PO files. So what is taking so long there? You mention uploading translations takes "~15 minutes a day", can you give some details? Are we talking about some maintenance stuff or actually contributions to translations? Why do you imply that translations themselves are slower with Weblate?

These were some of the questions that came to my mind when I tried to understand which problems you face with Weblate. I wrote them down explicitly now, so you can hopefully understand better what has been unclear to me.

Regarding your question, which I felt was a bit insulting and implied things that are not true: There has been a long discussion already that contribution to translations should be made easier, possibly by a web-based system like Weblate. This task however is not about that. My question here mainly was why one central repository for all translations is needed instead of for example hosting them together in each project's repository. Currently I have not really seen a reason why that would cause a problem, but knowing that I don't know how the entire system is set up, I of course think there might be reasons why that is not a good idea. That is why I asked for such reasons.

I prefer to have those discussions factual and hopefully leading to an increase of available information to make better decisions.

Because they want to change it into something they might have an actual chance of understanding?

Speaking for me, I don't "want to change it", I just want to explore possible improvements. I guess the benefits of storing PO files in each project's repository are clear (but I am willing to enumerate them again). What is not so clear (at least to me) are the possible disadvantages or blockers of that concept.

I already wrote it: a central place is needed so that

  • people *NOT* using weblate don't have to checkout tons of repositories to contribute. That's enough in itself.
  • we may use posummit even with weblate to provide a single branch to everyone, which means that some logic to inject the translations into each branch will be needed somewhere else
  • it will be the only interface that weblate would have to deal with (because in 5 years we may change tools again, and this way we don't lose the history)
  • even in the case where part of the web tool would be the central place, that would still be the reference point, not the content of each repository, which would be a mirror once we solve T12268.
In T13514#237939, @clel wrote:
In T13514#237909, @clel wrote:

Alright. Then I don't really understand what problems you have with Weblate. The things you wrote are too general for me to understand what the concrete problems are that you experience.

Sure. Just one thing that I do not understand is that people who do not really understand how the translation system works eagerly want to change that system.

When you write "Sure", I expect some more insights :) You wrote about problems you had but did not really give much detail about them. You talk about Weblate being much slower than offline tools while not mentioning which parts of the workflow you are talking about (admin stuff, translation itself, downloading and uploading PO files?).

All of them. There is no need for offline administration; translating every string through the web interface takes several times longer even in zen mode; "downloading" new strings through Subversion and then finding what to translate in Lokalize takes ~5 seconds, while analyzing big projects (Fedora is smaller than KDE now) in Weblate takes minutes. Uploading big files (libguestfs and its man pages, libvirt, the Weblate docs, etc.) literally takes up to 10 minutes for just one file. I can imagine how long it would take to upload KStars, Krita and its docs (the last update required several dozen files to be uploaded; the translation itself contains several hundred files), RKWard, KMyMoney or LabPlot.

Moreover, the Krita developers seem to want some kind of CI with ready-to-use packages after every translation. I think that's a dream without some kind of powerful cluster at hand, given the current state of the art.

What leads to the merge conflicts?

Developers, translators, and the automatic merging system all working in parallel. Just the latest example from Hosted Weblate: two days ago, Michal decided to upload new translations while I was trying to upload my translation. This resulted in an automated push into the Weblate repo, then a refresh and a lock of the translation in the web interface. For several hours all translations were locked. Had I not had an offline copy of my translation, it would have been lost when Michal resolved the merge conflict manually.

Fedora's translations break this way almost every week. The only saving grace is that they change very rarely.

GNOME's similar project (DL) is also *very* fragile (broken every month or so).

Uploading PO files while the same lines have been changed through the online interface? You wrote you would have to work with KDE translations "through a web interface". What tasks are you referring to when you say "work"? Translation itself apparently not, since you say you are aware that you can download and upload PO files.

Yes, I am. I do not have an additional hour in a day just to upload the ~50-60 strings that change every day in KDE. It's not Fedora with its ~100 new strings a month, or GNOME with its average of 5 new strings a day.

So what is taking so long there? You mention uploading translations takes "~15 minutes a day", can you give some details? Are we talking about some maintenance stuff or actually contributions to translations? Why do you imply that translations themselves are slower with Weblate?

Just try to use Weblate somewhere. Then you will understand. You will lose time on every single step - finding what to translate, opening the catalog, finding what you need to translate, translating, opening the next string, dealing with stupid quality checks (an absolutely impossible thing on large catalogs).

These were some of the questions that came to my mind when I tried to understand which problems you face with Weblate. I wrote them down explicitly now, so you can hopefully understand better what has been unclear to me.

Regarding your question, which I felt was a bit insulting and implying things that are not true: There has been a long discussion already that contribution to translations should be made easier, possibly by a web-based system like Weblate.

Sorry, but the main role in such discussions is for the people that (guess what?) just want to translate one application. At that scale, there are no substantial disadvantages compared to offline systems: the shortcomings of online tools (low speed, bad scalability) come with good accessibility and some automation.

This task however is not about that. My question here mainly was why one central repository for all translations is needed instead of for example hosting them together in each project's repository.

Because of the scale of our project. You've got this answer several times above.

Currently I have not really seen a reason why that would cause a problem, but knowing that I don't know how the entire system is set up, I of course think there might be reasons why that is not a good idea. That is why I asked for such reasons.

I'm a scientist. Please give me numbers that show that Weblate (or Transifex, Rosetta, etc.) makes translation "easier" for translators (e.g. "The XX translation service gives the project 2 times more translators and 4 times more translations in the long run because it's easy"). Hopefully with your definition of "easier". Thanks in advance.

Online translation tools can make access to translation easier (though not even in every case), but they definitely make the translation process much slower. Everything comes at a price.

I prefer to have those discussions factual and hopefully leading to an increase of available information to make better decisions.

Because they want to change it into something they might have an actual chance of understanding?

Speaking for me, I don't "want to change it", I just want to explore possible improvements. I guess the benefits of storing PO files in each project's repository are clear (but I am willing to enumerate them again). What is not so clear (at least to me) are the possible disadvantages or blockers of that concept.

clel added a comment.Aug 22 2020, 4:41 PM

I already wrote it: a central place is needed so that

Thanks for those points. As I said, if you had already written that, I'd also have been happy with just a link to it.

  • people *NOT* using weblate don't have to checkout tons of repositories to contribute. That's enough in itself.

I guess I wasn't aware of this workflow. So people basically check out the repository and then go through the different projects offline to translate? I understand that this is more efficient when doing a lot of translation work across several projects. I have been thinking from my perspective, where I'd look for a project first, then maybe download the PO file, complete the translation offline (without ever checking out any repository), and upload it again through a web interface, or translate the project directly in the browser using that web interface.

  • we may use posummit even with weblate to provide a single branch to everyone, which means that some logic to inject the translations into each branch will be needed somewhere else

This might become a valid reason, however I think currently this is not really a blocker.

  • it will be the only interface that weblate would have to deal with (because in 5 years we may change tools again, and this way we don't lose the history)

Somewhat valid. But note that having the projects store the translation files would also be an interface, just one spread across a large number of projects. In fact, I guess this interface to the projects also exists today, just in a different form, since it is needed to sync them with the translations repository.

  • even in the case where part of the web tool would be the central place, that would still be the reference point, not the content of each repository, which would be a mirror once we solve T12268.

The task you linked seems to be wrong. That makes it hard for me to understand what you are saying here.

Abella added a subscriber: Abella.Aug 22 2020, 5:11 PM
In T13514#237939, @clel wrote:
In T13514#237909, @clel wrote:

Alright. Then I don't really understand what problems you have with Weblate. The things you wrote are too general for me to understand what the concrete problems are that you experience.

Sure. One thing I do not understand, though, is why people who do not really understand how the translation system works are so eager to change it.

When you write "Sure", I expect some more insights :) You wrote about problems you had but did not really give much detail about them. You talk about Weblate being much slower than offline tools while not mentioning which parts of the workflow you are talking about (admin stuff, translation itself, downloading and uploading PO files?).

All of them. There is no need for offline administration; translating every string through the web interface takes several times longer, even in zen mode; "downloading" new strings through Subversion and then finding what to translate in Lokalize takes ~5 seconds, while analyzing big projects (Fedora is smaller than KDE now) in Weblate takes minutes. Uploading big files (libguestfs and its man pages, libvirt, the Weblate docs, etc.) literally takes up to 10 minutes for a single file. I can imagine how long it would take to upload KStars, Krita and its docs (the last update required uploading several dozen files; the translation itself contains several hundred), RKWard, KMyMoney or LabPlot.

Hi, I am a Catalan translator on multiple projects on Transifex (SubSurface, MKVToolNix, etc.), and if it were not for the fact that they are relatively small and that I can download the translation as a PO file, etc., I would have thought twice before starting.

I am only a translator, and the time spent is much greater.
The only advantages: the translation gets done, and it makes it easier to find translations for other projects (more visibility).

yaron added a subscriber: yaron.Aug 26 2020, 11:31 AM

"Git should facilitate integration with external translation file trees maintained by Linux distributions (e.g. BaseALT)"
Let me rephrase this for you: "We're doing things wrong, can you please change your upstream behaviour so it's less painful for us to do things wrong?"

Okay, this requirement is now obsolete because we've granted SVN write access to someone from BaseALT's translation team.

Another problem with SVN is that it's slow. It sometimes takes at least 10 minutes to run "svn cleanup" + "svn up" when only the translations from the trunk/kf5 and stable/kf5 branches are checked out, making a full scripty run slow on my laptop (note: I don't use an SSD, which might be the reason). A similar Git command (e.g. git reset --hard) would be much faster on a similarly sized repo.
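For concreteness, here's a minimal sketch (on a throwaway local repo; all paths and names are made up) of the "reset to pristine" step a Git-based scripty could use in place of `svn cleanup` + `svn revert -R .` before updating. The network part is omitted; in a real checkout it would just be `git fetch` followed by `git reset --hard origin/master`:

```shell
# Hedged sketch: discard all local changes in a scripty working tree.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email scripty@example.org
git config user.name scripty
printf 'msgid ""\n' > app.po
git add app.po
git commit -qm 'initial translations'

# Simulate leftovers from an interrupted run.
printf 'msgid "dirty"\n' >> app.po
touch stray.tmp

# Restore the tree to HEAD and remove untracked files --
# the rough analogue of `svn cleanup && svn revert -R .`.
git reset -q --hard HEAD
git clean -qfd

git status --porcelain   # prints nothing: the tree is pristine again
```

Both operations work purely on the local object store, which is why they tend to be fast even on large checkouts.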

Btw I've started slowly rewriting scripty in Go because

  • I like Go more than bash.
  • It's easier to make it more configurable than in the current state with a set of scripts calling each other. We need configurability to share codebase across branches and to let users run parts of scripty end-to-end (will be useful for www that needs more frequent syncs).
  • it's easier to parallelize things in Go, so we could shorten the running time.

I didn't share my code yet. Don't expect fast progress, maybe I'll be able to release something by May 2021.
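To illustrate the parallelization point: most of scripty's per-file work (msgmerge and friends) is independent per file, so it parallelizes in any language. A rough shell sketch using `xargs -P` (the `process.sh` step and the file names are hypothetical stand-ins for the real per-file work):

```shell
# Hedged sketch: run per-file work N at a time instead of sequentially.
set -e
work=$(mktemp -d)
cd "$work"
touch a.po b.po c.po d.po

# Stand-in for real per-file work such as msgmerge (hypothetical).
cat > process.sh <<'EOF'
#!/bin/sh
echo "processed $1" >> results.txt
EOF
chmod +x process.sh

# Up to 4 files at a time.
printf '%s\n' *.po | xargs -n 1 -P 4 ./process.sh

sort results.txt   # one "processed <file>" line per input file
```

In Go the equivalent would be a worker pool of goroutines, with the added benefit of structured error handling per file.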

Please note I'm working on a replacement in Python too.
The idea is to have something modular (starting from an independent translation-extraction application which may be used elsewhere). I plan to share something before that deadline.

A binary is not going to help much, as most of the time is going to be spent on disk I/O, in my experience.

huftis added a comment.EditedJan 14 2021, 4:39 PM

Another problem with SVN is that it's slow. It sometimes takes at least 10 minutes to run "svn cleanup" + "svn up" when only the translations from the trunk/kf5 and stable/kf5 branches are checked out, making a full scripty run slow on my laptop (note: I don't use an SSD, which might be the reason).

This sounds excessive – and rather strange. One thing I like about SVN is that it is fast. I just did a test of svn up, fetching about a week's worth of updates for 5 languages (or 4 languages + templates) in l10n-kf5, and it took about 4 seconds (on my ~9-year-old desktop PC).

svn cleanup may be a bit slow, but it should only be needed when something is actually wrong with your local checkout. There should be no reason to run it except when svn status returns a non-empty status and you think it shouldn't. And even then, trying svn revert --recursive . first is a much better option. (You don't run git fsck followed by git repack before each git pull!)

On the other hand, my experience with Git on similarly large repos is that it is very slow, even when there are few updates. For example, try cloning the Git repo of the LibreOffice translations (which is much smaller than the KDE translation repo): git clone https://github.com/LibreOffice/translations.git
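For the initial clone specifically, a shallow clone avoids downloading most of the history, which is where much of the time goes on a repo like that. A sketch on a small throwaway repo (names are made up; against the real repo the command would be `git clone --depth 1 https://github.com/LibreOffice/translations.git`):

```shell
# Hedged sketch: shallow clone fetches only the latest commit.
set -e
src=$(mktemp -d)
git -C "$src" init -q
git -C "$src" config user.email demo@example.org
git -C "$src" config user.name demo
for i in 1 2 3; do
  echo "revision $i" > "$src/file.txt"
  git -C "$src" add file.txt
  git -C "$src" commit -qm "commit $i"
done

# file:// forces the real transport, which honors --depth
# (a plain local path clone would ignore it).
dst=$(mktemp -d)/shallow
git clone -q --depth 1 "file://$src" "$dst"

git -C "$dst" rev-list --count HEAD   # prints 1: only the tip commit
```

This doesn't help incremental updates, of course, which is where the per-repo fetch cost would matter for a daily scripty run.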

ognarb added a subscriber: ognarb.Jan 15 2021, 11:01 PM

Please note I'm working on a replacement in Python too.
The idea is to have something modular (starting from an independent translation-extraction application which may be used elsewhere). I plan to share something before that deadline.

A binary is not going to help much, as most of the time is going to be spent on disk I/O, in my experience.

Do you have a branch with your changes? I started something similar 2 weeks ago, so I could take a look at what I could merge into your work.

yaron added a comment.Jan 17 2021, 9:44 AM

@huftis @aspotashev There are too many parameters involved; you can't measure performance like that.

ngraham added a subscriber: ngraham.Mar 7 2021, 3:20 PM

Just in case, this is the thing I'm working on, but it's still heavily WIP, especially the injector interface (something is already working with the example legacy extractor):
https://invent.kde.org/ltoscano/noktra/

Before thinking about how to add code, please coordinate with me. I guess I will need another ticket, because this is a subtask of the whole process.

cblack added a subscriber: cblack.Mar 30 2021, 10:45 PM

Btw I've started slowly rewriting scripty in Go because

  • I like Go more than bash.
  • It's easier to make it more configurable than in the current state with a set of scripts calling each other. We need configurability to share codebase across branches and to let users run parts of scripty end-to-end (will be useful for www that needs more frequent syncs).
  • it's easier to parallelize things in Go, so we could shorten the running time.

I didn't share my code yet. Don't expect fast progress, maybe I'll be able to release something by May 2021.

FWIW I have something similar to this concept that I initially wrote for KWinFT: https://gitlab.com/kwinft/tooling/-/tree/master/i18n/extractor. It essentially replaces the Messages.sh inherited from KDE with a messages.yaml that conveys much of the same information in a more declarative format, and changes some things to work better with translations being stored in-repo and managed as part of the usual Git workflow (although the translations can be output anywhere).

miepee added a subscriber: miepee.Jul 25 2022, 8:14 PM

Last activity here was a year ago; have there been any further thoughts or updates on this?

tmpod added a subscriber: tmpod.Jul 26 2022, 6:50 PM

Would someone (@ltoscano ?) be able to give a short status update on this at Akademy? Where is discussion/work/testing required?

mormegil added a subscriber: mormegil.
emohr added a subscriber: emohr.Nov 14 2023, 4:43 PM