Improvements for Gettext entries wordwrapping
Needs RevisionPublic

Authored by azamathackimov on Jan 21 2018, 5:58 PM.

Details

Reviewers
schwarzer
Summary

Improves word wrapping code for Gettext catalog.
https://bugs.kde.org/show_bug.cgi?id=380007

Old algorithm works not right as expected - if text have unintended newlines, lokalize wraps them as required, and result goes not identical as vanilla gettext.
This produce some noise when committing PO catalogs into git.

  • Fix issue when whitespace appears as leading character in next string instead of trailing in previous.
  • New algorithm removes all newlines and then add them where they are really needed, after that goes standard word wrapping with nice max 78-wide lines.

Diff Detail

Repository
R456 Lokalize
Lint
Lint Skipped
Unit
Unit Tests Skipped
There are a very large number of changes, so older changes are hidden. Show Older Changes
aacid added a comment.Aug 5 2018, 11:22 AM

Simon what do you think about this?

sdepiets updated this revision to Diff 39238.Aug 7 2018, 5:12 AM

Fixing patch

Simon what do you think about this?

I've just "rebased" the patch because I fixed an issue with the wordwrap already in the same file.
I need more context to understand what exactly is the issue with the current behavior to begin with. Do you have specific examples ?

With this patch this is the behavior that I see (it removes all line breaks in source and target), which seems rather unwanted in that situation.

-msgid ""
-"<center>This investment account does not contain the \"%1\" security.</center>"
-"<center>Transactions involving this security will be ignored.</center>"
-msgstr ""
-"<center>Ce compte de placement ne contient pas le titre « %1 ».</center>"
-"<center>Les opérations impliquant ce titre seront ignorées.</center>"
+msgid "<center>This investment account does not contain the \"%1\" security.</center><center>Transactions involving this security will be ignored.</center>"
+msgstr "<center>Ce compte de placement ne contient pas le titre « %1 ».</center> <center>Les opérations impliquant ce titre seront ignorées.</center>"
sdepiets requested changes to this revision.Aug 7 2018, 5:20 AM
This revision now requires changes to proceed.Aug 7 2018, 5:20 AM

It has been more than a year. Should we close this and https://bugs.kde.org/show_bug.cgi?id=380007?

We never got a proper explanation of what this does or how it works.

The person that submitted it never answered any question.

I'd say abandon this.

May be the original author is not following Phabricator. You can try and ping him in the bugs.kde.org report or by email.

Hello, it's me again.

Previous work was not ideal, so I rewrote whole thing.

aacid added inline comments.Dec 11 2019, 9:52 PM
src/catalog/gettext/gettextexport.h
77

wouldn't it make more sense if the signature was
static QStringList tokenize(const QString &text, const QRegExp &delimiters);
?

azamathackimov marked an inline comment as done.Dec 14 2019, 6:12 PM
azamathackimov added inline comments.
src/catalog/gettext/gettextexport.h
77

Well, I thought that will be non-optimal. Refactored to

static QStringList tokenize(const QString &text, const QRegExp &delimiters);
azamathackimov marked 2 inline comments as done.Dec 17 2019, 8:52 PM

Can you share a .po file where this makes a difference?

And also can you provide a better description of what it does as a commit message? Because "Improvements for Gettext entries wordwrapping" is as generic as it gets.

azamathackimov edited the summary of this revision. (Show Details)

updated diff

Can you share a .po file where this makes a difference?

And also can you provide a better description of what it does as a commit message? Because "Improvements for Gettext entries wordwrapping" is as generic as it gets.

Here example from one of my translation projects. I have properly formatted by msguniq dable.dlg.po - o dable.dlg.po file
First commit with unpatched lokalize:
https://github.com/winterheart/planescape/commit/72ae2d06b479e3e5ed8ea5e36ab5458806d094e2

Notice that I didn't made any meaningful changes to message, I only pressed Ctrl-i on active translation (remove newlines from translation) and saved file. Expected result is no changes on entry, but lokalize wrapping message incorrectly. Notice that how word wrapping adds leading spaces on each newline in message.

Second commit with patched lokalize right after unpatched commit:
https://github.com/winterheart/planescape/commit/47003b033239a260f4a286f6406594f2d5ae8477

Now lokalize wraps message correctly.

Why this is big deal? When translation being updated, usually there msgmerge command are involved that by default word wrap messages with gettext's wrap() function (https://github.com/autotools-mirror/gettext/blob/master/gettext-tools/src/write-po.c#L616). If PO-file is not word-wrapped correctly, updated file will have lots noisy and pointless changes related to wrapping, and worse - this goes to git history. This make difficult to developer and translator to see real difference. Other scenario - when two translators uses different tools to translate - say lokalize and poedit. poedit uses wrap directly from msguniq (calls it on save), and again - lots of noise and pointless changes.

gettext's wrap() function is bit complicated and has lots of legacy code and portable inlines, but key of algorithm is using libunistring's tokenizing by finding possible line breakers. Since in Qt 5 this can be done a lots easier, so I implemented tokenizer based on QTextBoundaryFinder with QTextBoundaryFinder::Line boundary detector.

When you say "incorrectly" you mean "not like i like it" or "not like gettext creates", right?

The file is just as valid, isn't it?

It may be valid, but the "standard version" produced for example by msgfmt is the one where the spaces are at the end of the line, not at the beginning.
It is worth noting that KBabel first and Lokalize later, up to a certain version which I don't remember, did produce the "properly" formatted version.

IIRC scripty reformats the messages too (as it uses msgfmt), so IMHO using the same format would be useful.

aacid added a comment.Feb 18 2020, 8:46 PM

Sure, i agree producing similar formatting to those of other tools is good.

I just want to clarify if we're actually speaking of "incorrect" or just of "different"

I moved this patch to GitLab: https://invent.kde.org/sdk/lokalize/-/merge_requests/60

I will try to check it according to the comments made here.
Feel free to also comment on GitLab.

This comment was removed by bcooksley.
kpj accepted this revision.Jun 25 2023, 5:57 AM
kpj removed a project: Localization.
kpj removed subscribers: bcooksley, schwarzer, aiswaryak and 6 others.
This revision is now accepted and ready to land.Jun 25 2023, 5:57 AM
ltoscano closed this revision.Jun 25 2023, 10:23 AM

Moved to gitlab. Unfortunately a spammy user disrupted the status.

This comment was removed by bcooksley.
KP_ALPHA reopened this revision.Jun 25 2023, 7:25 PM
KP_ALPHA added a subscriber: aklapper.
This revision is now accepted and ready to land.Jun 25 2023, 7:25 PM
KP_ALPHA commandeered this revision.Jun 25 2023, 7:26 PM
KP_ALPHA added a reviewer: azamathackimov.
KP_ALPHA added a project: KASync.
This comment was removed by bcooksley.
azamathackimov set the repository for this revision to R456 Lokalize.
azamathackimov removed a project: KASync.
This revision now requires review to proceed.Jun 25 2023, 9:48 PM
azamathackimov commandeered this revision.Jun 25 2023, 9:55 PM
azamathackimov edited reviewers, added: KP_ALPHA; removed: azamathackimov.
azamathackimov removed a reviewer: KP_ALPHA.
azamathackimov changed the edit policy from "All Users" to "Administrators".
KP_DELTA accepted this revision.Jun 25 2023, 9:59 PM
This revision is now accepted and ready to land.Jun 25 2023, 9:59 PM
azamathackimov edited reviewers, added: schwarzer; removed: KP_DELTA.Jun 25 2023, 11:24 PM
This revision now requires review to proceed.Jun 25 2023, 11:24 PM
This comment was removed by bcooksley.
KP_EPSILON requested changes to this revision.Jun 26 2023, 1:58 AM
This comment was removed by bcooksley.
This revision now requires changes to proceed.Jun 26 2023, 1:58 AM
KP_EPSILON accepted this revision.Jun 26 2023, 1:58 AM
This comment was removed by bcooksley.
This revision is now accepted and ready to land.Jun 26 2023, 1:58 AM
KP_EPSILON requested changes to this revision.Jun 26 2023, 1:58 AM
This comment was removed by bcooksley.
This revision now requires changes to proceed.Jun 26 2023, 1:58 AM
KP_EPSILON resigned from this revision.Jun 26 2023, 1:59 AM
This comment was removed by bcooksley.
KP_EPSILON accepted this revision.Jun 26 2023, 1:59 AM
This comment was removed by bcooksley.
This revision is now accepted and ready to land.Jun 26 2023, 1:59 AM
This comment was removed by bcooksley.
KP_ZETA accepted this revision.Jun 27 2023, 12:26 AM
This comment was removed by bcooksley.
KP_ZETA requested changes to this revision.Jun 27 2023, 12:26 AM
This revision now requires changes to proceed.Jun 27 2023, 12:26 AM
KP_ZETA resigned from this revision.Jun 27 2023, 12:26 AM
This revision is now accepted and ready to land.Jun 27 2023, 12:26 AM
This comment was removed by bcooksley.
KP_ZETA requested changes to this revision.Jun 27 2023, 12:26 AM
This comment was removed by bcooksley.
This revision now requires changes to proceed.Jun 27 2023, 12:26 AM