Enclosing placeholders by isolation characters
Open, Needs Triage, Public

Description

With Qt 5.11, we got full Unicode support, including the isolation[1] characters that are useful in bidirectional text.

Knowing how hard bidi text is to get right (check 1/1.5 and 2), I would like to discuss if we should enclose every placeholder used in KI18n with the two isolation characters FSI (U+2068, First Strong Isolate) and PDI (U+2069, Pop Directional Isolate) by default. This would make it much easier for programs to support proper bidi text without a lot of work.
The downside is that apps running older versions of Qt may render the text incorrectly, and copying text to apps that don't support Unicode properly will cause issues.

Ideas?

[1] Isolation is a way to separate a specific chunk of text, detect its direction, get its internal order correct, and re-add it to the surrounding text.
Example:
An Arabic string has a dot at the start and an underscore at the end: _CIBARA.
Without the isolation it will show as: This is an .CIBARA_ text.
With the isolation: This is an _CIBARA. text.
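
As a rough illustration of the footnote, the sketch below builds the example sentence in logical order and wraps the Arabic chunk in FSI and PDI. It is plain C++ with UTF-8 byte strings, not Qt or KI18n code; the helper names `isolate` and `exampleSentence` are hypothetical, and how the result is displayed still depends on the renderer.

```cpp
#include <string>

// UTF-8 encodings of the two isolate characters named in the task.
const std::string kFsi = "\xE2\x81\xA8"; // U+2068 FIRST STRONG ISOLATE
const std::string kPdi = "\xE2\x81\xA9"; // U+2069 POP DIRECTIONAL ISOLATE

// Hypothetical helper: wrap a chunk so that a bidi-aware renderer
// resolves its direction independently of the surrounding text.
std::string isolate(const std::string &chunk) {
    return kFsi + chunk + kPdi;
}

// Builds the footnote's sentence in logical (storage) order.
// `arabic` stands for the string with the leading dot and the
// trailing underscore; only the wrapping is done here.
std::string exampleSentence(const std::string &arabic) {
    return "This is an " + isolate(arabic) + " text.";
}
```

Each isolate character takes three bytes in UTF-8, so the wrapped string is six bytes longer than the unwrapped one, which matters for code that measures or truncates strings by byte length.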

aacid added a subscriber: aacid. Jun 13 2018, 9:10 PM

I would like to discuss if we should enclose every placeholder used in KI18n with the two isolation characters

You mean changing every single i18n() call that uses a placeholder? Or something else?

In T8984#147444, @aacid wrote:

You mean changing every single i18n() call that uses a placeholder? Or something else?

Yes, I mean that. It would work the same way we'll do it (if ever) for proper formatting of numerical placeholders.

I already disagreed that that method is the correct solution, so no, it's not "the same way".

In T8984#147453, @aacid wrote:

I already disagreed that that method is the correct solution, so no, it's not "the same way".

Well, I said the same way we'll do it for the formatting, so if we go with “all placeholders are locale-formatted”, then we'll go with “all placeholders are enclosed with isolation characters” :)
I'm with you that it's better to not let translators struggle with it, and make it the “by default” thing.

huftis added a subscriber: huftis. Jul 13 2018, 5:55 PM

Well, I said the same way we'll do it for the formatting, so if we go with “all placeholders are locale-formatted”, then we'll go with “all placeholders are enclosed with isolation characters” :)
I'm with you that it's better to not let translators struggle with it, and make it the “by default” thing.

FWIW, I’m not sure I understand all the implications, but I think this sounds like a good idea. But I have two questions:

  • Will these characters also be needed for LTR languages?
  • And would it be possible to make the feature backwards compatible with older versions of Qt, e.g. only include the isolation characters for Qt ≥ 5.11?

First, sorry for any language mistakes.

FWIW, I’m not sure I understand all the implications, but I think this sounds like a good idea. But I have two questions:

  • Will these characters also be needed for LTR languages?

In some cases, yes, they will be needed. Take this PR as an example; the same issue occurs in Dolphin:


Another case is the description of the format in the Formats section of the Regional Settings category in System Settings. Look how "(Long Format)" is on the left of the actual format (and vice versa for an RTL layout and an LTR format):

  • And would it be possible to make the feature backwards compatible with older versions of Qt, e.g. only include the isolation characters for Qt ≥ 5.11?

I think it is possible with the if QT_VERSION thing. Older versions do strange things if we let them render text containing these characters.
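
A minimal sketch of the version gate mentioned above, written in plain C++ so that it builds without Qt. In a real Qt build the check would more likely be a preprocessor test such as `#if QT_VERSION >= QT_VERSION_CHECK(5, 11, 0)`; here `SKETCH_QT_VERSION`, `encodeVersion`, `isolatesSupported`, and `maybeIsolate` are hypothetical stand-ins, and the 5.11 threshold is simply the one quoted in this discussion.

```cpp
#include <string>

// Same packing Qt uses for QT_VERSION: (major << 16) | (minor << 8) | patch.
constexpr int encodeVersion(int majorNum, int minorNum, int patchNum) {
    return (majorNum << 16) | (minorNum << 8) | patchNum;
}

// Stand-in for QT_VERSION so the sketch compiles without Qt (assumption).
#ifndef SKETCH_QT_VERSION
#define SKETCH_QT_VERSION encodeVersion(5, 11, 0)
#endif

// Threshold taken from the discussion: isolates are assumed usable from 5.11 on.
constexpr bool isolatesSupported(int version) {
    return version >= encodeVersion(5, 11, 0);
}

// Wraps the chunk in FSI/PDI only when the given version is new enough;
// older versions get the text unchanged, so no stray characters are shown.
std::string maybeIsolate(const std::string &chunk,
                         int version = SKETCH_QT_VERSION) {
    if (!isolatesSupported(version)) {
        return chunk;
    }
    return "\xE2\x81\xA8" + chunk + "\xE2\x81\xA9"; // U+2068 ... U+2069
}
```

Note that a compile-time check only reflects the Qt version the code was built against, which may differ from the Qt version loaded at run time.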

First, sorry for any language mistakes.

FWIW, I’m not sure I understand all the implications, but I think this sounds like a good idea. But I have two questions:

  • Will these characters also be needed for LTR languages?

In some cases, yes, they will be needed. Take this PR as an example; the same issue occurs in Dolphin:

Do you mean that those markers are needed any time there is a mixed string with a translated part (which could be LTR or RTL) and a fixed part which always has the same direction?

  • And would it be possible to make the feature backwards compatible with older versions of Qt, e.g. only include the isolation characters for Qt ≥ 5.11?

I think it is possible with the if QT_VERSION thing. Older versions do strange things if we let them render text containing these characters.

Can you please show how it would work? Would those placeholders always be added to the appropriate string? Wouldn't it be possible to always inject them through i18n functions?

Do you mean that those markers are needed any time there is a mixed string with a translated part (which could be LTR or RTL) and a fixed part which always has the same direction?

Yes, I mean that. In general, whenever the final string can contain both RTL and LTR text, enclosing one of them (non-strong) is a must.

Can you please show how it would work? Would those placeholders always be added to the appropriate string? Wouldn't it be possible to always inject them through i18n functions?

This is just a quick idea: where we have "%s".arg(part), we can enclose that %s with the markers to get "FSI%sPDI".arg(part), where FSI and PDI are the actual Unicode characters.
I had to share my initial thoughts about this, but I don't have much experience with coding, nor with the KI18n code base. I can't work on this in the near future, but if no one takes this task, I'll try my best to implement it myself later.
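
The quick idea above, together with the earlier question about injecting the markers through the i18n functions, could be sketched roughly as follows. This is standard C++ only, not KI18n code: `substituteIsolated` is a hypothetical helper, and the placeholder marker is passed in as a parameter so that the translator's template stays untouched and the wrapping happens at substitution time.

```cpp
#include <string>

// Hypothetical helper: replaces the first occurrence of `marker` in `tmpl`
// with `value` wrapped in FSI (U+2068) and PDI (U+2069), both as UTF-8 bytes.
// The template is returned unchanged if the marker is empty or not found.
std::string substituteIsolated(std::string tmpl,
                               const std::string &marker,
                               const std::string &value) {
    const std::string fsi = "\xE2\x81\xA8";
    const std::string pdi = "\xE2\x81\xA9";

    if (marker.empty()) {
        return tmpl;
    }
    const std::string::size_type pos = tmpl.find(marker);
    if (pos == std::string::npos) {
        return tmpl;
    }
    tmpl.replace(pos, marker.size(), fsi + value + pdi);
    return tmpl;
}
```

Doing the wrapping inside the substitution step means neither translators nor callers would need to type the invisible characters by hand, which is the "by default" behaviour discussed earlier in the thread.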