When writing in Kate in a RTL language, Kate acts weird
Open, Needs Triage, Public


I feel like this will be overlooked if I don't explain a few things first.

Information about RTL languages, the case for RTL support, its importance and why it shouldn't be overlooked

Here is a quick note on RTL writing systems from Wikipedia:

In a right-to-left, top-to-bottom script (commonly shortened to right to left or abbreviated RTL), writing starts from the right of the page and continues to the left. This can be contrasted against left-to-right writing systems, where writing starts from the left of the page and continues to the right.

Arabic, Hebrew, Persian, Urdu, and Sindhi are the most widespread RTL writing systems in modern times.

Right-to-left can also refer to top-to-bottom, right-to-left (TB-RL or TBRL) scripts such as Chinese, Japanese, and Korean, though they are also commonly written left to right. Books designed for predominantly TBRL vertical text open in the same direction as those for RTL horizontal text: the spine is on the right and pages are numbered from right to left. (1)

The Arabic script is the writing system used for writing Arabic and several other languages of Asia and Africa, such as Persian, Kurdish, Azerbaijani, Sindhi, Pashto, Lurish, Urdu, Mandinka, and others... It is the second-most widely used writing system in the world by the number of countries using it and the third by the number of users, after Latin and Chinese characters. (2)
Here is a map to illustrate this info:

  • All varieties of Arabic combined are spoken by perhaps as many as 422 million speakers (native and non-native) in the Arab world. (3)
  • There are approximately 110 million Persian speakers worldwide, with the language holding official status in Iran, Afghanistan, and Tajikistan. (4)
  • According to Nationalencyklopedin's 2010 estimates, Urdu is the 21st most spoken first language in the world, with approximately 66 million speakers. (5)

Hebrew is a Northwest Semitic language native to Israel; the modern version of which is spoken by over 9 million people worldwide.
As a foreign language, it is studied mostly by Jews and students of Judaism and Israel, and by archaeologists and linguists specializing in the Middle East and its civilizations, as well as by theologians in Christian seminaries. (6)

My take on the issue (AKA the case for RTL support, its importance and why it shouldn't be overlooked)

It seems that hundreds of millions of people (or even more than a billion - if taking into account TBRL in languages such as Chinese, Japanese and Korean) - live, speak, use and interact with RTL language systems.

Language is one of the basic means of communication and interaction. When users can't do simple things with our software because we don't respect their language, we reduce our reach to other communities. I believe the goal we set for ourselves (KDE Usability & Productivity, AKA "Top-notch usability and productivity for basic software") means exactly that: we should attend to problems that undermine _Usability & Productivity_, not only for the current users of KDE software, but for the billions of potential users out there. Not allowing hundreds of millions or billions of potential users to interact with our software is a serious problem. Fixing it should be a top priority.

If someone is missing a shortcut button in Kate, that might be annoying to some users of Kate, but if Kate (or Okular) doesn't support RTL all the way, then this is going to be annoying to billions of potential users.

Be honest: Would you continue using Kate if the bugs that I mention now were happening to you? If you couldn't delete/add a letter in Kate because every time you tried doing it, it would delete/add it somewhere else in the text? Imagine this: Would you continue using Okular if you couldn't use the search function, because Okular (contrary to Atril and other gtk3 PDF readers) treats RTL text in a messy way? That means that for you a PDF file is like an image file: you can't interact with it (can't search, can't copy and paste). It makes using a PDF file pointless.

Maybe you're a power user who can find lots of workarounds: use Atril instead of Okular, use Pluma instead of Kate, and patch together a working system that can handle your native RTL language. But if you're sincere, would you recommend a KDE system like Kubuntu to your beginner compatriot friends, people that you care for and who are counting on you when it comes to computers?

I'm not a professional critic like Igor Ljubuncic / Dedoimedo, but my critique is sincere and I believe not-fixing-issues-like-this undermines the adoption of KDE software that we hope to achieve.

I believe that if open source software, and especially KDE software, is to grow in markets outside LTR language systems, it cannot happen without extending the support for RTL language systems as well. Microsoft did it during the 1990s... I mean, fixing RTL issues for things like Notepad, the office suite, etc. There have been no major RTL issues in their software in the last 20 years. Basic things must work first. As @ngraham said in his proposal that was adopted:

We will need to focus on adding productivity-related features, fixing bugs, and addressing quality-of-life issues in our software's GUI interfaces, particularly the most basic and commonly-used KDE software and the frameworks that power them: ..., Kate, Okular,.. (7)

We could say "Let those other people use other basic applications that we didn't write", we could say "We can't solve this problem until x,y,z is solved first - this is not in our hands", we could say "We don't care for people from communities of non-European languages" or "We don't care for scholars/students that interact with the history/culture of those communities". I'm not saying that we should, but it's perfectly OK to decide so. If we don't see those hundreds of millions as potential KDE users, though, that decision should be debated, thought through thoroughly, understood and agreed upon.

Finally, Kate's bug/s

The first bug is the most major one I found in Kate (I believe that the other ones are related to it, and I'm mentioning them so it might help figure out how to solve this)

  1. If a line that is written in an RTL language is long enough to slide (wrap) to a second line, Kate's marker (the text cursor) will not be able to understand the location of the letters. It will show itself between two letters, but in fact it is a _mirage_, since there's no correlation between the position we see and the position it actually has.

    This is an example given with Latin characters just to help someone who's not familiar with RTL text, understand what I'm about to show. Imagine you have a long sentence which is like this:

    am am am am am am am am am am am am am am am am am am am am am

    (and it's long enough that the line continues and slides to a second line). If you place the marker in the middle of the word and add a character, for example "r", you expect to see the word "arm". What happens in Kate (with an RTL language) is that the letter is added one character after the one where the marker stands, so you get the word "amr" (which means nothing...). The same thing happens when you try to delete. For example, placing the marker at the end of a word, "am", and deleting the last character (with backspace), you would expect the "m" to get deleted, but what happens is that Kate deletes the next character, in this example the space (" "). If the line is short (not sliding to a second line), then all will work as it should.

    Now I'll show an example with Hebrew (an RTL language) characters. In this example I will write the word "אם" a few times, until the sentence slides to the next line, and then I'll try to add a character between the two letters, and it will be added after the next letter instead of where the marker is. Then I'll try to delete the last character, "ם". Instead of it being deleted, the space that's after it will get deleted, which joins the two words. Afterwards, I will shorten the line so it's not sliding, and then make the same deletion, and you'll see that everything works as expected.

    You can copy this line and try it for yourself:

    אם אם אם אם אם אם אם אם אם אם אם אם אם אם אם אם אם אם אם אם אם אם אם אם אם אם אם

    This bug is extremely annoying if you're a user of Kate. Imagine that every correction that you wish to make in your text, every word you want to add, just creates more mess. In the end you start highlighting whole words just to fix one character, or deleting a whole word just to retype the same word along with other words.
  2. In this kind of problematic line, the marker will sometimes "leave" a trace - a dot on top of where it was. Maybe this can help someone figure out why this is happening...
  3. When you hit the "Home" button and get to the beginning of the line, and then hit the left-arrow-key, it does "nothing". The expected behavior would be for the marker to move past the first character. Only when hitting the left-arrow-key a second time will the marker move and be placed after the first character. It seems like there's an invisible character in "weird" long lines like this.
  4. Another problem with Kate is the upward movement when using the up-arrow-key. It will sometimes stop functioning when getting to a line of that sort (a long line that slides). The marker will move up until it gets to this kind of problematic line. Then it will move to the beginning of the line (even if the initial movement didn't start at the beginning of a line). The marker will then stay there. Only pressing the right-arrow-key (moving the marker to the last letter in the line above it) will free the marker. The down-arrow-key will always work.
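
To make bug 1 easier to reason about, here is a minimal Python sketch of the symptom only. It is not Kate's or Qt's code, and it does not explain the cause. It assumes, based purely on the two examples above, that on a wrapped RTL line the edit lands one logical position after the place the marker appears to be; the name OBSERVED_OFFSET and both helper functions are hypothetical and exist just for this illustration.

```python
# Toy model of the symptom in bug 1. This is NOT Kate's or Qt's code.
# Assumption: on a wrapped RTL line the edit lands one logical position
# after the position the marker appears to be at.
OBSERVED_OFFSET = 1  # hypothetical, inferred from the examples above


def insert_at(text: str, cursor: int, ch: str) -> str:
    """Insert ch so that it ends up at logical index cursor."""
    return text[:cursor] + ch + text[cursor:]


def backspace_at(text: str, cursor: int) -> str:
    """Delete the character just before logical index cursor."""
    if cursor <= 0:
        return text
    return text[:cursor - 1] + text[cursor:]


line = "am am am"

# Marker shown in the middle of the first word (logical index 1), typing "r".
expected_insert = insert_at(line, 1, "r")
observed_insert = insert_at(line, 1 + OBSERVED_OFFSET, "r")

# Marker shown at the end of the first word (logical index 2), pressing backspace.
expected_delete = backspace_at(line, 2)
observed_delete = backspace_at(line, 2 + OBSERVED_OFFSET)

print(expected_insert, "|", observed_insert)
print(expected_delete, "|", observed_delete)
```

The first pair reproduces the "arm" versus "amr" example, and the second pair reproduces the case where the space is deleted and the two words are joined. The same index arithmetic applies to the Hebrew line, since Python strings hold the text in logical (typing) order.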

Thank you for your patience, I hope this can help someone in fixing this. It took me around 4 hours to compile this bug report with all the videos and links. So I also hope this is appreciated.

chfanzil created this task. Apr 29 2019, 9:14 PM

You don't need to belabor the point; we all know that RTL languages are important. :) The key problem is not a lack of caring but rather a lack of RTL language speakers in the KDE community who can test (or even better, offer patches). So thanks very much for bringing this to our attention!

In general, bugs should be reported using https://bugs.kde.org. Phabricator is intended for developer-to-developer communication rather than bug reporting. Also, we need each bug to be in a separate bug report so they can be individually tracked.

Finally, make sure the bugs haven't already been fixed! You didn't mention which versions you're using, so if you're not using Kate 19.04 with KDE Frameworks 5.57, please test again using those versions. Even better if you can compile Kate and the KDE Frameworks from git master for testing purposes (for that, see https://community.kde.org/Get_Involved/development#One-time_setup:_your_development_environment and then https://community.kde.org/Get_Involved/development#Build_some_software).

If all the issues are still valid with the latest versions and/or git master, please close this Phab task and open a bug on https://bugs.kde.org for every issue you've found! Also make sure to read https://community.kde.org/Get_Involved/Bug_Reporting first so you can format your bug reports in a way that makes it most likely that the developers will be able to quickly understand the issue.


Kate 19.04 - Check

KDE Frameworks 5.57 - Check

I don't see a reason to report this in bugs.kde.org again, since it was already reported in 2017. Same thing for Okular issues which have been reported in (2018), (2009-2018), (2008-2011). And each one of them several times (282849, 331785, 386468, D10298, D10455, 128609 and 282850, 345512, 396757 and 156380) during the last 10 years. This would just create more duplicates that you would eventually need to clean-up :)

I hoped that posting it here would encourage a meaningful discussion and get some attention from other team members, in order to try and solve such issues. This is why I belabored the importance of RTL support and tried to explain that this issue is at the core of "KDE Usability & Productivity" :|

I feel like I was unsuccessful in this attempt. I still love some of KDE's software, but I can't help but notice that in the current situation (last 10+ years and foreseeable future), it cannot appeal to hundreds of millions of people that share something in common with me (interacting in a non-European language/writing system).

If the bug reports are already known, then it's the opposite: you don't need to explain them again here. :)

The software is buggy not because nobody cares, but because there appear to be very few people who actually speak, write, and use RTL languages. The functionality is just not really getting tested. It's a chicken/egg problem, I know, so that's why it's so great that you're passionate about fixing this situation!

What I've discovered in KDE is that stuff gets done by people who do it. When the community selected my Usability & Productivity initiative, I noticed that not a lot was getting done on it. It became apparent that in order to make it happen, I had to take charge of organizing it and kickstart the work by doing what I could and attracting attention. I think it's the same here. You're a subject matter expert who's frustrated by the sad situation of our RTL support in Kate and Okular. The very best candidate for taking the lead to fix this is... you! :)

Maybe we can keep this open for you to formulate your plan for getting these fixed. I bet among them there are a few that are really easy to fix, and you could do them yourself. Do you have your development environment set up? https://community.kde.org/Get_Involved/development#One-time_setup:_your_development_environment

Issue 1: https://bugs.kde.org/show_bug.cgi?id=385694 (In Qt, it happens only with bidi: https://bugreports.qt.io/browse/QTBUG-71489)
Issue 2: Yes, this happens. I don't know if a bug is reported or not. I remember seeing this with normal LTR text as well, and it got fixed. (I think some height algorithm was changed)
Issue 3: Seems related to issue 1.
Issue 4: This is new. So hitting up in the wrapped RTL text segments takes you to the beginning of the segment, not up to other segments.

And as Nate said, there aren't really many developers who know how RTL and bidi text work. I've tried myself several times, but I lack experience and am not familiar with the code base, and it's not an easy task :)
If you can work on this, it would be awesome, or maybe get people interested in this to join the community!

@safaalfulaij, Thank you very much for confirming this and even pointing to the origin of this problem!
It seems that since October-November 2018 when you reported this bug to bugreports.qt.io, nothing has changed.
If I understand correctly, Kate's bug won't be solved until the Qt devs solve this. Or is there still a way to patch Kate to work around this problem?
Is there a way to "encourage" Qt devs to fix the bug on their side?

Also, since it seems you've encountered several RTL Qt bugs - do Qt bugs explain these RTL bugs in Okular:

  1. Not being able to search for text in RTL language (you need to write the text backwards)
  2. Not being able to select RTL text correctly for more than one line. It treats the text as if it is Latin text and selects it from Right-to-Left.
  3. Copying text to Kate, for example, will paste it in reverse.

Even in Qt there aren't many (if any) developers who understand RTL text. That is why many bugs I reported to Qt haven't been resolved so far.

Well, the major bug is in Qt. Even if it got fixed, maybe there is another one in Kate different than this, and maybe not.
As far as I know, there is no way to encourage them other than trying yourself or encouraging others you know to try.

About Okular, it's all about the PDF backend, Poppler. Several bugs about RTL were fixed there, but I'm not sure if they were reflected in Okular as well. (Mainly: https://bugs.freedesktop.org/show_bug.cgi?id=55977). The 3 issues you mentioned are all about searching and selection, which are covered in that bug I think. Mainly, the text should be reversed if it's RTL when copied, selected, etc.
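
As a rough illustration of that last point, here is a small Python sketch of why a search typed normally can fail while the same text typed backwards matches. It is not Poppler's or Okular's code. It assumes a simplified situation where an extractor hands back a purely RTL run in visual (left-to-right on screen) order instead of logical (typing) order; the function names are hypothetical, and real text needs the full Unicode Bidirectional Algorithm rather than a plain reversal.

```python
import unicodedata

STRONG_RTL = ("R", "AL")       # Hebrew-like and Arabic-like letters
STRONG = ("L", "R", "AL")      # all strongly directional classes


def is_rtl_run(text: str) -> bool:
    """True if the run has strong characters and all of them are RTL."""
    strong = [unicodedata.bidirectional(c) for c in text]
    strong = [b for b in strong if b in STRONG]
    return bool(strong) and all(b in STRONG_RTL for b in strong)


def visual_to_logical(run: str) -> str:
    """Simplified fix: reverse a purely RTL run (assumption: no mixed text)."""
    return run[::-1] if is_rtl_run(run) else run


logical = "שלום עולם"            # text as it was typed
stored_visual = logical[::-1]    # assumed output of a naive extractor
query = logical.split(" ")[0]    # the user searches for the first word

found_as_typed = query in stored_visual
found_backwards = query[::-1] in stored_visual
found_after_fix = query in visual_to_logical(stored_visual)

print(found_as_typed, found_backwards, found_after_fix)
```

Under these assumptions the normally typed query is not found, the reversed query is, and reversing the stored run back to logical order makes the normal query match again. Copying `stored_visual` straight to an editor would likewise paste the text in reverse.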