Differential D10298

Fix searching in RTL PDFs
Abandoned · Public

Authored by ngraham on Feb 4 2018, 3:25 PM.

Details

Reviewers

ltoscano

Group Reviewers

Okular

Summary

BUG: 207748

Since Arabic search does not work properly in all but PDF backends, this is a quick attempt to fix the problem. I assumed that the text in an Okular document is in logical order (which is a bug in itself). Mirroring the search text makes the search function work again.
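The mirroring idea can be sketched roughly as follows. This is not the code from the patch (the diff body is not shown on this page); it is a minimal standard-library illustration. The names `mirrorQuery` and `foundInStoredText` are hypothetical, and the sketch assumes the page text is stored with RTL runs reversed relative to what the user types.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not taken from the patch:
// reverse the query one code point at a time.
std::u32string mirrorQuery(const std::u32string &query)
{
    return std::u32string(query.rbegin(), query.rend());
}

// Hypothetical search over text that is ASSUMED to be stored reversed:
// the query is mirrored first, then matched as a plain substring.
bool foundInStoredText(const std::u32string &storedText, const std::u32string &query)
{
    assert(!query.empty());
    return storedText.find(mirrorQuery(query)) != std::u32string::npos;
}
```

Reversing code point by code point is only an approximation: a combining mark (for example an Arabic or Hebrew vowel point) would end up before its base letter, so a real implementation would probably need to reverse by grapheme cluster.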

The limitation:

You cannot search for Arabic and English text together.
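One way to see why a mixed query fails is the sketch below. It rests on two assumptions that this page does not confirm: that the stored text keeps Latin runs in normal order while RTL runs are reversed, and that the whole query is mirrored whenever it contains an RTL character. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Rough check, for illustration only: Hebrew (U+0590..U+05FF)
// or Arabic (U+0600..U+06FF) Unicode block.
bool isRtlCodePoint(char32_t c)
{
    return (c >= 0x0590 && c <= 0x05FF) || (c >= 0x0600 && c <= 0x06FF);
}

bool containsRtl(const std::u32string &s)
{
    return std::any_of(s.begin(), s.end(), isRtlCodePoint);
}

// Hypothetical search: mirror the WHOLE query if it contains any RTL code point.
// A mixed query then has its Latin part reversed as well, so it no longer matches.
bool searchStoredText(const std::u32string &stored, const std::u32string &query)
{
    assert(!query.empty());
    std::u32string needle = query;
    if (containsRtl(query))
        std::reverse(needle.begin(), needle.end());
    return stored.find(needle) != std::u32string::npos;
}
```

Under these assumptions a Latin-only query and an RTL-only query both match, but a query that combines the two does not, because the Latin word is reversed along with the rest.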

Future work:
We need to check that the text generated by Poppler is placed in visual order, so that it is still readable when copied and pasted into a text editor.

Test Plan

Migrated this patch from https://git.reviewboard.kde.org/r/125442/ since it had whitespace errors and the submitter disappeared.

Okular compiles and all tests pass (except for parttest, which was already failing in master).
I don't have any RTL PDFs or the ability to read or write in any RTL language, so I am unable to test the functionality. But on the reviewboard page, folks said it worked, and the diff is the same.

Diff Detail

Repository

R223 Okular

Branch

master

Lint

No Linters Available

Unit

No Unit Test Coverage

ngraham created this revision. Feb 4 2018, 3:25 PM

Restricted Application added a project: Okular. Feb 4 2018, 3:25 PM

ngraham requested review of this revision. Feb 4 2018, 3:25 PM

Please replace "migrated from..." with the proper content from the old reviewboard patch, and resubmit it using the original author.
The note about "this was in reviewboard" should not be in the final commit message, but the original content should be.

Update author

ngraham edited the summary of this revision. Feb 4 2018, 3:44 PM

ngraham edited the test plan for this revision.

I tested okular with the patch. I used 2 PDF files in Hebrew. I attached them so others can test. One was downloaded using Wikipedia's Download-as-PDF option. The other was downloaded from random search results, when looking for Hebrew PDFs.

Open Source (Hebrew Wikipedia).pdf (232 KB)

meida-15.pdf (46 KB)

The results are as follows:

Okular was able to find the text I was searching for (Success).

However, it searches each line from left to right rather than from right to left (which is the reading/writing direction). When the text occurs more than once on the same line, it finds the last occurrence first and the first occurrence last. I'm attaching a gif to illustrate this.
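The ordering effect described above can be sketched as follows. This is only an illustration with hypothetical names (`findAll`, `inReadingOrder`), assuming matches are collected by scanning the stored line from its first character onward; it is not Okular's actual search code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// All match offsets in storage order, which under the assumption above
// corresponds to left-to-right order on screen.
std::vector<std::size_t> findAll(const std::u32string &line, const std::u32string &needle)
{
    assert(!needle.empty());
    std::vector<std::size_t> hits;
    for (std::size_t pos = line.find(needle); pos != std::u32string::npos;
         pos = line.find(needle, pos + 1)) {
        hits.push_back(pos);
    }
    return hits;
}

// Hypothetical fix-up: for an RTL line, visit the rightmost hit first
// so that "find next" follows the reading direction.
std::vector<std::size_t> inReadingOrder(std::vector<std::size_t> hits, bool lineIsRtl)
{
    if (lineIsRtl)
        std::reverse(hits.begin(), hits.end());
    return hits;
}
```

With two occurrences on one line, `findAll` reports the leftmost one first, which for an RTL reader is the last occurrence; reversing the hit list per line is one possible way to restore reading order.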

I think the problem arises because Okular treats the whole text as if it were typed backwards. For example, text copied from Okular is pasted backwards. But when the same text is copied from Firefox (used as a PDF reader), it is copied correctly. I'm attaching another gif to illustrate this.

So with regard to usability, the current patch is better than nothing. It enables searching for text that is written in an RTL language and should be adopted.

In general, Okular might need some improvements with regard to RTL languages (Hebrew, Arabic, Persian, Yiddish). According to Wikipedia (https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers), there are more than 550 million speakers of those languages.

Thanks for the test! However, the original author of this patch recently reappeared here on Phabricator and submitted a better one: D10455: Add RTL support for search, copy & paste in pdf.

I'm closing this patch in favor of his. Would you mind testing that? Thanks again!

ngraham mentioned this in D10455: Add RTL support for search, copy & paste in pdf. Feb 23 2018, 5:03 PM

The problem is not only with search: even copying text produces mirrored text. It seems to me that Okular treats all text and words as LTR.

Restricted Application added a subscriber: okular-devel. Jul 29 2018, 10:23 AM

@ngraham

In D10298#306887, @userkde wrote:

@ngraham

What's up?

userkde added a comment. Aug 13 2018, 11:39 PM

This comment was removed by userkde.

@ngraham Thank you for your work on this bug.

I would like to know whether there is any news on this bug. The problem is not only with search: even copying RTL text produces mirrored text. Maybe the problem is deeper than it seems. I think Okular treats all text and words as LTR, even RTL text.

Thanks! Just so you know, the work in this patch moved to D10455: Add RTL support for search, copy & paste in pdf.

Revision Contents
Changeset List

			Path	Packages
M			ui/searchlineedit.cpp (11 lines)

Diff	ID	Base	Description	Created	Lint	Unit
Base			Base
Diff 1	26509	2d8b2c7		Feb 4 2018, 3:25 PM	★	★
Diff 2	26510	2d8b2c7	Update author	Feb 4 2018, 3:40 PM	★	★

Commit	Tree	Parents	Author	Summary	Date
7757d797a500	425cef4ab357	2d8b2c7e9592	Fahad Al-Saidi	Fix searching in RTL PDFs	Feb 4 2018, 3:23 PM
