As reported by users, search would not find if the text was split across multiple lines, this fixes it, I added tests to prove it, there's the PDF in the bug report for test.
BUG: 376692
As reported by users, search would not find if the text was split across multiple lines, this fixes it, I added tests to prove it, there's the PDF in the bug report for test.
BUG: 376692
Open PDF provided by user.
Search for United States.
Find three different ocurrences.
No Linters Available |
No Unit Test Coverage |
Buildable 11116 | |
Build 11134: arc lint + arc unit |
Thanks so much for this patch, and for including a test too! It works great for me. I'll let the Okular folks handle the rest of this show. :)
I'm not very convinced with the tests.
You have \n but the rects are all one after the other, so the algorithm that does the line partition doesn't really see a line there, you should have them in different row rects so that they are actually different physical lines.
The "Across Lines not find" doesn't really test what you want, you "want" to test that it finds the last entry, but since the only thing you're checking is that it finds stuff and not in which rects it finds them, it may very well be finding "ab\na"
Same thing for Hyphen doesn't count
Am i making sense?
core/textpage.cpp | ||
---|---|---|
867 | const | |
986 | const |
I don't know much about the searching code, that's why i want nice tests that prove the new code works :)
It shouldn't find anything, and it doesn't.
If not, I'll take a deeper look when I get home, this was just a top of my head solution.
There's a downside to this is that now if you actually put a newline character in the search it will fail where previously it worked.
On the other hand typing an actual newline character is kind of hard (i had to copy it from a newline in kate) so maybe we can just accept that noone really knew how to do that :D
Opinions?
core/textpage.cpp | ||
---|---|---|
865 ↗ | (On Diff #54906) | strNotUsed seems like a weird variable name given you actually use it the two following names. Would something like origStr or something make more sense? |
866 ↗ | (On Diff #54906) | given that you're modifying this line, make len const :) Also are you totally sure this needs to be over strNotUsed and not str? Hmmm i guess it works both ways since str is the same length as strNotUsed |
In my opinion is better to have it this way, I guess most users don't know about the newline.
Do you have an opinion in this @ngraham ?
core/textpage.cpp | ||
---|---|---|
865 ↗ | (On Diff #54906) | Yes, I will make it origStr, it is better. Unfortunately curEntity->text() needs a const variable. I would put it in str, but it would mess up the stringLenghtAdaptedWithHyphen. |
866 ↗ | (On Diff #54906) | I need it to be strNotUsed because I'm making str const, and I'm replacing the '\n' as soon as I create it. It would mess up the stringLenghtAdaptedWithHyphen because it would have the "-\n" text entities as " " |
The use case to insert a newline could be:
Someone is testing a LaTeX document, and searches for a passage from the source code (maybe some verbatim code).
One would just copy the passage in Kate and paste it into the search box in Okular.
Hmmmmm, I can treat things differently if there's a new line in the search box.
Or maybe I should find a way to treat spaces the same as newline chars.
I'll have to take a look in the compare function to see how it would handle, maybe compare alphanumeric only?
If I can't do that, I'll have to treat them differently.
Also, isn't newline supposed to be used with regex search? It seems like a very specific case.
I think it would make sense to treat all whitespace equally. At least on paper, there are several possible methods to disperse text. (Different line spacing, different line lengths...) So it is difficult to predict whether I should/shouldn’t put a \n into the search box.
I think it would make sense to treat all whitespace equally. At least on paper, there are several possible methods to disperse text. (Different line spacing, different line lengths...) So it is difficult to predict whether I should/shouldn’t put a \n into the search box.
It should work now, can you test it?
Me? Don’t have a wall plug here, so can’t compile now.
Additionally, I didn’t really look at your code. But how about QString::simplified() instead of QString::replace(QRegExp)?
I don't think this is the way to go.
If i took the work to figure out how to type a newline in the search line i want a newline not a space to be what is searched for.
Hmmmmm. I can think of checking if the '\n' is in the search bar.
What if we limit the '\n' search to case sensitive? That can be a way, only search what is typed.
Right now I don't see exactly how we would do it, because there's no way we can identify which '\n' is typed by the user, and what '\n' is on the PDF side.
In my honest opinion we shouldn't support for '\n' search, unless it's something like regex search, but I see you disagree.
I'm not saying to support \n that is not a character, i'm speaking of the actual newline character, which already works and breaks with this patch. I think it's not great that we modify what the user choses to search, but on the other hand i see were already running normalization on it so what do i know 🤷
BTW QRegexp is "old", should be using QRegularExpression instead ideally
Please move as a Merge Request in https://invent.kde.org/kde/okular
We have pre-commit CI and lots of checks including clazy and clang-tidy there so it's a much better place for doing the review/approval/merge of the code.