[TermGenerator] Use UTF-8 ByteArray for termList
ClosedPublic

Authored by bruns on Jun 16 2019, 12:29 AM.

Details

Summary

All users of TermGenerator::termList do not use QStrings but UTF-8
ByteArrays. Remove repetitive code.

While at it, make the lists const to avoid detach on iteration.

Test Plan

ctest

Diff Detail

Repository
R293 Baloo
Branch
cleanup
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 12875
Build 12893: arc lint + arc unit
bruns created this revision.Jun 16 2019, 12:29 AM
Restricted Application added projects: Frameworks, Baloo. · View Herald TranscriptJun 16 2019, 12:29 AM
Restricted Application added a subscriber: kde-frameworks-devel. · View Herald Transcript
bruns requested review of this revision.Jun 16 2019, 12:29 AM
ngraham accepted this revision.Jun 16 2019, 5:12 PM
This revision is now accepted and ready to land.Jun 16 2019, 5:12 PM
This revision was automatically updated to reflect the committed changes.
poboiko added a comment.EditedJun 16 2019, 5:13 PM

Actually, there is an issue with that code right now, which I wanted to fix, but forgot.
The trimming part finalArr = finalArr.mid(0, maxTermSize); actually should be performed on QString instead of QByteArray - unicode symbols inside term can consist of two bytes, and cutting at maxTermSize bytes can actually cut half of last symbol. I end up with terms like тождественно� inside balooshow -x.
Not to mention that russian terms end up being pretty small.

bruns added a comment.Jun 16 2019, 5:19 PM

Actually, there is an issue with that code right now, which I wanted to fix, but forgot.
The trimming part finalArr = finalArr.mid(0, maxTermSize); actually should be performed on QString instead of QByteArray - unicode symbols inside term can consist of two bytes, and cutting at maxTermSize bytes can actually cut half of last symbol. I end up with terms like тождественно� inside balooshow -x.
Not to mention that russian terms end up being pretty small.

As the limit is somewhat arbitrary, maybe we can just limit the QString? I don't think this has any serious side effects.

As the limit is somewhat arbitrary, maybe we can just limit the QString? I don't think this has any serious side effects.

Yep, that's what I've suggested (if I understood you correctly).
I guess, we can put the trimming part right to termList(), it will further reduce code repetition. Something like

str = str.left(maxTermSize); 
list << str.toUtf8();

As the limit is somewhat arbitrary, maybe we can just limit the QString? I don't think this has any serious side effects.

Yep, that's what I've suggested (if I understood you correctly).
I guess, we can put the trimming part right to termList(), it will further reduce code repetition. Something like

str = str.left(maxTermSize); 
list << str.toUtf8();

See D21865