None of the users of TermGenerator::termList use QStrings; they all work with
UTF-8 QByteArrays. Remove the repetitive code.
While at it, make the lists const to avoid a detach on iteration.
ngraham
astippich
poboiko
Baloo
ctest
Actually, there is an issue with that code right now, which I wanted to fix but forgot about.
The trimming step, finalArr = finalArr.mid(0, maxTermSize);, should be performed on the QString rather than on the QByteArray: in UTF-8 a single character can take more than one byte (two for Cyrillic letters), so cutting at maxTermSize bytes can split the last character in half. I end up with terms like тождественно� in the output of balooshow -x.
Not to mention that Russian terms end up being pretty short, since every letter counts as two bytes against the limit.
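To illustrate the problem outside of Qt, here is a minimal sketch in plain standard C++ (not Baloo's code). The helper names isContinuationByte and cutSplitsCharacter are made up for this example. It checks whether cutting a UTF-8 byte string at a given byte offset would land in the middle of a multi-byte character, which is what produces the trailing � above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool isContinuationByte(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

// Hypothetical helper: true if keeping only the first maxBytes bytes of
// `utf8` would cut a multi-byte character in half. That is the case when
// the first dropped byte is a continuation byte.
inline bool cutSplitsCharacter(const std::string &utf8, std::size_t maxBytes)
{
    return maxBytes < utf8.size()
        && isContinuationByte(static_cast<unsigned char>(utf8[maxBytes]));
}
```

For the three Cyrillic letters "тож" (six bytes in UTF-8), every odd byte offset splits a letter, while even offsets fall on character boundaries.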
As the limit is somewhat arbitrary, maybe we can just limit the QString? I don't think this has any serious side effects.
Yep, that's what I've suggested (if I understood you correctly).
I guess we can move the trimming right into termList(), which would further reduce code repetition. Something like
str = str.left(maxTermSize); list << str.toUtf8();
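As a rough illustration of trimming by characters instead of bytes, here is a standard-library-only sketch, not the Qt code proposed above. truncateUtf8 is a hypothetical helper that assumes valid UTF-8 input and counts code points. Note that QString::left() counts UTF-16 code units, so it behaves the same for Cyrillic text but could in principle still split a surrogate pair for characters outside the Basic Multilingual Plane.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: keep at most maxChars code points of a valid UTF-8
// string, always cutting on a character boundary.
inline std::string truncateUtf8(const std::string &utf8, std::size_t maxChars)
{
    std::size_t chars = 0; // code points kept so far
    std::size_t i = 0;     // byte offset of the cut position
    while (i < utf8.size()) {
        // Any byte that is not 10xxxxxx starts a new code point.
        if ((static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
            if (chars == maxChars) {
                break; // the next character would exceed the limit
            }
            ++chars;
        }
        ++i;
    }
    return utf8.substr(0, i);
}
```

With this approach a limit of N keeps N letters regardless of script, so Cyrillic terms are no longer cut to about half the length of Latin ones.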