Fix decoding of strings with Wingdings/Symbol characters in Excel TxO records.
Needs Review · Public

Authored by denexter on Oct 29 2019, 6:38 AM.

Details

Summary

Characters in these strings have the high bit set, which makes them appear to be non-printable, and as a result the entire string is discarded. The style information already carries the necessary font information, so it is sufficient to just strip the bit.
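As a rough illustration of the idea, not the code from this diff: the sketch below assumes the symbol-font characters arrive as UTF-16 code units in the private-use range 0xF000-0xF0FF and that "stripping the bit" means keeping only the low byte. Both the range and the helper name `stripSymbolHighBits` are assumptions made for this example; the revision itself may do it differently.

```cpp
#include <string>

// Hypothetical helper, not the function from the patch.
// Assumption: symbol-font characters are stored as UTF-16 code units
// in 0xF000-0xF0FF; everything else is passed through unchanged.
std::u16string stripSymbolHighBits(const std::u16string &in)
{
    std::u16string out;
    out.reserve(in.size());
    for (char16_t c : in) {
        if ((c & 0xFF00) == 0xF000) {
            // Keep only the low byte; the font named in the style
            // information decides which glyph that byte maps to.
            out.push_back(static_cast<char16_t>(c & 0x00FF));
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

Under that assumption, a code unit of 0xF04A becomes 0x004A ("J"), and it only renders as the intended symbol when the run's font is applied from the style information.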

Diff Detail

Repository
R8 Calligra
Lint
Lint Skipped
Unit
Unit Tests Skipped
denexter created this revision. Oct 29 2019, 6:38 AM
Restricted Application added a project: Calligra: 3.0. Oct 29 2019, 6:38 AM
Restricted Application added a subscriber: Calligra-Devel-list.
denexter requested review of this revision. Oct 29 2019, 6:38 AM

Is there something in the spec about how the Microsoft fonts are encoded, or is this just some kind of common practice? Just stripping bits seems a little scary.

If the main issue is the whole string disappearing, those text.clear() calls below look suspicious. A Unicode string could contain all kinds of non-printing characters, such as LTR/RTL controls, and those would remain broken. The calls seem to have been introduced by commit 4847181d7d5f as some kind of workaround. No idea whether they are still needed.

There's nothing in the spec that I've found. There's a more detailed explanation in the Word parser, https://cgit.kde.org/calligra.git/tree/filters/words/msword-odf/wv2/src/parser9x.cpp#n513, but it still doesn't cite any sources. Removing the entire string is excessive and may be a problem with some documents. However, dropping the text.clear() calls without addressing the decoding issue gives you a string with junk or missing characters, whereas fixing the decoding gives the full, correct string.

But would those characters work without encoding adjustments if the MS font in use were present? I'm not sure how the text gets rendered now without the font, but if it's anything like the "J" smiley in Exchange emails, I'm not sure which is worse.

Adding Marijn. These are ancient changes, but could those text.clear() parts be removed by now?

The necessary font information is supplied independently, so if you have the font installed, it does show the smiley.