Fix decoding of strings with Wingdings/Symbol characters in Excel TxO records.
Needs Review (Public)

Authored by denexter on Oct 29 2019, 6:38 AM.

Details

Reviewers

pvuorela
mkruisselbrink

Summary

Characters from symbol fonts such as Wingdings and Symbol have the high bit set. This makes them look non-printable, so the entire string is discarded. The style information already carries the necessary font information, so it is sufficient to strip the bit.
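A minimal C++ sketch of the idea, written under stated assumptions: symbol-font characters are assumed to arrive as private-use code points U+F020..U+F0FF (the font's 8-bit code plus 0xF000), and a printability check is assumed to reject the Private Use Area. The helpers `stripSymbolFontOffset`, `looksPrintable` and `decodeSymbolChars` are hypothetical illustrations, not the code in excel.cpp or a Qt API.

```cpp
#include <string>

// Hypothetical helper (not the Calligra code): symbol-font characters are
// assumed to arrive in the Private Use Area at U+F020..U+F0FF, i.e. the
// font's 8-bit code plus 0xF000. Clearing the high bits yields that code.
char16_t stripSymbolFontOffset(char16_t c)
{
    if (c >= 0xF020 && c <= 0xF0FF)
        return static_cast<char16_t>(c & 0x00FF);
    return c;
}

// Rough stand-in for a printability check: rejects C0/C1 controls and the
// Private Use Area (U+E000..U+F8FF). Not a full Unicode category lookup.
bool looksPrintable(char16_t c)
{
    if (c < 0x20 || (c >= 0x7F && c < 0xA0))
        return false;
    if (c >= 0xE000 && c <= 0xF8FF)
        return false;
    return true;
}

// Remap symbol-font characters first, so a later printability check
// no longer rejects them and the string is not thrown away.
std::u16string decodeSymbolChars(const std::u16string &in)
{
    std::u16string out;
    out.reserve(in.size());
    for (char16_t c : in)
        out.push_back(stripSymbolFontOffset(c));
    return out;
}
```

For example, U+F04A fails `looksPrintable`, but after stripping it becomes 'J', which the Wingdings font (selected by the accompanying style run) renders as the familiar smiley.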

Diff Detail

Repository: R8 Calligra
Lint: Skipped
Unit: Tests Skipped

denexter created this revision. Oct 29 2019, 6:38 AM

Restricted Application added a project: Calligra: 3.0. Oct 29 2019, 6:38 AM

Restricted Application added a subscriber: Calligra-Devel-list.

denexter requested review of this revision. Oct 29 2019, 6:38 AM

Is there something in the spec about how the Microsoft fonts are encoded, or is this just common practice? It seems a little scary to just strip bits.

If the main issue is the whole string disappearing, the text.clear() calls below look suspicious. A Unicode string can contain all kinds of non-printing characters, such as LTR/RTL controls; wouldn't strings containing those remain broken? The calls seem to have been introduced by commit 4847181d7d5f as some kind of workaround. No idea whether they are still needed.
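To make the text.clear() concern concrete, here is a minimal C++ sketch contrasting the two behaviors. All function names and the printability rule are hypothetical stand-ins, not Calligra's excel.cpp or Qt's QChar::isPrint; the rule is assumed to flag controls, private-use code points, and the bidi marks LRM/RLM.

```cpp
#include <string>

// Hypothetical stand-in for a printability test (not Qt's QChar::isPrint):
// flags C0/C1 controls, the Private Use Area, and the bidi marks
// LRM (U+200E) and RLM (U+200F) as non-printing.
bool isNonPrinting(char16_t c)
{
    if (c < 0x20 || (c >= 0x7F && c < 0xA0))
        return true;
    if (c >= 0xE000 && c <= 0xF8FF)
        return true;
    return c == 0x200E || c == 0x200F;
}

// The pattern questioned in review: a single non-printing character
// empties the whole string.
std::u16string discardIfAnyNonPrinting(const std::u16string &text)
{
    for (char16_t c : text)
        if (isNonPrinting(c))
            return std::u16string();
    return text;
}

// Alternative sketch: keep the string and drop only control characters,
// preserving tab, line feed and bidi marks such as RLM.
std::u16string dropControlsOnly(const std::u16string &text)
{
    std::u16string out;
    out.reserve(text.size());
    for (char16_t c : text) {
        const bool isControl = c < 0x20 || (c >= 0x7F && c < 0xA0);
        if (!isControl || c == u'\t' || c == u'\n')
            out.push_back(c);
    }
    return out;
}
```

Note that `dropControlsOnly` keeps private-use symbol characters as-is, so without a separate decoding step they would still come through as unmapped code points rather than the intended glyphs.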

I haven't found anything in the spec. There is a more detailed explanation in the Word parser, https://cgit.kde.org/calligra.git/tree/filters/words/msword-odf/wv2/src/parser9x.cpp#n513, but it doesn't cite any sources either. Discarding the entire string is excessive and may be a problem for some documents. However, removing that behavior without addressing the decoding issue gives you a string with junk or missing characters, whereas fixing the decoding yields the full, correct string.

But would those characters work without encoding adjustments if the MS font in use were installed? I'm not sure how the text renders now without the font, but if it's anything like the "J" smiley in Exchange emails, I'm not sure which is worse.

Adding Marijn. These are ancient changes, but could those text.clear() parts be removed by now?

The necessary font information is supplied independently, so if you have the font installed it does show the smiley.

ping

Revision Contents

Changeset List

M  filters/sheets/excel/sidewinder/excel.cpp (3 lines)

Diff 68944


filters/sheets/excel/sidewinder/excel.cpp

Diff     ID      Created                 Lint  Unit
Diff 1   68944   Oct 29 2019, 6:36 AM    ★     ★