Relax TextPFRun validation to allow LibreOffice PPT import
ClosedPublic

Authored by davidllewellynjones on Nov 11 2019, 12:54 PM.

Details

Summary

This change allows non-conforming LibreOffice PPT files to be imported.

An apparent bug in the LibreOffice PPT exporter makes it output files
which technically don't conform to the PPT specification. Calligra
refuses to load these files, which although technically may be the
correct behaviour, is extremely annoying for the user. LibreOffice's
deviation from the PPT spec is pretty minor, and a slight weakening of
Calligra's validation allows the files to be imported successfully.

In more detail, when loading a drawing each text paragraph in the
drawing has a TextPFRun structure ("A structure that specifies the
paragraph-level formatting of a run of text"). This starts with a mask,
followed by a sequence of fields. Only unmasked fields are included in
the sequence.

According to Section 2.9.45 of the PPT specification version 6 [1], the
following fields must be masked out:

masks.leftMargin
masks.indent
masks.defaultTabSize
masks.tabStops

In spite of this LibreOffice includes the leftMargin and indent fields
(flags 0x100 and 0x400). I'm not familiar with the LibreOffice codebase,
but it looks like this is the problem code. From this same code it look
like LibreOffice doesn't export the defaultTabSize or tabStops fields
(which is correct).

This patch loosens Calligra's validation to allow these flags to be set.
It applies the change in binschema [2] for the same reason used to
generate the Calligra parser files (see binschma commit
aca4fd06f1ad330ecadf05b9e862d7c91338f051).

[1] https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-ppt
[2] https://phabricator.kde.org/source/binschema/

Test Plan
  1. Save out a file from LibreOffice in PPT format, or download this archive with a test file inside.
  2. Attempt to load the file into Calligra Stage.
  3. Note that it refuses to load with the error "Invalid file format".
  4. Apply the patch.
  5. Attempt to load the same file again.
  6. Note that it loads correctly. If you used my test file, witness my amazing presentation design.

Diff Detail

Repository
R8 Calligra
Branch
allow-libreoffice-ppt-validation (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 18976
Build 18994: arc lint + arc unit
Restricted Application added a subscriber: Calligra-Devel-list. · View Herald TranscriptNov 11 2019, 12:54 PM
davidllewellynjones requested review of this revision.Nov 11 2019, 12:54 PM
davidllewellynjones added a comment.EditedNov 11 2019, 1:10 PM

Here's the corresponding patch for binschema: https://phabricator.kde.org/D25259

Seems reasonable to me.

Wonder what to do with that .jar. So phabricator didn't allow to be included in this PR? Manually include when merging?

filters/libmso/generated/simpleParser.h
2

dirty sounds like git without commit, though maybe doesn't matter. Didn't find the earlier sha either, maybe was some local commit.

davidllewellynjones retitled this revision from Allow non-conforming LibreOffice PPT files to be imported to Relax TextPFRun validation to allow LibreOffice PPT import.
davidllewellynjones edited the summary of this revision. (Show Details)
davidllewellynjones edited the test plan for this revision. (Show Details)
  • SVN_SILENT made messages (.desktop file) - always resolve ours
  • SVN_SILENT made messages (.desktop file) - always resolve ours
  • SVN_SILENT made messages (.desktop file) - always resolve ours
  • Relax TextPFRun validation to allow LibreOffice PPT import
  1. Updating D25256: Relax TextPFRun validation to allow LibreOffice PPT import #
  2. Enter a brief description of the changes included in this update.
  3. The first line is used as subject, next lines as comment. #
  4. If you intended to create a new revision, use:
  5. $ arc diff --create

Align with binschema changes

This updates the diff to align with the changes accepted to binschema updating mso.xml. These changes are then used to generate the files here. This change also now includes the mso.jar binary file.

  1. Updating D25256: Relax TextPFRun validation to allow LibreOffice PPT import #
  2. Enter a brief description of the changes included in this update.
  3. The first line is used as subject, next lines as comment. #
  4. If you intended to create a new revision, use:
  5. $ arc diff --create

Remove previous commits to master that got mixed in accidentally.

davidllewellynjones edited the summary of this revision. (Show Details)
  1. Updating D25256: Relax TextPFRun validation to allow LibreOffice PPT import #
  2. Enter a brief description of the changes included in this update.
  3. The first line is used as subject, next lines as comment. #
  4. If you intended to create a new revision, use:
  5. $ arc diff --create

Remove excess spaces from commit message.

pvuorela accepted this revision.Nov 20 2019, 9:45 AM
This revision is now accepted and ready to land.Nov 20 2019, 9:45 AM
This revision was automatically updated to reflect the committed changes.