MIME type correction for .ods files in src/gdrivehelper.cpp
ClosedPublic

Authored by martijnschmidt on Jan 7 2018, 12:14 PM.

Details

Summary

KIO-GDrive appears to assign the mime type "application/x-vnd.oasis.opendocument.spreadsheet" to .ods files in src/gdrivehelper.cpp, however, the correct mime type is "application/vnd.oasis.opendocument.spreadsheet".

The patch which I have submitted adjusts the mime type. It also fixes a typo in the variable VND_OASIS_OPENDOCUMENT_SPREADSHEED which should actually be spelled as VND_OASIS_OPENDOCUMENT_SPREADSHEET ... if changes are made anyway, we might as well tackle both at the same time. :-)

References:
https://bugs.kde.org/show_bug.cgi?id=388598
https://wiki.documentfoundation.org/Faq/General/036
https://en.wikipedia.org/wiki/OpenDocument_technical_specification#Documents

BUG: 388598

Test Plan
  1. Try to open a .ods file through an unpatched KIO-GDrive and notice how you'll receive a Choose Application prompt from Dolphin. This is a result of the fact that "application/x-vnd.oasis.opendocument.spreadsheet" is an unknown MIME type.
  1. Repeat the test described in 1 with the patched KIO-GDrive, and notice how (for example) LibreOffice Calc immediately opens the file without going through Dolphin's Choose Application prompt first.

Diff Detail

Repository
R219 KIO GDrive
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
martijnschmidt requested review of this revision.Jan 7 2018, 12:14 PM
martijnschmidt created this revision.
ltoscano added a subscriber: ltoscano.

@ngraham Frameworks is the group for the "Frameworks" product, but kio-gdrive is not part of it. Also, all repositories part of Frameworks already get Frameworks as subscriber, so apart from exceptional cases, adding Frameworks to a review should almost never be needed.

Oh OK, sorry. I'll remember that for the future.

Thanks for the patch, but I cannot apply it right now. How did you create the diff? Please see https://community.kde.org/Infrastructure/Phabricator#Posting_a_Patch

Hi Elvis, I ran a diff -u a/src/gdrivehelper.ccp b/src/gdrivehelper.cpp to create this output.

Hi Elvis, I ran a diff -u a/src/gdrivehelper.ccp b/src/gdrivehelper.cpp to create this output.

git diff from the kio-gdrive repo should be enough

dvratil added a subscriber: dvratil.Jan 8 2018, 7:43 AM

The application/x-vnd.oasis.opendocument.spreadsheet was used intentionally since that's what GDrive returns in the exportLinks for the Google Documents spreadsheets:

"exportLinks": {
    "application/x-vnd.oasis.opendocument.spreadsheet": "https://docs.google.com/spreadsheets/export?id=....&exportFormat=ods",
    "text/tab-separated-values": "https://docs.google.com/spreadsheets/export?id=....&exportFormat=tsv",
    "application/pdf": "https://docs.google.com/spreadsheets/export?id=....&exportFormat=pdf",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "https://docs.google.com/spreadsheets/export?id=....&exportFormat=xlsx",
    "text/csv": "https://docs.google.com/spreadsheets/export?id=....&exportFormat=csv",
    "application/zip": "https://docs.google.com/spreadsheets/export?id=....&exportFormat=zip",
    "application/vnd.oasis.opendocument.spreadsheet": "https://docs.google.com/spreadsheets/export?id=....&exportFormat=ods"
   },

Apparently, they also return the proper mimetype now, but that wasn't always the case. Considering Google being Google, I'd maybe keep around support for both versions...

Re-export the diff with "git diff" instead of "diff -u".

Hi Elvis, I ran a diff -u a/src/gdrivehelper.ccp b/src/gdrivehelper.cpp to create this output.

git diff from the kio-gdrive repo should be enough

I have updated the diff after generating it with a different method - does this one apply correctly?

The application/x-vnd.oasis.opendocument.spreadsheet was used intentionally since that's what GDrive returns in the exportLinks for the Google Documents spreadsheets:
Apparently, they also return the proper mimetype now, but that wasn't always the case. Considering Google being Google, I'd maybe keep around support for both versions...

The API exportLinks output still has the wrong MIME type using Google's API Explorer. So does the manual on https://developers.google.com/drive/v2/web/manage-downloads#downloading_google_documents and the same goes for the Drive API v3. As you can see in the Google link both other OpenDocument formats don't have the x-vnd, they only use vnd.

If I interpret the code correctly, the only reason we are overriding the default MIME type is because Google Docs starts to mess around with "supported" files as soon they're uploaded to Google Drive. If I upload a file type that is not recognized by Google Docs, such as a LibreOffice Draw file with the .odg extension and MIME type application/vnd.oasis.opendocument.graphics, the file is correctly opened by LibreOffice Draw even though that MIME type isn't listed in the gdrivehelper code.

I guess we shouldn't keep around a broken MIME type, or am I missing something obvious?

For documents created in Google Docs, GDrive API will list them as GDocs documents, which nobody else but GDocs can open. It, however, provides a list of mimetypes into which GDocs can convert the document for download. The code here is used so that we can show GDocs documents as .odt/.ods/.pdf/.... documents that users can click and open in their computer.

As such, the mappings here are used to map from the mimetypes in externalLinks to some usable mimetype. Since Google still documents the "broken" mimetype as used, we should keep it supported, alongside the correct one.

For documents created in Google Docs, GDrive API will list them as GDocs documents, which nobody else but GDocs can open. It, however, provides a list of mimetypes into which GDocs can convert the document for download. The code here is used so that we can show GDocs documents as .odt/.ods/.pdf/.... documents that users can click and open in their computer.

As such, the mappings here are used to map from the mimetypes in externalLinks to some usable mimetype. Since Google still documents the "broken" mimetype as used, we should keep it supported, alongside the correct one.

I don't think one can actually keep both MIME types supported within a single file on Linux? The only implementation which I could find that permits multiple MIME types within a single file is Android: https://developer.android.com/reference/android/content/Intent.html#EXTRA_MIME_TYPES

Google is quite simply wrong to assign the x- style MIME type and I've just submitted some feedback reports through the website asking them to help update these portions of the documentation (plus externalLinks output for the Google Drive v2 API). I understand that the file itself retains the GDocs MIME type "in the cloud", but when exporting the file through externalLinks the Google API recommends some MIME types that we could choose to apply. However, KIO-GDrive ignores this portion of the Google API output so even if Google changes the MIME type recommendations within externalLinks it shouldn't result in any harm for KIO-GDrive users: the correct MIME type would be applied based on the .ods file extension. The final authorities on MIME types are the IETF and IANA, and both seem to agree that Google is in the wrong here:

Type and subtype names beginning with "X-" are reserved for experimental use and MUST NOT be registered. from https://tools.ietf.org/html/rfc4288#section-4.2

MIME type application/vnd.oasis.opendocument.spreadsheet seems to have been registered with IANA back in 2006 and last updated in 2009, whereas Google Drive was first launched in 2012: https://www.iana.org/assignments/media-types/application/vnd.oasis.opendocument.spreadsheet

Sorry, I wasn't very clear explaining myself above.

I would say let's add a new #define:

#define X_VND_OASIS_OPENDOCUMENT_SPREADSHEET QStringLiteral("application/x-vnd.oasis.opendocument.spreadsheet")

Add it to GDriveHelper::ExtensionMap to map to the ".ods" extension alongside the correct VND_OASIS_OPENDOCUMENT_SPREADSHEET mimetype.
Add it to GDriveHelper::ConversionMap as a possible value to the VND_GOOGLE_APPS_SPREADSHEET alongside the correct VND_OASIS_OPENDOCUMENT_SPREADSHEET mimetype.
In the loop in GDriveHelper::convertFromGDocs() we would then do a fixup:

...
if (targetMimeType == X_VND_OASIS_OPENDOCUMENT_SPREADSHEET) {
    file->setMimeType(VND_OASIS_OPENDOCUMENT_SPREADSHEET);
} else {
    file->setMimeType(targetMimeType);
}
....

This way the code will correctly handle both the incorrect and correct mimetypes in the externalLinks map (whichever it runs into first) and will correctly convert the document to the right extension and the correct mimetype.

@martijnschmidt Any update on this? I think @dvratil's suggestion is a safe choice, so I'd also vote for that.

Hey @elvisangelaccio, please accept my apologies for the delay - I needed some time to understand the code due to my sheer lack of experience, since I'd like to to grasp why a certain change is being recommended rather than blindly implementing it. After reading through the relevant chunk of code again I now understand and agree with @dvratil's recommended change, but I do have two additional questions.

Firstly, GDriveHelper::convertFromGDocs() is going to take the mimeType() property of the file and use it to match a key in GDriveHelper::ConversionMap which then stores a list of valid export MIME types for the GDocs format it is dealing with in the convIt variable. If it didn't match any key it'll return the downloadUrl() property of the file, just like GDriveHelper::isGDocsDocument() would've returned false. Because a non-GDocs file doesn't show exportLinks when listed through the API this function would've returned the downloadUrl() in the end anyway, so maybe we can safely get rid of the if (convIt == GDriveHelper::ConversionMap.cend()) clause?

Secondly, after reading through it all I now understand that my initially submitted patch works because the Drive API returns both the documented but invalid x-vnd.oasis and the undocumented but valid vnd.oasis MIME type in the exportLinks using an identical export?id=fileId&exportFormat=ods command:

"exportLinks": {
 "application/x-vnd.oasis.opendocument.spreadsheet": "https://docs.google.com/spreadsheets/export?id=fileId&exportFormat=ods",
 "text/tab-separated-values": "https://docs.google.com/spreadsheets/export?id=fileId&exportFormat=tsv",
 "application/pdf": "https://docs.google.com/spreadsheets/export?id=fileId&exportFormat=pdf",
 "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "https://docs.google.com/spreadsheets/export?id=fileId&exportFormat=xlsx",
 "text/csv": "https://docs.google.com/spreadsheets/export?id=fileId&exportFormat=csv",
 "application/zip": "https://docs.google.com/spreadsheets/export?id=fileId&exportFormat=zip",
 "application/vnd.oasis.opendocument.spreadsheet": "https://docs.google.com/spreadsheets/export?id=fileId&exportFormat=ods"
},

If Google were to stop appending that last undocumented entry to the exportLinks we would still be able to convert the file, because the code would match the application/vnd.openxmlformats-officedocument.spreadsheetml.sheet MIME type on the second iteration of the Q_FOREACH loop and export a .xslx file, or failing that the application/pdf MIME type on the third iteration of the Q_FOREACH loop and export a .pdf file. Of course, being an open source project, we prefer the .ods file type over the others hence @dvratil's recommendation to safeguard against Google removing the undocumented exportLinks entry. But that does make me wonder, why go through all the trouble of listing multiple MIME types in the GDriveHelper::ConversionMap when we're only ever going to be using the first one from the list anyway? The code would be a lot simpler if we just mapped each Google Document to the .odt format, each Google Drawing to the .png format, each Google Presentation to the .odp format, and each Google Spreadsheet to the .ods format. We can even guess the exportLinks for the desired formats when the fileId and GDocs mimeType are already known.

While technically all your observations are correct, when dealing with data coming from a proprietary service where we cannot do any assumptions about their implementation, it's better to write the code defensively and not just assume that the way things happen to work now is set to stone. Our only guideline is the API documentation. Any other observations about the data I prefer to treat as a coincidence that happens to work for me and the few documents on my GDrive, but may break in the real world.

The if (convIt == GDriveHelper::ConversionMap.cend()) is simply a good practice: *always* check your iterators before using them. In the future, we might call convertFromGDocs() from elsewhere and forget to use the isGDocsDocument() check there, which without the iterator guard would lead to a crash, so it's better if the code does the check itself, especially since the check has very little cost.

Regarding the whole thing, this code can handle (almost) anything that Google throws at it: maybe there's a file that GDrive cannot convert to ODT and it will not offer it in the exportLinks, or maybe in the future they will introduce a feature in GDocs that will allow you to restrict which formats the document can be converted to, or maybe they will just decide to stop supporting OpenDocument formats completely one day etc. Google usually doesn't tell anyone about these changes in advance, so it's important that the code can deal with the unexpected without substantially hindering the user experience.

Implement @dvratil's recommended code changes per D9706#188708 and document the reasoning behind the extra safeguard as comments in the code.

While technically all your observations are correct, when dealing with data coming from a proprietary service where we cannot do any assumptions about their implementation, it's better to write the code defensively and not just assume that the way things happen to work now is set to stone. Our only guideline is the API documentation. Any other observations about the data I prefer to treat as a coincidence that happens to work for me and the few documents on my GDrive, but may break in the real world.

Thanks for the clarification, I really appreciate that. I've uploaded a new diff which implements your recommended modifications and also documents the reasoning behind them as comments in the code for future reference.

elvisangelaccio accepted this revision.Feb 10 2018, 4:31 PM
This revision is now accepted and ready to land.Feb 10 2018, 4:31 PM
This revision was automatically updated to reflect the committed changes.