Add semantic data extraction plugin
Closed, Public

Authored by vkrause on Oct 31 2017, 5:34 PM.

Details

Summary

This plugin looks for structured data about the email content inside HTML
mail parts based on the schema.org ontology, as can be found in e.g.
airline and hotel booking confirmation emails.
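
As a rough illustration of the idea (not the plugin's actual code), schema.org data in such mails is commonly embedded as JSON-LD inside `<script type="application/ld+json">` elements. The sketch below uses only the C++ standard library and a hypothetical helper name, `findLdJsonBlocks`, to pull the raw JSON text out of an HTML string. It assumes lower-case tags and well-formed markup, which real-world mails often violate.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration only: collect the raw text of every
// <script type="application/ld+json"> element found in an HTML string.
// Assumes lower-case tag names and does no HTML entity decoding.
std::vector<std::string> findLdJsonBlocks(const std::string &html)
{
    static const std::string typeAttr = "application/ld+json";
    static const std::string closeTag = "</script>";

    std::vector<std::string> blocks;
    std::size_t pos = 0;
    while ((pos = html.find("<script", pos)) != std::string::npos) {
        // End of the opening tag, e.g. <script type="...">
        const std::size_t tagEnd = html.find('>', pos);
        if (tagEnd == std::string::npos) {
            break; // unterminated opening tag, give up
        }
        const std::size_t bodyEnd = html.find(closeTag, tagEnd + 1);
        if (bodyEnd == std::string::npos) {
            break; // missing closing tag, give up
        }
        // Only keep script elements that declare the JSON-LD MIME type.
        const std::string openTag = html.substr(pos, tagEnd - pos + 1);
        if (openTag.find(typeAttr) != std::string::npos) {
            blocks.push_back(html.substr(tagEnd + 1, bodyEnd - tagEnd - 1));
        }
        pos = bodyEnd + closeTag.size();
    }
    return blocks;
}
```

Each returned string would then be handed to a JSON parser and checked for schema.org types such as a flight reservation. A plain string search like this is deliberately naive; a real extractor needs an HTML-aware parser and tolerance for malformed input.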

For now, the only thing this does with that information is show a simple
summary of a flight itinerary. That's already useful, since you no longer
have to look through the usual several pages of poorly rendered HTML content.
Allowing booking details to be added to your calendar is an obvious next step.

Seeing how many workarounds were needed to parse the real-world mails I
have here, I suspect this will need more adjustments, so please send me
test material :)

See also https://developers.google.com/gmail/markup/

Diff Detail

Repository
R81 KDE PIM Addons
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
vkrause created this revision. Oct 31 2017, 5:34 PM
Restricted Application added a project: KDE PIM. Oct 31 2017, 5:34 PM
Restricted Application added a subscriber: KDE PIM.
mlaurent added inline comments.
plugins/messageviewer/bodypartformatter/autotests/structureddataextractortest.cpp
42

Perhaps signal here that the file doesn't exist, for debugging.

plugins/messageviewer/bodypartformatter/semantic/CMakeLists.txt
12

SEMANTIC_LOG is better than LOG :)

plugins/messageviewer/bodypartformatter/semantic/datatypes.h
35

Perhaps we can set CONSTANT, no?

plugins/messageviewer/bodypartformatter/semantic/jsonlddocument.cpp
93

Coding style: space before &, not after.

plugins/messageviewer/bodypartformatter/semantic/semanticmemento.h
37

mData? It is a member variable of the class.

plugins/messageviewer/bodypartformatter/semantic/structureddataextractor.cpp
32

Coding style: "QString &"

35
if (m_data.isEmpty()) {
    findLdJson(...);
    if (m_data.isEmpty()) {
        parse....
    }
}
41

Coding style: "QString &"

161

Cache reader.name() in a local variable, e.g. QStringRef readerName = reader.name();

mlaurent requested changes to this revision. Nov 1 2017, 6:45 AM

Also update kdepim-addons.categories with your new category.

This revision now requires changes to proceed. Nov 1 2017, 6:45 AM
vkrause updated this revision to Diff 21731. Nov 1 2017, 8:51 PM

Addressed review comments, added basic support for hotel booking confirmations.

plugins/messageviewer/bodypartformatter/semantic/datatypes.h
35

Doesn't seem to work, this seems to disable the code needed for setProperty().

plugins/messageviewer/bodypartformatter/semantic/jsonlddocument.cpp
93

Eventually I need to fix KDevelop to insert this correctly for new methods...

mlaurent added inline comments. Nov 1 2017, 8:57 PM
plugins/messageviewer/bodypartformatter/semantic/datatypes.h
35

OK (weird problem in Qt?), but OK :)

plugins/messageviewer/bodypartformatter/semantic/jsonlddocument.cpp
93

yep :)

mlaurent accepted this revision. Nov 1 2017, 9:02 PM

Seems OK to me now :)

This revision is now accepted and ready to land. Nov 1 2017, 9:02 PM

Not committed yet? :)

Sorry, I had an unplanned trip interfering with finishing this (and no, not just to collect more test data ;-) ). I'd still like to fix the date/time display with Grantlee; I seem to only get ISO formatting atm, which isn't exactly nice to read.

vkrause updated this revision to Diff 21866. Nov 4 2017, 2:32 PM

Localize/format flight and checkin/checkout times.

It would be better if this moved from the data model to the presentation layer (i.e. Grantlee), but I don't see how to control date formatting there yet.

This revision was automatically updated to reflect the committed changes.