Add semantic data extraction plugin
Closed · Public

Authored by vkrause on Oct 31 2017, 5:34 PM.

Details

Summary

This plugin looks for structured data about the email content inside HTML
mail parts based on the schema.org ontology, as can be found in e.g.
airline and hotel booking confirmation emails.

For now, the only thing this does with that information is show a simple
summary of a flight itinerary. That's already useful, as you no longer have
to look through the usual several pages of poorly rendered HTML content.
Adding booking details to your calendar is an obvious next step.

Seeing how many workarounds were needed to parse the real-world mails I
have here, I suspect this will need more adjustments, so please send me
test material :)

See also https://developers.google.com/gmail/markup/
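As a rough illustration of the kind of lookup described above, the following sketch collects the contents of JSON-LD script elements from an HTML string. It uses only the C++ standard library rather than Qt, the function name extractLdJsonBlocks is hypothetical, and it is not the code from this patch; real-world mails will likely need the more tolerant handling mentioned in the summary.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not from the patch): returns the raw text of every
// <script type="application/ld+json"> element found in `html`.
std::vector<std::string> extractLdJsonBlocks(const std::string &html)
{
    // Work on a lower-cased copy so tag and attribute matching is
    // case-insensitive; byte offsets are identical in both strings.
    std::string lower(html);
    for (char &c : lower) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }

    std::vector<std::string> blocks;
    std::size_t pos = 0;
    while ((pos = lower.find("<script", pos)) != std::string::npos) {
        const std::size_t tagEnd = lower.find('>', pos);
        if (tagEnd == std::string::npos) {
            break; // truncated markup, nothing more to extract
        }
        const std::size_t close = lower.find("</script", tagEnd + 1);
        if (close == std::string::npos) {
            break; // unterminated script element
        }
        // Only keep script elements whose opening tag mentions the JSON-LD type.
        const std::string tag = lower.substr(pos, tagEnd - pos);
        if (tag.find("application/ld+json") != std::string::npos) {
            blocks.push_back(html.substr(tagEnd + 1, close - tagEnd - 1));
        }
        pos = close + 1;
    }
    return blocks;
}
```

Each returned string would then be handed to a JSON parser and checked for schema.org types such as FlightReservation; this sketch leaves the JSON untouched.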

Diff Detail

Repository
R81 KDE PIM Addons
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
vkrause created this revision. Oct 31 2017, 5:34 PM
Restricted Application added a project: KDE PIM. Oct 31 2017, 5:34 PM
Restricted Application added a subscriber: KDE PIM.
mlaurent added inline comments.
plugins/messageviewer/bodypartformatter/autotests/structureddataextractortest.cpp
41

Perhaps signal here that the file doesn't exist, for debugging.

plugins/messageviewer/bodypartformatter/semantic/CMakeLists.txt
11

SEMANTIC_LOG is better than LOG :)

plugins/messageviewer/bodypartformatter/semantic/datatypes.h
34

Perhaps we can set CONSTANT, no?

plugins/messageviewer/bodypartformatter/semantic/jsonlddocument.cpp
92

Coding style: space before &, not after.

plugins/messageviewer/bodypartformatter/semantic/semanticmemento.h
36

mData? (as it is a class member variable)

plugins/messageviewer/bodypartformatter/semantic/structureddataextractor.cpp
31

coding style "QString &"

34
if (m_data.isEmpty()) {
    findLdJson(...);
    if (m_data.isEmpty()) {
        parse....
    }
}
40

coding style QString &

160

cache reader.name() as QStringRef readerName = reader.name();

mlaurent requested changes to this revision. Nov 1 2017, 6:45 AM

and update kdepim-addons.categories with your new category

This revision now requires changes to proceed. Nov 1 2017, 6:45 AM
vkrause updated this revision to Diff 21731. Nov 1 2017, 8:51 PM

addressed review comments, added basic support for hotel booking confirmations

plugins/messageviewer/bodypartformatter/semantic/datatypes.h
34

Doesn't seem to work, this seems to disable the code needed for setProperty().

plugins/messageviewer/bodypartformatter/semantic/jsonlddocument.cpp
92

Eventually I need to fix KDevelop to insert this correctly for new methods...

mlaurent added inline comments. Nov 1 2017, 8:57 PM
plugins/messageviewer/bodypartformatter/semantic/datatypes.h
34

ok (weird problem in Qt?) but ok :)

plugins/messageviewer/bodypartformatter/semantic/jsonlddocument.cpp
92

yep :)

mlaurent accepted this revision. Nov 1 2017, 9:02 PM

Seems ok for me now :)

This revision is now accepted and ready to land. Nov 1 2017, 9:02 PM

Not committed? :)

Not committed? :)

Sorry, I had an unplanned trip interfering with finishing this (and no, not just to collect more test data ;-) ). I'd still like to fix date/time displaying with Grantlee, I seem to only get ISO formatting atm, which isn't exactly nice to read.

vkrause updated this revision to Diff 21866. Nov 4 2017, 2:32 PM

Localize/format flight and checkin/checkout times.

It would be better to move this from the data model to the presentation layer (i.e. Grantlee), but I don't see how to control date formatting there yet.
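To illustrate the ISO-versus-readable formatting issue discussed here, a minimal standard-library sketch follows. The helper name formatIsoDateTime and the output pattern are assumptions made for illustration only; the patch itself is Qt-based and localizes for the user, whereas this sketch pins the classic "C" locale to stay deterministic and ignores time zones.

```cpp
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not from the patch): converts an ISO 8601 local
// timestamp of the form YYYY-MM-DDTHH:MM:SS into "DD Mon YYYY, HH:MM".
// Returns the input unchanged if it cannot be parsed.
std::string formatIsoDateTime(const std::string &iso)
{
    std::tm tm = {};
    std::istringstream in(iso);
    in.imbue(std::locale::classic()); // assumption: fixed locale, not the user's
    in >> std::get_time(&tm, "%Y-%m-%dT%H:%M:%S");
    if (in.fail()) {
        return iso; // leave unparseable values as they are
    }

    std::ostringstream out;
    out.imbue(std::locale::classic());
    out << std::put_time(&tm, "%d %b %Y, %H:%M");
    return out.str();
}
```

Keeping a helper like this outside the data model would match the preference stated above for doing the formatting in the presentation layer.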

This revision was automatically updated to reflect the committed changes.