Fix multiple train numbers in DB extractor
Abandoned · Public

Authored by nicolasfella on Sep 9 2018, 9:19 PM.

Details

Reviewers
vkrause
Summary

Some trains have two train numbers, e.g. "ICE 234, ICE 136". In the PDF I have, the second number is on a new line (together with the arrival station), so I check that line and append the result to the train number if one is found. This broke things for international tickets, so it is only done for domestic ones.
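For illustration only, the approach described above could be sketched roughly as follows. This is not the code from the diff: the function name, the `domestic` flag, the line layout and the train number pattern are all assumptions made for the example.

```python
import re

# Assumed shape of a train number: at least two capital letters, an optional
# space, then digits (e.g. "ICE 234"). Real tickets may not follow this.
TRAIN_NUMBER = re.compile(r"\b[A-Z]{2,}\s?\d+\b")


def combined_train_number(departure_line: str, arrival_line: str, domestic: bool) -> str:
    """Return the train number from the departure line, extended by a second
    number found on the arrival line (domestic tickets only)."""
    first = TRAIN_NUMBER.search(departure_line)
    if first is None:
        return ""
    number = first.group(0)
    # Only look at the following line for domestic tickets, since doing so
    # unconditionally broke international ones.
    if domestic:
        second = TRAIN_NUMBER.search(arrival_line)
        if second is not None:
            number += ", " + second.group(0)
    return number


if __name__ == "__main__":
    # Hypothetical line contents, not taken from a real ticket.
    dep = "Berlin Hbf  09.09.  ab 10:32  ICE 234"
    arr = "Hamburg Hbf  09.09.  an 12:15  ICE 136"
    print(combined_train_number(dep, arr, domestic=True))
    print(combined_train_number(dep, arr, domestic=False))
```

Anything on the arrival line that happens to look like a train number would be appended as well, which is why this needs testing against more PDFs.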

Test Plan

Tested with one domestic ticket (with a double train number) and two international ones (without double train numbers). Needs more thorough testing with more PDFs.

Diff Detail

Repository
R1003 KItinerary: Travel Reservation handling library
Branch
multitrainnumber
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 2660
Build 2678: arc lint + arc unit
nicolasfella created this revision. Sep 9 2018, 9:19 PM
Restricted Application added a project: KDE PIM. Sep 9 2018, 9:19 PM
Restricted Application added a subscriber: kde-pim.
nicolasfella requested review of this revision. Sep 9 2018, 9:19 PM
nicolasfella edited the test plan for this revision. Sep 9 2018, 9:26 PM

Thanks for looking into this!
This seems to break the unit tests (unstructureddataextractortest) unfortunately, as well as the tests on my ticket collection (with similar symptoms).

I wonder if we can assume that the second train number has the same type, i.e. it will always be "ICE 123, ICE 234" and not "ICE 123, FOO 234", or make some other assumption about what a valid train number looks like. Then we could match for that in the second line and avoid picking up things that aren't train numbers.
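A rough sketch of what such a same-type check might look like, purely as an illustration. The helper name and the assumption that a train number is a "TYPE digits" pair separated by whitespace are made up for the example and are not part of the extractor.

```python
import re
from typing import Optional


def second_number_of_same_type(first_number: str, next_line: str) -> Optional[str]:
    """Return a train number from next_line that has the same type as
    first_number (e.g. "ICE"), or None if there is no such number."""
    parts = first_number.split()
    # Assumption: a train number is exactly "<type> <digits>".
    if len(parts) != 2 or not parts[1].isdigit():
        return None
    train_type = parts[0]
    # Accept only the same type followed by digits on the next line.
    pattern = r"\b" + re.escape(train_type) + r"\s?\d+\b"
    match = re.search(pattern, next_line)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Hypothetical line contents, not taken from a real ticket.
    print(second_number_of_same_type("ICE 123", "Hamburg Hbf  an 12:15  ICE 234"))
    print(second_number_of_same_type("ICE 123", "Hamburg Hbf  an 12:15  FOO 234"))
```

With this check a second number of a different type would be silently dropped, so it only helps if the same-type assumption actually holds.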

Good question. For the "Flügelzug" configuration in your test case it's always the same type, I think. However, I'm not sure whether the train equivalent of "code shares" exists, e.g. on international ICE/TGV/Thalys/etc. services.
The lesson I have learned so far from trying to find patterns for train numbers (or platforms/station names) is: don't, there's always some corner case breaking this ;-)

nicolasfella planned changes to this revision. Sep 27 2018, 6:28 PM
nicolasfella abandoned this revision. May 18 2020, 4:01 PM