Don't trigger text-based extractors if we have a PDF alternative
The text-based alternatives for DB and SNCF exist only so that we can
check in unit test data. Change their trigger filter to something special
for the tests, so that on real data only the PDF variant runs. This avoids
the extra work and any possible merging issues.