Discard duplicate results during contact completion
Closed · Public

Authored by dvratil on Apr 6 2020, 9:36 AM.

Details

Summary

Drop duplicate results from contact completion to return
more relevant results. This is still limited by the indexing
side as we are unable to deduplicate easily based on the email
address itself (or merge the results in some clever way).
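
As a rough illustration of the idea (not the code from this diff), exact-string deduplication under a result limit could look like the sketch below. The helper name `collectUnique` and the use of plain `std::string` matches are assumptions made for the example; the real code works on Xapian matches and Qt types.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper, not the code from this revision: collects up to
// `limit` distinct strings from `matches`, preserving their order.
std::vector<std::string> collectUnique(const std::vector<std::string> &matches,
                                       std::size_t limit)
{
    std::vector<std::string> results;
    std::unordered_set<std::string> seen;
    for (const std::string &match : matches) {
        if (results.size() >= limit) {
            break;
        }
        // insert() reports whether the string was new; duplicates are skipped
        // and therefore do not count against the limit.
        if (seen.insert(match).second) {
            results.push_back(match);
        }
    }
    return results;
}
```

Because skipped duplicates do not use up a slot, the same limit yields more distinct contacts. Entries that differ only in the display name are still treated as different, which is the limitation described above.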

Diff Detail

Repository
R42 Akonadi Search
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
dvratil created this revision. Apr 6 2020, 9:36 AM
Restricted Application added a project: KDE PIM. Apr 6 2020, 9:36 AM
Restricted Application added a subscriber: kde-pim.
dvratil requested review of this revision. Apr 6 2020, 9:36 AM
dvratil updated this revision to Diff 79460. Apr 6 2020, 9:48 AM
  • Fix includes
dfaure accepted this revision. Apr 6 2020, 10:22 AM

Works great. Given that the number of matches is limited by m_limit, this actually returns more useful contacts than before. So, to the user, it's not just about deduplication (libkdepim does deduplicate on top anyway).

One improvement would be to prefer matches with full name over matches without name.

I type "vkrau" and it says:
12:19:15.163 kmail2(16923/16923) org.kde.pim.akonadi_search_pim: processEnquire Match: "vkrause@kde.org" (50%), docid 7318456
12:19:15.163 kmail2(16923/16923) org.kde.pim.akonadi_search_pim: processEnquire Skipped duplicate match "vkrause@kde.org" (50%) docid 13800517
12:19:15.163 kmail2(16923/16923) org.kde.pim.akonadi_search_pim: processEnquire Match: "Volker Krause <vkrause@kde.org>" (47%), docid 1292769881
12:19:15.163 kmail2(16923/16923) org.kde.pim.akonadi_search_pim: processEnquire Skipped duplicate match "Volker Krause <vkrause@kde.org>" (47%) docid 3129445885
and I end up with just vkrause@kde.org in the completion, with no name. Ah, but this code returns both matches; it's libkdepim that deduplicates on top, and does so wrongly.

So indeed the question is whether this code should do full deduplication (like your TODO says), or if that part is for libkdepim (which should then be improved).
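
To make the "prefer matches with full name" idea concrete, here is a minimal sketch of address-based deduplication, wherever it ends up living. Both helpers (`bareAddress`, `dedupePreferNamed`) are hypothetical, and the address extraction is deliberately naive: it only handles the "Name <addr>" and bare "addr" forms seen in the log above and compares addresses case-sensitively.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: extracts the bare address from "Name <addr>" or "addr".
// Deliberately naive; real address parsing has many more cases.
std::string bareAddress(const std::string &entry)
{
    const std::size_t open = entry.rfind('<');
    const std::size_t close = entry.rfind('>');
    if (open != std::string::npos && close != std::string::npos && open < close) {
        return entry.substr(open + 1, close - open - 1);
    }
    return entry;
}

// Keeps one entry per address, preferring an entry that carries a display name.
std::vector<std::string> dedupePreferNamed(const std::vector<std::string> &entries)
{
    std::vector<std::string> results;
    std::unordered_map<std::string, std::size_t> indexByAddress;
    for (const std::string &entry : entries) {
        const std::string address = bareAddress(entry);
        const auto it = indexByAddress.find(address);
        if (it == indexByAddress.end()) {
            // First time this address is seen: remember where it was stored.
            indexByAddress.emplace(address, results.size());
            results.push_back(entry);
        } else if (results[it->second] == address && entry != address) {
            // The stored entry is a bare address and this one has a name:
            // upgrade it in place so the original ranking position is kept.
            results[it->second] = entry;
        }
    }
    return results;
}
```

With the two matches from the log, this would keep only "Volker Krause <vkrause@kde.org>", in the position of the higher-ranked bare address.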

This also makes me wonder if the limit here is too low. I never realized I wasn't getting all matches but just a subset.

Thanks!

This revision is now accepted and ready to land. Apr 6 2020, 10:22 AM

I think any filtering/deduplication should happen in Akonadi Search here - since we are able to store structured data (e.g. split the name and the address into two different fields), Xapian can perform clever deduplication at query time, rather than client code (libkdepim) having to do expensive address parsing for each result.

We can even return the data structured, like a tuple (name, address, relevance) to make it easier for client code to aggregate the results.
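
A sketch of what such a structured result might look like on the client side, purely as an illustration. The `ContactMatch` type, its field names, and the `aggregate` function are all assumptions for this example; nothing like this exists in the API today.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type; the field names are illustrative only.
struct ContactMatch {
    std::string name;    // display name, may be empty
    std::string address; // bare email address
    int relevance;       // percentage, as in the debug output above
};

// Merges matches that share an address: keeps the first non-empty name and
// the highest relevance, then orders the merged matches by relevance.
std::vector<ContactMatch> aggregate(const std::vector<ContactMatch> &matches)
{
    std::map<std::string, ContactMatch> byAddress;
    for (const ContactMatch &match : matches) {
        const auto result = byAddress.emplace(match.address, match);
        if (result.second) {
            continue; // first match for this address
        }
        ContactMatch &kept = result.first->second;
        if (kept.name.empty()) {
            kept.name = match.name;
        }
        kept.relevance = std::max(kept.relevance, match.relevance);
    }

    std::vector<ContactMatch> results;
    for (const auto &pair : byAddress) {
        results.push_back(pair.second);
    }
    std::stable_sort(results.begin(), results.end(),
                     [](const ContactMatch &a, const ContactMatch &b) {
                         return a.relevance > b.relevance;
                     });
    return results;
}
```

With name and address already split, the client needs no address parsing at all; merging is a plain lookup keyed on the address.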

Also, the code should be made asynchronous so that we can query many more results and leave it up to the client to drop what they don't need.

This revision was automatically updated to reflect the committed changes.

OK if I backport this to release/20.04?