Avoid crash when reading corrupt data from document terms db
ClosedPublic

Authored by bruns on Apr 8 2018, 2:48 PM.

Details

Summary

The terms db contains terms, where each term is stored either independently
(terminated with 0) or as a suffix to the previous term (terminated with
1).
In case of corrupted data, the first terminator seen may be a 1, which
leads to a crash when trying to access the previous term with
QVector<>::last().
Show a debug message, to give a hint about the bad data, which can be
fixed by reindexing the relevant file.
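
A minimal sketch of the guarded decoding described above, written against the standard library instead of Qt so that it stands alone. The name decodeTerms, the two constants, and the choice to skip a dangling suffix are assumptions for illustration; this is not the code in documentdb.cpp.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Terminator values as described in the summary (constant names are hypothetical).
constexpr char kWordEnd = '\x00';   // ends an independently stored term
constexpr char kSuffixEnd = '\x01'; // ends a suffix of the previous term

// Decode a terms blob. A suffix that appears before any full term is
// reported and skipped instead of dereferencing an empty container.
std::vector<std::string> decodeTerms(const std::string& data)
{
    std::vector<std::string> terms;
    std::size_t start = 0;
    for (std::size_t pos = 0; pos < data.size(); ++pos) {
        const char c = data[pos];
        if (c != kWordEnd && c != kSuffixEnd) {
            continue; // still inside a term or suffix
        }
        const std::string piece = data.substr(start, pos - start);
        start = pos + 1;
        if (c == kWordEnd) {
            terms.push_back(piece);
        } else if (terms.empty()) {
            // Corrupt data: there is no previous term to extend.
            std::cerr << "corrupt term data: suffix without a preceding term\n";
        } else {
            terms.push_back(terms.back() + piece);
        }
    }
    return terms;
}
```

Without the terms.empty() check, the suffix branch would read the last element of an empty container, which is the situation the summary describes for QVector<>::last().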

BUG: 392878
CCBUG: 392877

Test Plan

Corrupt the database
Run balooshow -x <affected file(s)>

Diff Detail

Repository
R293 Baloo
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
bruns created this revision. Apr 8 2018, 2:48 PM
Restricted Application added projects: Frameworks, Baloo. Apr 8 2018, 2:48 PM
Restricted Application added a subscriber: Frameworks.
bruns requested review of this revision. Apr 8 2018, 2:48 PM

Corrupt the database

As described in BUG: 392877?

bruns added a comment. Apr 18 2018, 2:27 AM

Corrupt the database

As described in BUG: 392877?

Yes.

bruns added a comment. May 15 2018, 2:38 PM

Kind request to review ...

Restricted Application added a subscriber: kde-frameworks-devel. May 15 2018, 2:38 PM

If no one is willing to review, I will push this tomorrow.

dhaumann accepted this revision. May 29 2018, 2:28 AM
dhaumann added a subscriber: dhaumann.

If the format is really such that a term must appear before any Suffix, then this patch is already better than before.

Could it happen to have e.g.: a\0b\1c\1

If so, this code would extend the Suffix b with Suffix c. Would that be correct? Or can that never happen? Or should c be a Suffix for a? If so, this code should be improved.

This revision is now accepted and ready to land. May 29 2018, 2:28 AM

If the format is really such that a term must appear before any Suffix, then this patch is already better than before.

Could it happen to have e.g.: a\0b\1c\1

If so, this code would extend the Suffix b with Suffix c. Would that be correct? Or can that never happen? Or should c be a Suffix for a? If so, this code should be improved.

a\x00b\x01c\x01 would be decoded as "a", "ab", "abc".

"the", "their", "theirs", "there" is encoded as "the\x00ir\x01s\x01there\x00".
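
The two examples above suggest the encoding rule: when the previous term is a prefix of the current one, only the remaining suffix is written, terminated with 1; otherwise the full term is written, terminated with 0. A minimal standard-library sketch under that assumption follows. The function name encodeTerms is hypothetical, and the rule is inferred from the examples in this thread, not taken from the Baloo sources.

```cpp
#include <string>
#include <vector>

// Encode an ordered list of terms using the prefix rule inferred above.
std::string encodeTerms(const std::vector<std::string>& terms)
{
    std::string out;
    const std::string* prev = nullptr;
    for (const std::string& term : terms) {
        // True when the previous term is a prefix of the current term.
        const bool extendsPrev =
            prev != nullptr && term.compare(0, prev->size(), *prev) == 0;
        if (extendsPrev) {
            out.append(term, prev->size(), std::string::npos); // suffix only
            out.push_back('\x01');
        } else {
            out += term; // full term
            out.push_back('\x00');
        }
        prev = &term;
    }
    return out;
}
```

Because the first term never has a predecessor, a blob produced this way always starts with a full term, so a leading 1 terminator can only come from corrupt data.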

Ok, then please commit.

dhaumann added inline comments. May 29 2018, 12:46 PM
src/engine/documentdb.cpp
101

Ah, maybe this should be a qWarning()? Feel free to decide as you wish.

This revision was automatically updated to reflect the committed changes.