[Extractor] Make extractor crash resilient
ClosedPublic

Authored by bruns on Oct 16 2018, 10:47 PM.

Details

Summary

Connect to QProcess::finished to detect the exit status. In case the
process has crashed, signal the indexer.

On a crash, restart the process and feed it a smaller batch. If the
crashing batch contains only a single file, mark the file as failed, i.e.
add it to the "failedid" db and remove it from the content indexing db
to avoid further indexing attempts.

CCBUG: 375131

Test Plan

start balooctl monitor
add a file known to crash the extractor to an indexable path
touch an unproblematic file
-> indexer crashes on first file and continues with the second

Diff Detail

Repository
R293 Baloo
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
bruns created this revision.Oct 16 2018, 10:47 PM
Restricted Application added projects: Frameworks, Baloo. · View Herald TranscriptOct 16 2018, 10:47 PM
Restricted Application added a subscriber: kde-frameworks-devel. · View Herald Transcript
bruns requested review of this revision.Oct 16 2018, 10:47 PM
apol added a subscriber: apol.Oct 16 2018, 11:12 PM
apol added inline comments.
src/file/extractorprocess.cpp
45

Shouldn't it check the exitCode too?

55

Do you really need to waitForStarted?

src/file/filecontentindexer.cpp
74

= false;

bruns marked 3 inline comments as done.Oct 16 2018, 11:20 PM
bruns added inline comments.
src/file/extractorprocess.cpp
45

the exitCode is "only valid for normal exits" (Qt docu), and 0 otherwise.

55

Copied from the old code. Should not hurt, as done from a runner thread.

bruns updated this revision to Diff 43769.Oct 16 2018, 11:20 PM
bruns marked 2 inline comments as done.

coding style

poboiko added inline comments.Oct 17 2018, 10:01 AM
src/file/filecontentindexer.cpp
75

Is it OK to use QObject::connect inside a while loop? Those are not Qt::UniqueConnection, won't they fire multiple times (more and more, actually)?

91

Do I understand correctly, that's a binary-search-like way to find the file actually causing extractor to crash, and it will reindex some of the files in a batch ~log2(batchsize) times?
Why don't we rely instead on its progress reporting, via startedIndexingFile / finishedIndexingFile?

broulik added inline comments.
src/file/filecontentindexer.cpp
75

given loop lives on the stack it is destroyed when the scope is left and the connection severed

bruns added inline comments.Oct 17 2018, 12:21 PM
src/file/filecontentindexer.cpp
91

Yes, its a binary search.
Because we only have IDs here, and the progress reporting works on strings.

poboiko added inline comments.Oct 17 2018, 1:16 PM
src/file/filecontentindexer.cpp
91

There is Transaction::documentId(const QByteArray& path), which can resolve it using DocumentUrlDB.
I believe it still would be cheaper than reindexing several files multiple times.

bruns marked 3 inline comments as done.Oct 17 2018, 1:42 PM
bruns added inline comments.
src/file/filecontentindexer.cpp
91

But it is racy - if the file is replaced in the meantime, inode and filename no longer match. This is not completely unlikely when dealing with temporary files.

It would make the code also significantly more complex, and I want it simple here. It should only be hit in exceptional cases.

Also, it is not to uncommon to have a batch of size one from the start, i.e. when adding/modifying files.

bruns marked 3 inline comments as done.Oct 17 2018, 1:43 PM
poboiko added inline comments.Oct 20 2018, 9:16 AM
src/file/filecontentindexer.cpp
91

OK, we can simply count how many times we got finishedIndexingFile, and just go to the corresponding position in the batch.
It's just the binary search here does look a bit unnecessary to me...

bruns marked an inline comment as done.Oct 24 2018, 12:09 PM
bruns added inline comments.
src/file/filecontentindexer.cpp
91

Counting would give a good hint which one failed, but still no guarantee. Files may be added or removed during the indexer run, so the position is only approximate. The indexer may also have sent a "finished" message, and crash afterwards in a destructor call.

Doing a binary search is straight forward and avoids any dependencies or assumptions about other parts of the code.

Also, this code should be only temporary anyway - if the extractor is run in a separate process which only receives the file using a readonly file descriptor (for sandboxing) and passes back the result, the problematic documents id is known by the parent process.

bruns marked an inline comment as done.Oct 25 2018, 12:57 PM

How about accepting this as-is, and postponing any enhancements to another patch?

This has three benefits:

  1. no further delay
  2. it is bisectable in case some bug appears
  3. it is easy to spot how much code such a change requires
ngraham accepted this revision.Oct 25 2018, 2:12 PM

Yeah go ahead I think.

This revision is now accepted and ready to land.Oct 25 2018, 2:12 PM
This revision was automatically updated to reflect the committed changes.