Wait for the extraction process to finish before scheduling
AbandonedPublic

Authored by poboiko on Oct 5 2018, 7:57 AM.

Details

Reviewers
None
Group Reviewers
Baloo
Frameworks
Summary

Right now, a race condition might happen. Signal FileContentIndexer::done is connected to FileIndexScheduler::scheduleIndexing,
which does scheduling only when there is no active threads (NB: FileContentIndexer is running inside one).
The signal is emitted in the very end of FileContentIndexer::run, which should be fine. However, after run is finished,
it also calls destructor for local ExtractorProcess process, which destroys QProcess m_extractorProcess, which itself waits for the process to finish.

If FileContentIndexer::done signal is processed before process is finished, scheduling won't happen and baloo will be stuck in ContentIndexing state.
(which is exactly what pointed me to this issue: File Indexer Monitor application showed that baloo is stuck in ContentIndexing, when it clearly should be Idle)

I suggest instead to emit FileContentIndexer::done signal only when ExtractorProcess is finished.

Test Plan

I can no longer reproduce indexer being stuck in ContentIndexing phase.

Diff Detail

Repository
R293 Baloo
Branch
wait-extractor-finish (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 3513
Build 3531: arc lint + arc unit
poboiko created this revision.Oct 5 2018, 7:57 AM
Restricted Application added projects: Frameworks, Baloo. · View Herald TranscriptOct 5 2018, 7:57 AM
Restricted Application added a subscriber: kde-frameworks-devel. · View Herald Transcript
poboiko requested review of this revision.Oct 5 2018, 7:57 AM

That doesn't looks like it works all the time: sometimes I still have this issue.
Apparently, it still can happen that even when the process is finished, signal is emitted, but the thread is not finished yet.

Also, it seems like due to QueuedConnection, ExtractorProcess can get removed before signal is processed, and because of that signal got lost. We can try something like that:

connect(&process, &ExtractorProcess::finished, this, [this]{
             QMetaObject::invokeMethod(this, "done", Qt::QueuedConnection);
});

(so that finished signal is processed immediately, but the done signal is emitted queued).

bruns added a subscriber: bruns.Oct 6 2018, 11:20 PM

I think the most trivial and completely correct way to fix this problem is to remove the m_threadPool.activeThreadCount() check from scheduleIndexing(). QThreadPool::start(runnable) puts the runnable in the run queue if there is still another thread running.

bruns added a comment.Oct 8 2018, 10:18 PM

This works for me:

 void FileIndexScheduler::scheduleIndexing()
 {
-    if (m_threadPool.activeThreadCount() || m_indexerState == Suspended) {
+    if (m_indexerState == Suspended) {
         return;
     }

This works for me:

 void FileIndexScheduler::scheduleIndexing()
 {
-    if (m_threadPool.activeThreadCount() || m_indexerState == Suspended) {
+    if (m_indexerState == Suspended) {
         return;
     }

That might require some changes in the scheduling logic. Right now it will behave weirdly.
If we do this change, there might be situations where bunch of i.e. ModifiedFileIndexer gets pushed to queue in a row. For example, we were inside FirstRunIndexer, and 10 files were modified.
Even worse, if will have a single new file and then 10 modified files - it will push 10 NewFileIndexers.

bruns added a comment.Oct 12 2018, 7:01 PM

This works for me:

 void FileIndexScheduler::scheduleIndexing()
 {
-    if (m_threadPool.activeThreadCount() || m_indexerState == Suspended) {
+    if (m_indexerState == Suspended) {
         return;
     }

That might require some changes in the scheduling logic. Right now it will behave weirdly.
If we do this change, there might be situations where bunch of i.e. ModifiedFileIndexer gets pushed to queue in a row. For example, we were inside FirstRunIndexer, and 10 files were modified.

Yes, this could happen ...

Even worse, if will have a single new file and then 10 modified files - it will push 10 NewFileIndexers.

No, this will not happen. The single entry in m_newFiles will essentially be moved into the the NewFileIndexer (Source) and afterwards m_newFiles is empty().

But you are right, this should be improved.

poboiko abandoned this revision.Oct 20 2018, 1:08 PM

Dropped in favor of D16265: [Scheduler] Use flag to track when a runner is going idle, which handles this problem better.