BUG:360677 KisUpdaterContext should not kill threads in waitForDone()
ClosedPublic

Authored by asavonic on Mar 17 2016, 10:46 PM.

Details

Summary

The KisUpdaterContext::waitForDone() method must lock the context and wait
until all threads finish their current tasks.
We cannot call QThreadPool::waitForDone() to do this, because after
waiting it removes all threads from the pool.
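The fix can be sketched in plain standard C++ (the names below — MiniContext, jobStarted(), jobFinished() — are illustrative stand-ins, not Krita's actual API, with std::mutex and std::condition_variable playing the roles of QMutex and QWaitCondition): waiting is done on a condition variable tied to a running-jobs counter, so the worker threads survive the wait instead of being removed as QThreadPool::waitForDone() would do.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the fixed KisUpdaterContext, written with the
// standard library instead of Qt. waitForDone() blocks until no job is
// running; it never touches the worker threads themselves, so they stay
// alive in the pool.
class MiniContext {
public:
    void jobStarted() {
        std::lock_guard<std::mutex> g(m_lock);
        ++m_activeJobs;
    }

    void jobFinished() {
        std::lock_guard<std::mutex> g(m_lock);
        if (--m_activeJobs == 0)
            m_allDone.notify_all();   // wake every thread blocked in waitForDone()
    }

    void waitForDone() {
        std::unique_lock<std::mutex> g(m_lock);
        // The predicate guards against spurious wakeups.
        m_allDone.wait(g, [this] { return m_activeJobs == 0; });
    }

private:
    std::mutex m_lock;
    std::condition_variable m_allDone;
    int m_activeJobs = 0;
};
```

A caller that has queued jobs simply calls waitForDone() and is woken when the last job reports completion; no thread is joined or destroyed in the process.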

Test Plan

Verified that no extra threads are created: 8 for the global thread pool and 8 for
KisUpdaterContext.

(gdb) info threads

Id   Target Id         Frame
22   Thread 0x7fff8cff9700 (LWP 18481) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
21   Thread 0x7fff8d7fa700 (LWP 18480) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
20   Thread 0x7fff8dffb700 (LWP 18479) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
19   Thread 0x7fff8e7fc700 (LWP 18478) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
18   Thread 0x7fff8effd700 (LWP 18477) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
17   Thread 0x7fff8f7fe700 (LWP 18476) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
16   Thread 0x7fff8ffff700 (LWP 18475) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
15   Thread 0x7fff9bfff700 (LWP 18474) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
13   Thread 0x7fffb8ff9700 (LWP 18471) "QFileInfoGather" 0x00007fffef6e291f in pthread_cond_wait () from /lib64/libpthread.so.0
12   Thread 0x7fffb97fa700 (LWP 18470) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
11   Thread 0x7fffb9ffb700 (LWP 18469) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
10   Thread 0x7fffba7fc700 (LWP 18468) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
9    Thread 0x7fffbaffd700 (LWP 18467) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
8    Thread 0x7fffbb7fe700 (LWP 18466) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
7    Thread 0x7fffbbfff700 (LWP 18465) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
6    Thread 0x7fffc8f8a700 (LWP 18464) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
5    Thread 0x7fffc978b700 (LWP 18463) "Thread (pooled)" 0x00007fffef6e2cc8 in pthread_cond_timedwait () from /lib64/libpthread.so.0
4    Thread 0x7fffca38d700 (LWP 18462) "KisTileDataSwap" 0x00007fffef6e291f in pthread_cond_wait () from /lib64/libpthread.so.0
3    Thread 0x7fffcab8e700 (LWP 18461) "KisTileDataPool" 0x00007fffef6e291f in pthread_cond_wait () from /lib64/libpthread.so.0
2    Thread 0x7fffe0843700 (LWP 18460) "QXcbEventReader" 0x00007ffff16a33ed in poll () from /lib64/libc.so.6
* 1    Thread 0x7fffe7f3f7c0 (LWP 18456) "krita" 0x00007ffff16a33ed in poll () from /lib64/libc.so.6

Diff Detail

Repository
R37 Krita
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
asavonic updated this revision to Diff 2827. Mar 17 2016, 10:46 PM
asavonic retitled this revision from to BUG:360677 KisUpdaterContext should not kill threads in waitForDone().
asavonic updated this object.
asavonic edited the test plan for this revision. (Show Details)
asavonic added a reviewer: dkazakov.
fazek added a subscriber: fazek. Mar 18 2016, 12:09 PM

Just an idea: perhaps it's not a big problem, but this is a CPU-hungry solution. You could wait for the threads one by one, always waiting for a single thread to finish. Then you could do it with something like pthread_join() or QThread::wait(); I don't know how the threads are implemented.

It seems this patch sometimes causes a deadlock when I try to open a new tab. I'll fix it.

dkazakov requested changes to this revision. Edited Mar 23 2016, 3:36 PM
dkazakov edited edge metadata.

Hi, @asavonic!

Sorry for not replying quickly. The fact that the threads are released when we call waitForDone() is a really good catch! I was always wondering why it happens and could never investigate it properly :)

The busy loop in the patch is really a bad idea: it saturates at least one CPU core and prevents other threads from doing any useful work. There are at least two options for how this problem can be fixed/avoided:

  1. Use a QWaitCondition coupled with QThreadPool::activeThreadCount(), at least if I understood the meaning of activeThreadCount() correctly. You can check KisUpdateScheduler::blockUpdates() for an example of using QWaitCondition.
  2. Another, less intrusive option would be to modify KisUpdateScheduler::isIdle() not to call tryBarrierLock(), but just use activeThreadCount() or something like that. This would be the best option, I guess. We call isIdle() once a second or so, so this is the exact reason why the threads are dying. The other user of barrierLock() is legacy strokes, but they are not so popular.
This revision now requires changes to proceed. Mar 23 2016, 3:36 PM

I'm still working on this. Another implementation with QSemaphore created more problems than it solved.

asavonic updated this revision to Diff 3464. Apr 23 2016, 12:18 AM
asavonic edited edge metadata.
  • Use a condition variable in KisUpdaterContext::waitForDone()
  • Use KisUpdateScheduler::barrierLock() in the KisImage dtor
asavonic updated this revision to Diff 3465. Apr 23 2016, 1:02 AM
asavonic edited edge metadata.
  • Use a condition variable in KisUpdaterContext::waitForDone()
  • Use KisUpdateScheduler::barrierLock() in the KisImage dtor

This is still in progress, got a weird crash in KisImageTest.

ASSERT: "isDetached()" in file /usr/include/qt5/QtCore/qvector.h, line 551
#3  0x00007ffff31b7c7e in qt_assert(char const*, char const*, int) () from /usr/lib64/libQt5Core.so.5
#4  0x00007ffff78c90c5 in QVector<KisUpdateJobItem*>::reallocData (this=0x1e2a798, asize=8, aalloc=8, options=...) at /usr/include/qt5/QtCore/qvector.h:551
#5  0x00007ffff78c89b5 in QVector<KisUpdateJobItem*>::detach (this=0x1e2a798) at /usr/include/qt5/QtCore/qvector.h:355
#6  0x00007ffff78c8449 in QVector<KisUpdateJobItem*>::data (this=0x1e2a798) at /usr/include/qt5/QtCore/qvector.h:122
#7  0x00007ffff78c7ba0 in QVector<KisUpdateJobItem*>::operator[] (this=0x1e2a798, i=0) at /usr/include/qt5/QtCore/qvector.h:402
#8  0x00007ffff78c586c in KisUpdaterContext::waitForDone (this=0x1e2a778) at /home/asavonic/dev/krita/src/krita/libs/image/kis_updater_context.cpp:195
#9  0x00007ffff78e08da in KisUpdateScheduler::barrierLock (this=0x1eb1fd0) at /home/asavonic/dev/krita/src/krita/libs/image/kis_update_scheduler.cpp:337
#10 0x00007ffff79006e0 in KisImage::waitForDone (this=0x7e0080) at /home/asavonic/dev/krita/src/krita/libs/image/kis_image.cc:1161
asavonic updated this revision to Diff 3466. Edited Apr 23 2016, 2:20 AM
  • Use thread-safe iteration over QVector in waitForDone()

The crash above is fixed, ready for review.

fazek added a comment. Apr 23 2016, 4:40 AM

The counters for the threads are a good idea. For me this solution is good if the number of threads and their order cannot change during the wait cycle.

Hi, @asavonic!

Your patch is really nice! I like the idea of making a snapshot of the jobs and checking them on waking up. There is a problem, though. The patch changes the semantics of the waitForDone() call. Originally, waitForDone() in the QThreadPool waited for all the threads to finish. And since each thread started a new job before finishing, waitForDone() could not return before all the jobs of all the strokes and/or updates were finished. This patch tries to solve a slightly different problem: "wait until all the currently running jobs are finished, forget about the new ones". Your problem is actually a bit more complex, and yes, it needs snapshots and all the complications that come with getting them.

Changing this behavior can lead to some unexpected problems in our code, because, e.g., KisToolTransform::slotRestartTransform() and all the unit tests rely on this behavior. Not sure if it is a good idea, but it is what we have.

Will it work if you simplify the solution a bit?

  1. In checkWaitCondition() just check if all the jobs are not running, without using the snapshots.
  2. Use m_lock for waiting on a wait condition. That'll fix the problem of checking the validity of m_jobs.

The "starving" problem for waiting threads is not a problem for us, because in most cases that is exactly what they want :)

libs/image/kis_image.cc
1161 ↗(On Diff #3466)

This must be a misprint :) Either it is debugging code or an unlock() is forgotten.

libs/image/kis_updater_context.cpp
227

(This comment deprecated due to my comment below in the review section)

To have safe access to m_jobs, the context itself should also be locked while doing this iteration. Otherwise the code in KisStrokesQueue::processQueue() and KisSimpleUpdatesQueue::processQueue() will overwrite the contents of m_jobs.

And using Q_FOREACH without locking will cause a crash when more than three threads access the m_jobs object. It happens because of implicit sharing in a temporary object (created inside Q_FOREACH), which will be detached when some other thread changes the original vector. You can check the implementation of KisSafeReadNodeList and FOREACH_SAFE in KisNode for more details about this problem.

Basically, I would suggest just adding the locking; then using Q_FOREACH will become safe.

But take care with the order of locking to avoid deadlocks. Ideally, you'd better avoid holding both locks at the same time. I'm not sure, but it seems like moving m_condWaitLock.lock() below the loop should be OK.
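The locking pattern suggested here can be illustrated with a standard-C++ sketch (g_lock, g_jobs, and JobItem are hypothetical stand-ins for the Krita members mentioned above): the reader holds the same mutex as the writers for the entire iteration, which is what makes the loop safe once the container can no longer be mutated (or, with Qt containers, detached) mid-iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <vector>

// Illustrative sketch: iterate a container shared between threads only
// while holding the lock that the writers also take. With Qt containers
// this matters even more, because Q_FOREACH copies the implicitly shared
// container, and a concurrent write forces a detach of data that another
// thread may still be reading.
struct JobItem { bool running = false; };

std::mutex g_lock;              // stands in for the context lock (m_lock)
std::vector<JobItem*> g_jobs;   // stands in for m_jobs

bool allJobsIdle() {
    std::lock_guard<std::mutex> g(g_lock);   // held for the *whole* loop
    return std::all_of(g_jobs.begin(), g_jobs.end(),
                       [](const JobItem *job) { return !job->running; });
}
```

The key design point is the scope of the lock guard: it covers the full traversal, not just each element access, so writers like processQueue() cannot interleave with the iteration.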

asavonic updated this revision to Diff 3563. Edited Apr 29 2016, 8:27 PM
  • Use a semaphore to wait for all threads in KisUpdaterContext

It is also necessary to change the order of signals emitted from the
job: we must emit sigJobFinished to release the semaphore and only after
that emit sigDoSomeUsefulWork.

Otherwise we will get a deadlock:

thread #1 call stack:
  - KisUpdaterContext::startJob // cannot acquire the semaphore
                                // since all threads are stopped
  - KisSimpleUpdateQueue::processQueue // locked the queue
  - KisUpdateScheduler::fullRefreshAsync

thread #2 call stack:
  - KisSimpleUpdateQueue::optimize       // cannot lock the queue
  - KisUpdateScheduler::doSomeUsefulWork
  - KisUpdateJobItem::run
asavonic added a comment. Edited Apr 29 2016, 9:20 PM

Hi, @dkazakov

The patch changes the semantics of the waitForDone() call.

Yeah, I noticed that, quite a lot of code depends on this behaviour.

Will it work if you simplify the solution a bit?
In checkWaitCondition() just check if all the jobs are not running, without using the snapshots.
"Starving" problem for a waiting threads is not a problem for us, because in most of the cases that is exactly what they want :)

This is actually the solution I tried to avoid, because we can possibly wait forever in waitForDone() if we always have some jobs running (even if they are not related).
But I guess you are right: without explicit dependencies between the jobs, it is better to wait for them all to finish.

I've pushed the fix (see the comment above).

And using Q_FOREACH without locking will cause a crash in a case when there is more than three thread

Didn't know this, thank you.

fazek added a comment. Apr 30 2016, 3:28 AM

As far as I can understand, the original solution closes and reopens the threads. This works, it just takes too much time and resources. So a solution that waits for a moment when there is no thread running is good. It's only a hazard for the future; currently nothing creates new jobs indefinitely during the wait.

Hi, @asavonic!

Speaking truly, I like your previous implementation more :)

I tried to calculate the lock order and it seems like your algorithm is actually safe. The only thing that worries me is that now the semaphore duplicates the functionality of m_jobs, and that the KisSimpleUpdateQueue::optimize() call is covered by neither the semaphore nor the m_jobs structure. It should not cause any problems right now, but it might be a problem in the future, because a semaphore is basically a lock, so it should be considered carefully.

The main blocker right now is that optimize() is called after the queue has actually been processed. That is not good because we may end up doing too many unnecessary updates.

So I can see two solutions:

  1. Fix your current patch to ensure optimize() is called before processQueue().
  2. Get back to your QWaitCondition solution, just simplify it by having no snapshots. QWaitCondition doesn't add any lock (it can be based on m_lock), so it doesn't add complexity to the system. At least I guess so :)

And if you have any questions, don't hesitate to ping me on IRC. I might be wrong as well :)

libs/image/kis_update_job_item.h
74 ↗(On Diff #3563)

The order of these calls is important. In sigDoSomeUsefulWork() the updates queue is optimized so as not to do the same work multiple times, and in
sigJobFinished() the queue is processed and the optimized tasks are put into the context. So this should at least be refactored somehow: a new signal can be added, or the contents of the slots can be regrouped.

asavonic updated this revision to Diff 5529. Jul 27 2016, 8:58 PM

Use QWaitCondition in waitForDone(), add asserts for lock()/unlock()

  • Add asserts when KisUpdaterContext methods used without lock()
  • Assume KisUpdaterContext::waitForDone() called with lock() held
  • Lock KisUpdaterContext where needed
asavonic updated this revision to Diff 5531. Jul 27 2016, 9:49 PM

Move all asserts under #ifndef QT_NO_DEBUG

Hi, @asavonic!

I'm sorry, I didn't notice you updated the patch! :)

The patch looks cool now, there are only three minor issues:

  1. waitForDone() is usually considered to work like lock()/unlock(), so people usually don't expect it to need any locking. Please move the locking inside waitForDone() itself :)
  2. Please rename QT_NO_DEBUG into something local, like SANITY_CHECK_CONTEXT_LOCKING. We usually don't control Qt's debugging definitions, and this sanity check should obviously be disabled in most cases :)
  3. Btw, it would be a good idea to change Q_ASSERT into KIS_ASSERT, because the former is always disabled on Ubuntu.

Please fix these small issues and I will push your patch! :)

libs/image/kis_update_scheduler.cpp
417

Good catch! :)

dkazakov requested changes to this revision. Aug 14 2016, 5:35 PM
dkazakov edited edge metadata.

Mark as needs changes

This revision now requires changes to proceed. Aug 14 2016, 5:35 PM
asavonic updated this revision to Diff 5954. Aug 15 2016, 9:17 PM
asavonic edited edge metadata.
  • Revert lock() requirement on waitForDone
  • Replace Q_ASSERT with KIS_ASSERT
  1. waitForDone() is usually considered to work like lock()/unlock(), so people usually don't expect it to need any locking. Please move the locking inside waitForDone() itself :)

Okay, but since the context lock is not recursive, it implies the opposite requirement: we must not call waitForDone() with the lock held.
But now we have an assert for this in lock() :)
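The guard described here can be sketched in standard C++ (CheckedLock and its members are hypothetical; Krita uses KIS_ASSERT and its own context lock): lock() remembers the owning thread, so re-locking from that same thread — for example calling waitForDone(), which now locks internally, while already holding the lock — fails an assertion instead of deadlocking silently.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Sketch of a non-recursive lock that detects re-locking from the owning
// thread, mirroring the assert added to KisUpdaterContext::lock().
class CheckedLock {
public:
    void lock() {
        // Re-locking a non-recursive mutex from its owner would deadlock;
        // fail loudly instead (KIS_ASSERT plays this role in Krita).
        assert(m_owner != std::this_thread::get_id() &&
               "lock() called while already held by this thread");
        m_mutex.lock();
        m_owner = std::this_thread::get_id();
    }

    void unlock() {
        m_owner = std::thread::id();   // reset to "no owner"
        m_mutex.unlock();
    }

private:
    std::mutex m_mutex;
    std::thread::id m_owner;   // inspected only for the sanity check
};
```

In a debug build the assert fires on the nested lock() call; in a release build the check compiles away and only the plain mutex behavior remains.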

  1. Please rename QT_NO_DEBUG into something local, like SANITY_CHECK_CONTEXT_LOCKING. We usually don't control Qt's debugging definitions, and this sanity check should obviously be disabled in most cases :)

Maybe this is debatable, but I think these checks make sense only if they are enabled by default (in debug builds only, of course).
Their only purpose is to warn about improper use of KisUpdaterContext functions (calling them without locks).
But if you need a special macro to do this check, and you know about its existence, then you are probably already aware of the requirements.

  1. Btw, it would be a good idea to change Q_ASSERT into KIS_ASSERT, because the former is always disabled on Ubuntu.

Good to know. Actually, Q_ASSERT works as expected on my Ubuntu machine (Qt 5.5.1), so maybe it depends on the Qt version.

Okay, but since the context lock is not recursive, it implies the opposite requirement: we must not call waitForDone() with the lock held.
But now we have an assert for this in lock() :)

Awesome!

  1. Please rename QT_NO_DEBUG into something local, like SANITY_CHECK_CONTEXT_LOCKING. We usually don't control Qt's debugging definitions, and this sanity check should obviously be disabled in most cases :)

Maybe this is debatable, but I think these checks make sense only if they are enabled by default (in debug builds only, of course).
Their only purpose is to warn about improper use of KisUpdaterContext functions (calling them without locks).
But if you need a special macro to do this check, and you know about its existence, then you are probably already aware of the requirements.

Well, the point is that most of the developers use RelWithDebInfo builds, or derivatives of it, like KritaDevs. So in 99% of the cases the check will be disabled and you will not be able to activate it without rebuilding the entire Krita. We use DebugFull builds very rarely because they tend to be extremely slow.

  1. Btw, it would be a good idea to change Q_ASSERT into KIS_ASSERT, because the former is always disabled on Ubuntu.

Good to know. Actually, Q_ASSERT works as expected on my Ubuntu machine (Qt 5.5.1), so it maybe depends on a Qt version.

I think it is related to the RelWithDebInfo thing I wrote above

libs/image/kis_update_scheduler.cpp
173

Yes, this reordering makes the code look much more sane! :)

dkazakov requested changes to this revision. Aug 16 2016, 6:50 AM
dkazakov edited edge metadata.

I have checked your patch. It works perfectly fine but, as I expected, QT_NO_DEBUG is not defined in the KritaDevs build mode. Of course, I can push the patch as it is, but it means that no one will run your sanity check before the release. I can propose the following solution to the problem:

  1. Define a separate definition, SANITY_CHECK_CONTEXT_LOCKING, in a straightforward way in a header:
#define SANITY_CHECK_CONTEXT_LOCKING
  2. People will test it before the release.
  3. Right before the release we will ifdef the definition so people will not be distracted by it:
#ifndef QT_NO_DEBUG
#define SANITY_CHECK_CONTEXT_LOCKING
#endif /* QT_NO_DEBUG */

How do you like it? If you are ok with the solution, please either update the patch or just write me and I will modify the patch right before pushing it.

This revision now requires changes to proceed. Aug 16 2016, 6:50 AM

How do you like it? If you are ok with the solution

Yes, I think it is the best option.

or just write me and I will modify the patch right before pushing it.

That would be great, thank you!

asavonic updated this revision to Diff 6000. Aug 17 2016, 5:42 PM
asavonic edited edge metadata.
  • Move debug checks under a SANITY_CHECK_CONTEXT_LOCKING macro
dkazakov accepted this revision. Aug 18 2016, 6:56 AM
dkazakov edited edge metadata.

The patch works and looks perfectly fine now! I'm going to push it! :)

This revision is now accepted and ready to land. Aug 18 2016, 6:56 AM
This revision was automatically updated to reflect the committed changes.