Fix double delete crash during shutdown
AbandonedPublic

Authored by cullmann on Jun 25 2017, 12:35 PM.

Details

Reviewers
None
Group Reviewers
Frameworks
Summary

See:

https://bugs.kde.org/show_bug.cgi?id=380114

#6 0x00000000028c3780 in ?? ()
#7 0x00007fabbd75a4f6 in qDeleteAll<QList<Phonon::MediaObject*>::const_iterator> (end=..., begin=...) at /opt/local/include/qt5/QtCore/qalgorithms.h:320
#8 qDeleteAll<QList<Phonon::MediaObject*> > (c=...) at /opt/local/include/qt5/QtCore/qalgorithms.h:328
#9 NotifyByAudio::~NotifyByAudio (this=0x24ee390, __in_chrg=<optimized out>) at /opt/local/var/lnxports/build/_opt_local_site-ports_kf5_KF5-Frameworks/kf5-knotifications/work/knotifications-5.32.0/src/notifybyaudio.cpp:47

Avoid that there may be duplicated pointers in the reuse queue.

For notifications themself already a QHash is used, this set should avoid duplicate inserts.

The only way to get dupes IMHO is to have NotifyByAudio::close and NotifyByAudio::onAudioFinished race.

Can some of this signals be queued?

Test Plan

make && make test, thought that seems not to really test this "race".

Diff Detail

Repository
R289 KNotifications
Lint
Lint Skipped
Unit
Unit Tests Skipped
cullmann created this revision.Jun 25 2017, 12:35 PM
Restricted Application added a project: Frameworks. · View Herald TranscriptJun 25 2017, 12:35 PM
rjvbb added a subscriber: rjvbb.Jun 25 2017, 1:14 PM

A priori this should be fine, and it might even address the long standing bug by leaving more time for Phonon objects to "do their thing". It might be an idea though to include a qCDebug() probe that outputs the number of items left for auto-cleanup in m_reusablePhonons.

But what's the official stance on deleting objects (widgets) that have a parent and are thus capable of auto-cleanup? AFAIK one can still delete them explicitly, and if they're reparented during that process they should no longer be included in the cleanup step performed by their original parent's dtor.

IOW, if you're right, isn't there a bug to address in Phonon::MediaObject?

There can also be another reason *in release builds* for double frees (which your change will probably catch, too): NotifyByAudio::notify() does

Q_ASSERT(!m_notifications.value(m));
m_notifications.insert(m, notification);

and NotifyByAudio::finishNotification() does

    m_notifications.remove(m);
//...
    m_reusablePhonons.append(m);

Release builds do not check if m_notifications already contains the object being added. It seems extremely unlikely that this would ever happen, but apparently the code author thought it could.

cullmann abandoned this revision.Jun 25 2017, 1:31 PM

Hmm, not really, I again just missed that the auto-delete will anyways only happen after the execution of ~NotifyByAudio, my patch just removes the earlier cleanup :/

rjvbb added a comment.EditedJun 25 2017, 1:33 PM
Hmm, not really

Not really what?

And, doh, yes, a double free during auto-cleanup (global destruction) cannot explain a crash in qDeleteAll() called by ~NotifyByAudio...

I don't see how an external action could delete a Phonon object and leave it in the m_reusablPhonons list, so I guess that *if* the crash happens due to a double free it must be due to duplicate entries in that list.

And that might not even be so very unlikely, is QList::takeFirst() thread safe?

Hmm, not really, I again just missed that the auto-delete will anyways only happen after the execution of ~NotifyByAudio, my patch just removes the earlier cleanup :/

cullmann updated this revision to Diff 15850.Jun 25 2017, 1:49 PM
cullmann edited the summary of this revision. (Show Details)

Next try ;=)
Question is, can really such a race happen, that finishNotification triggers duplicate insertion?
At least with a hash set, this won't lead to later issues.

mpyne added a subscriber: mpyne.Jun 25 2017, 3:05 PM
In D6376#119359, @rjvbb wrote:

But what's the official stance on deleting objects (widgets) that have a parent and are thus capable of auto-cleanup? AFAIK one can still delete them explicitly, and if they're reparented during that process they should no longer be included in the cleanup step performed by their original parent's dtor.

There's certainly no stance *against* manually deleting objects that should be automatically cleaned-up, as long as it's done correctly of course. In fact we've had crash bugs before due to automatic destruction deleting objects in the right order (e.g. siblings parented to the same widget, where one child widget makes a call to the other child widget in the course of its destruction -- those have to happen in the right order so the fix was to manually delete them in a safe order).

rjvbb added a comment.EditedJun 25 2017, 5:08 PM

Next try ;=)
Question is, can really such a race happen, that finishNotification triggers duplicate insertion?

Do you see any other explanation why a crash could occur under qDeleteAll, one that is due to a bug in KNotifications and not in Phonon itself?
Anyway, I've converted the assert into a check with a qCritical probe in the copy I'm using. I haven't yet been able to trigger it, but will report here if indeed I ever see the warning about inserting a duplicate.

I'd say the source for a duplicate insertion (and thus the location of the race condition) would be a concurrent execution of m_reusablePhonons.takeFirst(), where more than 1 thread gets hold of the same item.

At least with a hash set, this won't lead to later issues.

Why not use a barrier or lock to ensure that the takeFirst() requests are executed one by one?

FWIW, a race condition wouldn't only lead to duplicate entries in the list (something I'd hope Qt knows how to handle) but also to more than 1 thread using the same Phonon object at the same time. Who's to tell what state that object will be in?

I don't speak about threading races. The whole class is not thread-safe, if threading occurs, all is lost.

But I see the chance, that you arrive twice at the m_reusablePhonons.insert(m); line, as there are multiple ways into the function doing that, one e.g. via NotifyByAudio::close and one in NotifyByAudio::onAudioFinished.

If you e.g. have a queued event for NotifyByAudio::onAudioFinished, that might happen even after you do ::close.

In any case, with the hash set we don't need to care for that.

aacid added a subscriber: aacid.Jun 25 2017, 8:40 PM

I've had a look at the code and honestly i don't see how this can fix anything.

Do you have a reproduceable testcase for the crash?

Hi,

if you can arrive twice for the same mediaobject in finishNotification e.g. via close and onAudioFinished you might insert it twice and delete it twice.
If not, this change helps nothing.

Unfortunately I have no way to reproduce.

We can just discard that and I can still reopen it, if I have a way to reproduce.

Reading the MediaObject docs NotifyByAudio::close should avoid that onAudioFinished is triggered by stop() not emiting, therefore I am not sure how that should happen.

cullmann abandoned this revision.Jun 26 2017, 6:06 AM
rjvbb added a comment.Jun 26 2017, 8:48 AM

I don't speak about threading races. The whole class is not thread-safe, if threading occurs, all is lost.

Really? Only a single thread at a time should trigger audio notifications, is that what you're saying? That would surprise me a little, looking at the code it seems the class exists so that applications can request to play a notification sound by creating an instance and then forget about it. The threading I was thinking of is simply when 2 threads queue an audio notification at the same time, not the cross-thread use of a NotifyByAudio instance.

aacid added a comment.Apr 2 2018, 9:12 PM

I was able to half reproduce it for a while and came up with https://phabricator.kde.org/D11891 which I think is a bit less of a workaround than this code.