Fix crash while moving files
ClosedPublic

Authored by hallas on Tue, Feb 19, 8:13 PM.

Details

Summary

The crash happens when the following sequence of events occurs:

  1. A FileCopyJob is created and added as a SubJob to CopyJob
  2. CopyJob::slotResult is notified
  3. CopyJobPrivate::slotResultCopyingFiles is called because it is in state STATE_COPYING_FILES
  4. CopyJobPrivate::slotResultErrorCopyingFiles is called bcause the FileCopyJob is reporting an error
  5. uiDelegateExtension()->askSkip is called to ask the user how to handle the error
  6. The askSkip function starts a modal QDialog and calls exec, this means a new QEventLoop is started and runs.
  7. Before the askSkip function returns, CopyJob::slotResult is notified again because a new error occurs on the same FileCopyJob
  8. CopyJobPrivate::slotResultCopyingFiles is called because it is in state STATE_COPYING_FILES
  9. CopyJobPrivate::slotResultErrorCopyingFiles is called bcause the FileCopyJob is reporting an error
  10. uiDelegateExtension()->askSkip is called to ask the user how to handle the error
  11. The user responds and the job is cancelled
  12. Finally the initial call to askSkip returns but we are now in an inconsistent state because the job that was notified has been deleted

To avoid getting notifications for a job while waiting for user feedback simply remove it using removeSubJob prior to calling askSkip.

Test Plan
  • Move a directory with a bunch of files to a NTFS USB Drive
  • While it is moving the files, move them back
  • Sometimes I have to have more then one move back and forth running for it to crash

Diff Detail

Repository
R241 KIO
Branch
fix_crash_while_moving_files (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 9106
Build 9124: arc lint + arc unit
hallas created this revision.Tue, Feb 19, 8:13 PM
Restricted Application added a project: Frameworks. · View Herald TranscriptTue, Feb 19, 8:13 PM
Restricted Application added a subscriber: kde-frameworks-devel. · View Herald Transcript
hallas requested review of this revision.Tue, Feb 19, 8:13 PM
dhaumann added inline comments.
src/lib/jobs/kjob.cpp
105 ↗(On Diff #52108)

Alternatively to moving the code, you could also use a QPointer<KJob> that(this); to monitor whether the QObject is still existing. Then, you'd have to call

if (that && isAutoDelete()) {...}

What I wonder is if calling deleteLater() earlier, would it be possible to somehow change order of execution?
Reasoning: deleteLater() goes through the event queue, but the signals & slots as well go through the event queue in case of threaded execution or when using Qt::QueuedConnection. So this possibly changes the order of events.

Someone who knows the code and how this internally is processed definitely needs to have a look at this :-)

dfaure requested changes to this revision.Tue, Feb 19, 10:55 PM

If some code is deleting this job from a slot connected to it, that code needs to be fixed. This hack isn't a fix, it will only create more problems..

E.g. if some other bad code runs the event loop in the slot connected to finished, it will now delete the job, and we'll get the exact same crash but on if (emitResult) this time.

This is a big no from me. The emitter of a signal MUST NOT be deleted in slots connected to it.

The bug needs proper analysis.

This revision now requires changes to proceed.Tue, Feb 19, 10:55 PM

If some code is deleting this job from a slot connected to it, that code needs to be fixed. This hack isn't a fix, it will only create more problems..

E.g. if some other bad code runs the event loop in the slot connected to finished, it will now delete the job, and we'll get the exact same crash but on if (emitResult) this time.

This is a big no from me. The emitter of a signal MUST NOT be deleted in slots connected to it.

The bug needs proper analysis.

Hi David,

thanks for the reply :) I completely agree with you that this is a hack and shouldn't go in, it was merely meant as an aid in doing proper analysis of the bug. I hope one of the people who can actually reproduce the problem can try with this patch enabled and see if it goes away. If it does then I think we can be pretty certain that someone is deleting jobs that they shouldn't. Otherwise I don't have much to go on. I have tried to reproduce the problem in various ways, copying to/from an NTFS drive and also copying to/from SMB but I can't get it to crash. I have been looking some more into the code and I still cannot figure out how the KJob instance can get invalidated. Have you looked at the backtraces? Do you have any ideas or pointers to where to look?

Appreciate the feedback

If some code is deleting this job from a slot connected to it, that code needs to be fixed. This hack isn't a fix, it will only create more problems..

E.g. if some other bad code runs the event loop in the slot connected to finished, it will now delete the job, and we'll get the exact same crash but on if (emitResult) this time.

This is a big no from me. The emitter of a signal MUST NOT be deleted in slots connected to it.

The bug needs proper analysis.

Hi David,

thanks for the reply :) I completely agree with you that this is a hack and shouldn't go in, it was merely meant as an aid in doing proper analysis of the bug. I hope one of the people who can actually reproduce the problem can try with this patch enabled and see if it goes away. If it does then I think we can be pretty certain that someone is deleting jobs that they shouldn't. Otherwise I don't have much to go on. I have tried to reproduce the problem in various ways, copying to/from an NTFS drive and also copying to/from SMB but I can't get it to crash. I have been looking some more into the code and I still cannot figure out how the KJob instance can get invalidated. Have you looked at the backtraces? Do you have any ideas or pointers to where to look?

Appreciate the feedback

Ok, I think I just found a way to reproduce the problem:

  • Move a directory with a bunch of files to a NTFS USB Drive
  • While it is moving the files, move them back
  • Sometimes I have to have more then one move back and forth running for it to crash

I have now built KIO with address sanitizer and I get the following info:

==7539==ERROR: AddressSanitizer: heap-use-after-free on address 0x6120002e3de0 at pc 0x7f2cdbbed173 bp 0x7ffec26c5050 sp 0x7ffec26c5040
READ of size 8 at 0x6120002e3de0 thread T0
    #0 0x7f2cdbbed172 in KIO::CopyJobPrivate::slotResultErrorCopyingFiles(KJob*) ../src/core/copyjob.cpp:1464
    #1 0x7f2cdbbeb26f in KIO::CopyJobPrivate::slotResultCopyingFiles(KJob*) ../src/core/copyjob.cpp:1333
    #2 0x7f2cdbbfba35 in KIO::CopyJob::slotResult(KJob*) ../src/core/copyjob.cpp:2160
    #3 0x7f2cd612a986 in QMetaObject::activate(QObject*, int, int, void**) (/usr/lib64/libQt5Core.so.5+0x264986)
    #4 0x7f2cd7e6e18b in KJob::result(KJob*, KJob::QPrivateSignal) (/usr/lib64/libKF5CoreAddons.so.5+0x3d18b)
    #5 0x7f2cd7e6ecc0 in KJob::finishJob(bool) (/usr/lib64/libKF5CoreAddons.so.5+0x3dcc0)
    #6 0x7f2cdbc95d8b in KIO::FileCopyJob::slotResult(KJob*) ../src/core/filecopyjob.cpp:497
    #7 0x7f2cd612a986 in QMetaObject::activate(QObject*, int, int, void**) (/usr/lib64/libQt5Core.so.5+0x264986)
    #8 0x7f2cd7e6e18b in KJob::result(KJob*, KJob::QPrivateSignal) (/usr/lib64/libKF5CoreAddons.so.5+0x3d18b)
    #9 0x7f2cd7e6ecc0 in KJob::finishJob(bool) (/usr/lib64/libKF5CoreAddons.so.5+0x3dcc0)
    #10 0x7f2cdbcb2267 in KIO::SimpleJob::slotFinished() ../src/core/simplejob.cpp:232
    #11 0x7f2cdbcb265d in KIO::SimpleJob::slotError(int, QString const&) ../src/core/simplejob.cpp:245
    #12 0x7f2cdbcb8e49 in QtPrivate::FunctorCall<QtPrivate::IndexesList<0, 1>, QtPrivate::List<int, QString const&>, void, void (KIO::SimpleJob::*)(int, QString const&)>::call(void (KIO::SimpleJob::*)(int, QString const&), KIO::SimpleJob*, void**) /usr/include/qt5/QtCore/qobjectdefs_impl.h:134
    #13 0x7f2cdbcb88ea in void QtPrivate::FunctionPointer<void (KIO::SimpleJob::*)(int, QString const&)>::call<QtPrivate::List<int, QString const&>, void>(void (KIO::SimpleJob::*)(int, QString const&), KIO::SimpleJob*, void**) /usr/include/qt5/QtCore/qobjectdefs_impl.h:167
    #14 0x7f2cdbcb83a7 in QtPrivate::QSlotObject<void (KIO::SimpleJob::*)(int, QString const&), QtPrivate::List<int, QString const&>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*) /usr/include/qt5/QtCore/qobjectdefs_impl.h:396
    #15 0x7f2cd612a986 in QMetaObject::activate(QObject*, int, int, void**) (/usr/lib64/libQt5Core.so.5+0x264986)
    #16 0x7f2cdbc63a91 in KIO::SlaveInterface::error(int, QString const&) src/core/KF5KIOCore_autogen/include/moc_slaveinterface.cpp:425
    #17 0x7f2cdbc5da14 in KIO::SlaveInterface::dispatch(int, QByteArray const&) ../src/core/slaveinterface.cpp:192
    #18 0x7f2cdbc5cac7 in KIO::SlaveInterface::dispatch() ../src/core/slaveinterface.cpp:89
    #19 0x7f2cdbc69a74 in KIO::Slave::gotInput() ../src/core/slave.cpp:406
    #20 0x7f2cdbc71d88 in QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, void (KIO::Slave::*)()>::call(void (KIO::Slave::*)(), KIO::Slave*, void**) /usr/include/qt5/QtCore/qobjectdefs_impl.h:134
    #21 0x7f2cdbc719cb in void QtPrivate::FunctionPointer<void (KIO::Slave::*)()>::call<QtPrivate::List<>, void>(void (KIO::Slave::*)(), KIO::Slave*, void**) /usr/include/qt5/QtCore/qobjectdefs_impl.h:167
    #22 0x7f2cdbc7118b in QtPrivate::QSlotObject<void (KIO::Slave::*)(), QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*) /usr/include/qt5/QtCore/qobjectdefs_impl.h:396
    #23 0x7f2cd612a986 in QMetaObject::activate(QObject*, int, int, void**) (/usr/lib64/libQt5Core.so.5+0x264986)
    #24 0x7f2cdbb5fdaa in KIO::Connection::readyRead() src/core/KF5KIOCore_autogen/include/moc_connection_p.cpp:143
    #25 0x7f2cdbb5c1a0 in KIO::ConnectionPrivate::dequeue() ../src/core/connection.cpp:46
    #26 0x7f2cdbb5f9ae in KIO::Connection::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) src/core/KF5KIOCore_autogen/include/moc_connection_p.cpp:87
    #27 0x7f2cd612b5a9 in QObject::event(QEvent*) (/usr/lib64/libQt5Core.so.5+0x2655a9)
    #28 0x7f2cd727ad8b in QApplicationPrivate::notify_helper(QObject*, QEvent*) (/usr/lib64/libQt5Widgets.so.5+0x15ad8b)
    #29 0x7f2cd728234e in QApplication::notify(QObject*, QEvent*) (/usr/lib64/libQt5Widgets.so.5+0x16234e)
    #30 0x7f2cd6101366 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (/usr/lib64/libQt5Core.so.5+0x23b366)
    #31 0x7f2cd61041c0 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) (/usr/lib64/libQt5Core.so.5+0x23e1c0)
    #32 0x7f2cd6154362  (/usr/lib64/libQt5Core.so.5+0x28e362)
    #33 0x7f2ccfecfe26 in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x4be26)
    #34 0x7f2ccfed005f  (/usr/lib64/libglib-2.0.so.0+0x4c05f)
    #35 0x7f2ccfed00eb in g_main_context_iteration (/usr/lib64/libglib-2.0.so.0+0x4c0eb)
    #36 0x7f2cd615414e in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (/usr/lib64/libQt5Core.so.5+0x28e14e)
    #37 0x7f2cc57ad740  (/usr/lib64/libQt5XcbQpa.so.5+0xcc740)
    #38 0x7f2cd6100159 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (/usr/lib64/libQt5Core.so.5+0x23a159)
    #39 0x7f2cd7469cc6 in QDialog::exec() (/usr/lib64/libQt5Widgets.so.5+0x349cc6)
    #40 0x7f2cd9300ed8 in KMessageBox::createKMessageBox(QDialog*, QDialogButtonBox*, QIcon const&, QString const&, QStringList const&, QString const&, bool*, QFlags<KMessageBox::Option>, QString const&, QMessageBox::Icon) (/usr/lib64/libKF5WidgetsAddons.so.5+0xbeed8)
    #41 0x7f2cd930108d in KMessageBox::createKMessageBox(QDialog*, QDialogButtonBox*, QMessageBox::Icon, QString const&, QStringList const&, QString const&, bool*, QFlags<KMessageBox::Option>, QString const&) (/usr/lib64/libKF5WidgetsAddons.so.5+0xbf08d)
    #42 0x7f2cd9304a79  (/usr/lib64/libKF5WidgetsAddons.so.5+0xc2a79)
    #43 0x7f2cd9304f5e  (/usr/lib64/libKF5WidgetsAddons.so.5+0xc2f5e)
    #44 0x7f2cdc3978ec  (/usr/lib64/libKF5JobWidgets.so.5+0xd8ec)
    #45 0x7f2cd612b5a9 in QObject::event(QEvent*) (/usr/lib64/libQt5Core.so.5+0x2655a9)
    #46 0x7f2cd727ad8b in QApplicationPrivate::notify_helper(QObject*, QEvent*) (/usr/lib64/libQt5Widgets.so.5+0x15ad8b)
    #47 0x7f2cd728234e in QApplication::notify(QObject*, QEvent*) (/usr/lib64/libQt5Widgets.so.5+0x16234e)
    #48 0x7f2cd6101366 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (/usr/lib64/libQt5Core.so.5+0x23b366)
    #49 0x7f2cd61041c0 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) (/usr/lib64/libQt5Core.so.5+0x23e1c0)
    #50 0x7f2cd6154362  (/usr/lib64/libQt5Core.so.5+0x28e362)
    #51 0x7f2ccfecfe26 in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x4be26)
    #52 0x7f2ccfed005f  (/usr/lib64/libglib-2.0.so.0+0x4c05f)
    #53 0x7f2ccfed00eb in g_main_context_iteration (/usr/lib64/libglib-2.0.so.0+0x4c0eb)
    #54 0x7f2cd615414e in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (/usr/lib64/libQt5Core.so.5+0x28e14e)
    #55 0x7f2cc57ad740  (/usr/lib64/libQt5XcbQpa.so.5+0xcc740)
    #56 0x7f2cd6100159 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (/usr/lib64/libQt5Core.so.5+0x23a159)
    #57 0x7f2cd6108b3f in QCoreApplication::exec() (/usr/lib64/libQt5Core.so.5+0x242b3f)
    #58 0x7f2cdf3aa336 in kdemain ../src/main.cpp:168
    #59 0x5614d66ce86b in main src/dolphin_dummy.cpp:3
    #60 0x7f2cde617ae6 in __libc_start_main (/lib64/libc.so.6+0x21ae6)
    #61 0x5614d66ce749 in _start (/home/dha/workspace/kde/install/bin/dolphin+0x749)

0x6120002e3de0 is located 288 bytes inside of 320-byte region [0x6120002e3cc0,0x6120002e3e00)
freed by thread T0 here:
    #0 0x7f2cdf7774b0 in operator delete(void*) (/usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xe14b0)
    #1 0x7f2cdbc0912b in KIO::CopyJobPrivate::~CopyJobPrivate() ../src/core/copyjob.cpp:120
    #2 0x7f2cdbc8633d in KIO::Job::~Job() ../src/core/job.cpp:55
    #3 0x7f2cdbbd7638 in KIO::CopyJob::~CopyJob() ../src/core/copyjob.cpp:304
    #4 0x7f2cdbbd7653 in KIO::CopyJob::~CopyJob() ../src/core/copyjob.cpp:306
    #5 0x7f2cd612b5c7 in QObject::event(QEvent*) (/usr/lib64/libQt5Core.so.5+0x2655c7)
    #6 0x7f2cd727ad8b in QApplicationPrivate::notify_helper(QObject*, QEvent*) (/usr/lib64/libQt5Widgets.so.5+0x15ad8b)

previously allocated by thread T0 here:
    #0 0x7f2cdf776638 in operator new(unsigned long) (/usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xe0638)
    #1 0x7f2cdbc04c0b in KIO::CopyJobPrivate::newJob(QList<QUrl> const&, QUrl const&, KIO::CopyJob::CopyMode, bool, QFlags<KIO::JobFlag>) (/home/dha/workspace/kde/install/lib64/libKF5KIOCore.so.5+0x151c0b)
    #2 0x7f2cdbbfd8d3 in KIO::move(QList<QUrl> const&, QUrl const&, QFlags<KIO::JobFlag>) ../src/core/copyjob.cpp:2252
    #3 0x7f2cdc818bc4 in KIO::DropJobPrivate::doCopyToDirectory() ../src/widgets/dropjob.cpp:486
    #4 0x7f2cdc817e44 in KIO::DropJobPrivate::slotTriggered(QAction*) ../src/widgets/dropjob.cpp:431
    #5 0x7f2cdc8180b8 in operator() ../src/widgets/dropjob.cpp:464
    #6 0x7f2cdc81c95e in call /usr/include/qt5/QtCore/qobjectdefs_impl.h:128
    #7 0x7f2cdc81c68e in call<QtPrivate::List<QAction*>, void> /usr/include/qt5/QtCore/qobjectdefs_impl.h:238
    #8 0x7f2cdc81c41a in impl /usr/include/qt5/QtCore/qobjectdefs_impl.h:421
    #9 0x7f2cd612a986 in QMetaObject::activate(QObject*, int, int, void**) (/usr/lib64/libQt5Core.so.5+0x264986)
    #10 0x7f2cd73e8b01 in QMenu::triggered(QAction*) (/usr/lib64/libQt5Widgets.so.5+0x2c8b01)

SUMMARY: AddressSanitizer: heap-use-after-free ../src/core/copyjob.cpp:1464 in KIO::CopyJobPrivate::slotResultErrorCopyingFiles(KJob*)
Shadow bytes around the buggy address:
  0x0c2480054760: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c2480054770: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2480054780: fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c2480054790: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c24800547a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c24800547b0: fd fd fd fd fd fd fd fd fd fd fd fd[fd]fd fd fd
  0x0c24800547c0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c24800547d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c24800547e0: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c24800547f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c2480054800: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==7539==ABORTING

I haven't figured out what's going on though. But the stack from where the object was freed looks like it was freed directly off an eventloop meaning it is most likely a result of a deleteLater call, but why would a slot still be connected to it? Or could it be caused by the fact that more then one QEventLoop is active since there is a KMessageBox::exec call involved? I will keep looking, but please comment if you have any pointers :)

If some code is deleting this job from a slot connected to it, that code needs to be fixed. This hack isn't a fix, it will only create more problems..

E.g. if some other bad code runs the event loop in the slot connected to finished, it will now delete the job, and we'll get the exact same crash but on if (emitResult) this time.

This is a big no from me. The emitter of a signal MUST NOT be deleted in slots connected to it.

The bug needs proper analysis.

Hi David,

thanks for the reply :) I completely agree with you that this is a hack and shouldn't go in, it was merely meant as an aid in doing proper analysis of the bug. I hope one of the people who can actually reproduce the problem can try with this patch enabled and see if it goes away. If it does then I think we can be pretty certain that someone is deleting jobs that they shouldn't. Otherwise I don't have much to go on. I have tried to reproduce the problem in various ways, copying to/from an NTFS drive and also copying to/from SMB but I can't get it to crash. I have been looking some more into the code and I still cannot figure out how the KJob instance can get invalidated. Have you looked at the backtraces? Do you have any ideas or pointers to where to look?

Appreciate the feedback

Ok, I think I just found a way to reproduce the problem:

  • Move a directory with a bunch of files to a NTFS USB Drive
  • While it is moving the files, move them back
  • Sometimes I have to have more then one move back and forth running for it to crash

    I have now built KIO with address sanitizer and I get the following info:

    ` ==7539==ERROR: AddressSanitizer: heap-use-after-free on address 0x6120002e3de0 at pc 0x7f2cdbbed173 bp 0x7ffec26c5050 sp 0x7ffec26c5040 READ of size 8 at 0x6120002e3de0 thread T0 #0 0x7f2cdbbed172 in KIO::CopyJobPrivate::slotResultErrorCopyingFiles(KJob*) ../src/core/copyjob.cpp:1464 #1 0x7f2cdbbeb26f in KIO::CopyJobPrivate::slotResultCopyingFiles(KJob*) ../src/core/copyjob.cpp:1333 #2 0x7f2cdbbfba35 in KIO::CopyJob::slotResult(KJob*) ../src/core/copyjob.cpp:2160 #3 0x7f2cd612a986 in QMetaObject::activate(QObject*, int, int, void) (/usr/lib64/libQt5Core.so.5+0x264986) #4 0x7f2cd7e6e18b in KJob::result(KJob*, KJob::QPrivateSignal) (/usr/lib64/libKF5CoreAddons.so.5+0x3d18b) #5 0x7f2cd7e6ecc0 in KJob::finishJob(bool) (/usr/lib64/libKF5CoreAddons.so.5+0x3dcc0) #6 0x7f2cdbc95d8b in KIO::FileCopyJob::slotResult(KJob*) ../src/core/filecopyjob.cpp:497 #7 0x7f2cd612a986 in QMetaObject::activate(QObject*, int, int, void) (/usr/lib64/libQt5Core.so.5+0x264986) #8 0x7f2cd7e6e18b in KJob::result(KJob*, KJob::QPrivateSignal) (/usr/lib64/libKF5CoreAddons.so.5+0x3d18b) #9 0x7f2cd7e6ecc0 in KJob::finishJob(bool) (/usr/lib64/libKF5CoreAddons.so.5+0x3dcc0) #10 0x7f2cdbcb2267 in KIO::SimpleJob::slotFinished() ../src/core/simplejob.cpp:232 #11 0x7f2cdbcb265d in KIO::SimpleJob::slotError(int, QString const&) ../src/core/simplejob.cpp:245 #12 0x7f2cdbcb8e49 in QtPrivate::FunctorCall<QtPrivate::IndexesList<0, 1>, QtPrivate::List<int, QString const&>, void, void (KIO::SimpleJob::*)(int, QString const&)>::call(void (KIO::SimpleJob::*)(int, QString const&), KIO::SimpleJob*, void) /usr/include/qt5/QtCore/qobjectdefs_impl.h:134 #13 0x7f2cdbcb88ea in void QtPrivate::FunctionPointer<void (KIO::SimpleJob::*)(int, QString const&)>::call<QtPrivate::List<int, QString const&>, void>(void (KIO::SimpleJob::*)(int, QString const&), KIO::SimpleJob*, void) /usr/include/qt5/QtCore/qobjectdefs_impl.h:167 #14 0x7f2cdbcb83a7 in QtPrivate::QSlotObject<void (KIO::SimpleJob::*)(int, QString const&), QtPrivate::List<int, QString const&>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void, bool*) /usr/include/qt5/QtCore/qobjectdefs_impl.h:396 #15 0x7f2cd612a986 in QMetaObject::activate(QObject*, int, int, void) (/usr/lib64/libQt5Core.so.5+0x264986) #16 0x7f2cdbc63a91 in KIO::SlaveInterface::error(int, QString const&) src/core/KF5KIOCore_autogen/include/moc_slaveinterface.cpp:425 #17 0x7f2cdbc5da14 in KIO::SlaveInterface::dispatch(int, QByteArray const&) ../src/core/slaveinterface.cpp:192 #18 0x7f2cdbc5cac7 in KIO::SlaveInterface::dispatch() ../src/core/slaveinterface.cpp:89 #19 0x7f2cdbc69a74 in KIO::Slave::gotInput() ../src/core/slave.cpp:406 #20 0x7f2cdbc71d88 in QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, void (KIO::Slave::*)()>::call(void (KIO::Slave::*)(), KIO::Slave*, void) /usr/include/qt5/QtCore/qobjectdefs_impl.h:134 #21 0x7f2cdbc719cb in void QtPrivate::FunctionPointer<void (KIO::Slave::*)()>::call<QtPrivate::List<>, void>(void (KIO::Slave::*)(), KIO::Slave*, void) /usr/include/qt5/QtCore/qobjectdefs_impl.h:167 #22 0x7f2cdbc7118b in QtPrivate::QSlotObject<void (KIO::Slave::*)(), QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void, bool*) /usr/include/qt5/QtCore/qobjectdefs_impl.h:396 #23 0x7f2cd612a986 in QMetaObject::activate(QObject*, int, int, void) (/usr/lib64/libQt5Core.so.5+0x264986) #24 0x7f2cdbb5fdaa in KIO::Connection::readyRead() src/core/KF5KIOCore_autogen/include/moc_connection_p.cpp:143 #25 0x7f2cdbb5c1a0 in KIO::ConnectionPrivate::dequeue() ../src/core/connection.cpp:46 #26 0x7f2cdbb5f9ae in KIO::Connection::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) src/core/KF5KIOCore_autogen/include/moc_connection_p.cpp:87 #27 0x7f2cd612b5a9 in QObject::event(QEvent*) (/usr/lib64/libQt5Core.so.5+0x2655a9) #28 0x7f2cd727ad8b in QApplicationPrivate::notify_helper(QObject*, QEvent*) (/usr/lib64/libQt5Widgets.so.5+0x15ad8b) #29 0x7f2cd728234e in QApplication::notify(QObject*, QEvent*) (/usr/lib64/libQt5Widgets.so.5+0x16234e) #30 0x7f2cd6101366 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (/usr/lib64/libQt5Core.so.5+0x23b366) #31 0x7f2cd61041c0 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) (/usr/lib64/libQt5Core.so.5+0x23e1c0) #32 0x7f2cd6154362 (/usr/lib64/libQt5Core.so.5+0x28e362) #33 0x7f2ccfecfe26 in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x4be26) #34 0x7f2ccfed005f (/usr/lib64/libglib-2.0.so.0+0x4c05f) #35 0x7f2ccfed00eb in g_main_context_iteration (/usr/lib64/libglib-2.0.so.0+0x4c0eb) #36 0x7f2cd615414e in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (/usr/lib64/libQt5Core.so.5+0x28e14e) #37 0x7f2cc57ad740 (/usr/lib64/libQt5XcbQpa.so.5+0xcc740) #38 0x7f2cd6100159 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (/usr/lib64/libQt5Core.so.5+0x23a159) #39 0x7f2cd7469cc6 in QDialog::exec() (/usr/lib64/libQt5Widgets.so.5+0x349cc6) #40 0x7f2cd9300ed8 in KMessageBox::createKMessageBox(QDialog*, QDialogButtonBox*, QIcon const&, QString const&, QStringList const&, QString const&, bool*, QFlags<KMessageBox::Option>, QString const&, QMessageBox::Icon) (/usr/lib64/libKF5WidgetsAddons.so.5+0xbeed8) #41 0x7f2cd930108d in KMessageBox::createKMessageBox(QDialog*, QDialogButtonBox*, QMessageBox::Icon, QString const&, QStringList const&, QString const&, bool*, QFlags<KMessageBox::Option>, QString const&) (/usr/lib64/libKF5WidgetsAddons.so.5+0xbf08d) #42 0x7f2cd9304a79 (/usr/lib64/libKF5WidgetsAddons.so.5+0xc2a79) #43 0x7f2cd9304f5e (/usr/lib64/libKF5WidgetsAddons.so.5+0xc2f5e) #44 0x7f2cdc3978ec (/usr/lib64/libKF5JobWidgets.so.5+0xd8ec) #45 0x7f2cd612b5a9 in QObject::event(QEvent*) (/usr/lib64/libQt5Core.so.5+0x2655a9) #46 0x7f2cd727ad8b in QApplicationPrivate::notify_helper(QObject*, QEvent*) (/usr/lib64/libQt5Widgets.so.5+0x15ad8b) #47 0x7f2cd728234e in QApplication::notify(QObject*, QEvent*) (/usr/lib64/libQt5Widgets.so.5+0x16234e) #48 0x7f2cd6101366 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (/usr/lib64/libQt5Core.so.5+0x23b366) #49 0x7f2cd61041c0 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) (/usr/lib64/libQt5Core.so.5+0x23e1c0) #50 0x7f2cd6154362 (/usr/lib64/libQt5Core.so.5+0x28e362) #51 0x7f2ccfecfe26 in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x4be26) #52 0x7f2ccfed005f (/usr/lib64/libglib-2.0.so.0+0x4c05f) #53 0x7f2ccfed00eb in g_main_context_iteration (/usr/lib64/libglib-2.0.so.0+0x4c0eb) #54 0x7f2cd615414e in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (/usr/lib64/libQt5Core.so.5+0x28e14e) #55 0x7f2cc57ad740 (/usr/lib64/libQt5XcbQpa.so.5+0xcc740) #56 0x7f2cd6100159 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (/usr/lib64/libQt5Core.so.5+0x23a159) #57 0x7f2cd6108b3f in QCoreApplication::exec() (/usr/lib64/libQt5Core.so.5+0x242b3f) #58 0x7f2cdf3aa336 in kdemain ../src/main.cpp:168 #59 0x5614d66ce86b in main src/dolphin_dummy.cpp:3 #60 0x7f2cde617ae6 in __libc_start_main (/lib64/libc.so.6+0x21ae6) #61 0x5614d66ce749 in _start (/home/dha/workspace/kde/install/bin/dolphin+0x749)

    0x6120002e3de0 is located 288 bytes inside of 320-byte region [0x6120002e3cc0,0x6120002e3e00) freed by thread T0 here: #0 0x7f2cdf7774b0 in operator delete(void*) (/usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xe14b0) #1 0x7f2cdbc0912b in KIO::CopyJobPrivate::~CopyJobPrivate() ../src/core/copyjob.cpp:120 #2 0x7f2cdbc8633d in KIO::Job::~Job() ../src/core/job.cpp:55 #3 0x7f2cdbbd7638 in KIO::CopyJob::~CopyJob() ../src/core/copyjob.cpp:304 #4 0x7f2cdbbd7653 in KIO::CopyJob::~CopyJob() ../src/core/copyjob.cpp:306 #5 0x7f2cd612b5c7 in QObject::event(QEvent*) (/usr/lib64/libQt5Core.so.5+0x2655c7) #6 0x7f2cd727ad8b in QApplicationPrivate::notify_helper(QObject*, QEvent*) (/usr/lib64/libQt5Widgets.so.5+0x15ad8b)

    previously allocated by thread T0 here: #0 0x7f2cdf776638 in operator new(unsigned long) (/usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xe0638) #1 0x7f2cdbc04c0b in KIO::CopyJobPrivate::newJob(QList<QUrl> const&, QUrl const&, KIO::CopyJob::CopyMode, bool, QFlags<KIO::JobFlag>) (/home/dha/workspace/kde/install/lib64/libKF5KIOCore.so.5+0x151c0b) #2 0x7f2cdbbfd8d3 in KIO::move(QList<QUrl> const&, QUrl const&, QFlags<KIO::JobFlag>) ../src/core/copyjob.cpp:2252 #3 0x7f2cdc818bc4 in KIO::DropJobPrivate::doCopyToDirectory() ../src/widgets/dropjob.cpp:486 #4 0x7f2cdc817e44 in KIO::DropJobPrivate::slotTriggered(QAction*) ../src/widgets/dropjob.cpp:431 #5 0x7f2cdc8180b8 in operator() ../src/widgets/dropjob.cpp:464 #6 0x7f2cdc81c95e in call /usr/include/qt5/QtCore/qobjectdefs_impl.h:128 #7 0x7f2cdc81c68e in call<QtPrivate::List<QAction*>, void> /usr/include/qt5/QtCore/qobjectdefs_impl.h:238 #8 0x7f2cdc81c41a in impl /usr/include/qt5/QtCore/qobjectdefs_impl.h:421 #9 0x7f2cd612a986 in QMetaObject::activate(QObject*, int, int, void**) (/usr/lib64/libQt5Core.so.5+0x264986) #10 0x7f2cd73e8b01 in QMenu::triggered(QAction*) (/usr/lib64/libQt5Widgets.so.5+0x2c8b01)

    SUMMARY: AddressSanitizer: heap-use-after-free ../src/core/copyjob.cpp:1464 in KIO::CopyJobPrivate::slotResultErrorCopyingFiles(KJob*) Shadow bytes around the buggy address: 0x0c2480054760: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd 0x0c2480054770: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c2480054780: fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa 0x0c2480054790: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd 0x0c24800547a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd =>0x0c24800547b0: fd fd fd fd fd fd fd fd fd fd fd fd[fd]fd fd fd 0x0c24800547c0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd 0x0c24800547d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c24800547e0: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa 0x0c24800547f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd 0x0c2480054800: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==7539==ABORTING `

    I haven't figured out what's going on though. But the stack from where the object was freed looks like it was freed directly off an eventloop meaning it is most likely a result of a deleteLater call, but why would a slot still be connected to it? Or could it be caused by the fact that more then one QEventLoop is active since there is a KMessageBox::exec call involved? I will keep looking, but please comment if you have any pointers :)

Ok, I have some more info, I have tried to make the following change in copyjob.cpp:

diff --git a/src/core/copyjob.cpp b/src/core/copyjob.cpp
index 1d2e870a..f661d5be 100644
--- a/src/core/copyjob.cpp
+++ b/src/core/copyjob.cpp
@@ -1457,7 +1457,9 @@ void CopyJobPrivate::slotResultErrorCopyingFiles(KJob *job)
             if (files.count() > 1) {
                 options |= SkipDialog_MultipleItems;
             }
+            QPointer<KJob> jobGuard(job);
             res = q->uiDelegateExtension()->askSkip(q, options, job->errorString());
+            Q_ASSERT(!jobGuard.isNull());
         }
     }

and I hit the above Q_ASSERT(!jobGuard.isNull()); - to me this means that while askSkip is blocking the job completes and is deleted, so when we return from the function we have essentially been destroyed. Does this make sense? And how should we fix it? Simply return?

I think I have finally found the root cause of this. The following events occurs:

  1. A FileCopyJob is created and added as a SubJob to CopyJob
  2. CopyJob::slotResult is notified
  3. CopyJobPrivate::slotResultCopyingFiles is called because it is in state STATE_COPYING_FILES
  4. CopyJobPrivate::slotResultErrorCopyingFiles is called bcause the FileCopyJob is reporting an error
  5. uiDelegateExtension()->askSkip is called to ask the user how to handle the error
  6. The askSkip function starts a modal QDialog and calls exec, this means a new QEventLoop is started and runs.
  7. Before the askSkip function returns, CopyJob::slotResult is notified again because a new error occurs on the same FileCopyJob
  8. CopyJobPrivate::slotResultCopyingFiles is called because it is in state STATE_COPYING_FILES
  9. CopyJobPrivate::slotResultErrorCopyingFiles is called bcause the FileCopyJob is reporting an error
  10. uiDelegateExtension()->askSkip is called to ask the user how to handle the error
  11. The user responds and the job is cancelled
  12. Finally the initial call to askSkip returns but we are now in an inconsistent state because the job that was notified has been deleted

So to avoid getting notifications for the same Job while waiting for user feedback I have tried to simply call removeSubJob prior to calling askSkip, since that function disconnects from the result signal. Also removeSubJob will anyway be call after the user has responded. This also means that the problem is in KIO::CopyJob and not i KJob.

hallas updated this revision to Diff 52240.Thu, Feb 21, 8:46 PM

Implement a proper fix

hallas edited the summary of this revision. (Show Details)Thu, Feb 21, 8:48 PM
hallas edited the test plan for this revision. (Show Details)
cfeck added a subscriber: cfeck.Thu, Feb 21, 9:31 PM

That a nested event loop of an error dialog caused the crashes makes sense, because they were reported for alien drives (NTFS) or transfers with permission problems. What actual error dialogs did you get and are they reproducible?

Great analysis, thanks! Now this makes a lot more sense ;-)

The thing that I don't get, is why the subjob would emit anything after result(). I.e. step 7 is not supposed to happen, at all.
cfeck: I think you're referring to warning(), which does pop up a messagebox, but here it's about result() not warning().

Do we really have the *same* job emitting result twice, here?
That would be illegal...

Great analysis, thanks! Now this makes a lot more sense ;-)

The thing that I don't get, is why the subjob would emit anything after result(). I.e. step 7 is not supposed to happen, at all.
cfeck: I think you're referring to warning(), which does pop up a messagebox, but here it's about result() not warning().

Do we really have the *same* job emitting result twice, here?
That would be illegal...

Yes it certainly appears like it is the same FileCopyJob emitting multiple times. I can try and dig a bit further to see if I can find the exact spot. But I still think it would be a good idea to handle this case in CopyJob so that we don't crash. Also because the crash has been seen with different protocols, smb, ntfs etc. so I don't know if there are multiple places that violate this single emitting rule?

Let's find out :-)

Note that SlaveBase already has a warning in case a slave emits finished() or error() twice; on the other hand this might not be the issue here, it could be the job itself being buggy (and then it would be in a single location).
But yeah I'd recommend to start with checking if you see those SlaveBase warnings (in the debug output of the slave, not dolphin).

Let's find out :-)

Note that SlaveBase already has a warning in case a slave emits finished() or error() twice; on the other hand this might not be the issue here, it could be the job itself being buggy (and then it would be in a single location).
But yeah I'd recommend to start with checking if you see those SlaveBase warnings (in the debug output of the slave, not dolphin).

I can see that it is because FileCopyJob spawns both a chmod job and then a del job, and then the chmod reports an error first and while we wait for the user feedback we receive an error from the del job and then report that, thereby violating the contract. In the beginning of the FileCopyJob::slotResult function there is a big error checking section where we check if we are moving, copying etc. but there is no check for chmod or delete and therefore it just ends in the emitResult case, so maybe we should put in a fix here as well?

@dfaure - what do you think?

I think it's an unwanted fact that the chmod job runs in parallel with the del job.
A "return" after creating the chmod job would fix all this.

But of course it should be possible to have two concurrent subjobs (one on dest, one on src), this whole issue just makes me realize that it makes the error handling more tricky ;-)

If we want to keep these two running in parallel, we need to fix FileCopyJob to kill the running subjob when emitting the error from the other one.
Would this fix it?
http://www.davidfaure.fr/2019/filecopyjob.cpp.diff

I think it's an unwanted fact that the chmod job runs in parallel with the del job.
A "return" after creating the chmod job would fix all this.

But of course it should be possible to have two concurrent subjobs (one on dest, one on src), this whole issue just makes me realize that it makes the error handling more tricky ;-)

If we want to keep these two running in parallel, we need to fix FileCopyJob to kill the running subjob when emitting the error from the other one.
Would this fix it?
http://www.davidfaure.fr/2019/filecopyjob.cpp.diff

Hi @dfaure I was actually thinking of the same fix ;)

I have updated the diff with your suggested fix, but I have also added the "reverse" case, so if the chmod reports an error and we have a del job running, kill the del job. If the del job reports an error and we have a chmod job running, kill the chmod job. This pattern of error handling is actually the same as with the getjob/putjob. I have tested the fix and can see that it solved the problem!

hallas updated this revision to Diff 52643.Tue, Feb 26, 3:36 PM

Updated fix.

hallas added inline comments.Tue, Feb 26, 3:37 PM
src/core/filecopyjob.cpp
495

Should we also set d->m_chmodJob to nullptr here, just like the m_putJob handling branch?

500

Should we also set d->m_delJob to nullptr here, just like the m_putJob handling branch?

dfaure requested changes to this revision.Sat, Mar 2, 11:00 PM

Thanks for testing the fix - and indeed the reverse case is useful to have too.

src/core/filecopyjob.cpp
495

Right, probably a good idea, just in case.

500

same

This revision now requires changes to proceed.Sat, Mar 2, 11:00 PM
hallas updated this revision to Diff 53032.Sun, Mar 3, 5:50 AM

Set job to nullptr before stopping job running in parallel

hallas marked 4 inline comments as done.Sun, Mar 3, 5:51 AM
dfaure accepted this revision.Sun, Mar 3, 8:10 AM
This revision is now accepted and ready to land.Sun, Mar 3, 8:10 AM
hallas closed this revision.Sun, Mar 3, 1:22 PM