Improve KIO asynchronicity
Open, NormalPublic

Description

Currently some of KIO file operations have some optimized path when working with local files avoiding using ioslave but use direct files operations.
This makes those file operations block the calling Thread, freezing it until the file operation has finished.

This is troublesome and we need to improve this.
Identified KIO calls concerned are so far :

  • KIO::del
  • dir listing

It causes bugs (I suspect there are more spread across apps using KIO) :

With @dfaure we discussed a way to use an io worker QThread to run those optimized path out of the main thread.
In essence the work will be about moving the io code logic to a separate worker class which will run in a separate thread.
Please It won't help the performance, but performance perception since we will then to have progressive file listing for instance.

This pseudo code presents the basic logic for the DeleteJob, details will likely change (I am not sure if using invokeMethod is necessary, connect with signal/slot might be enough) :

DeleteJob() {
   QThread m_ioThread:
   IOWorker worker = IOWorker() # subclass QObject
   worker->moveToThread(m_ioThread)
   worker->invokeMethod(worker, &IOWorker::deleteDirs, this):
   m_ioThread.start();
}

IOWorker::deleteFiles () {

    [...]
    if (timer.elapsed() > delay) { // 200 ms to follow value used in sftp ioslave for instance
        invokeMethod(job, &IOWorker::(slotProcesedFilesChanged| slotDeleteFileInProgress), numFilei, sizeRemoved);
        timer.restart();
    }

    [...]
}
IOWorker::deleteDirs(){
[...]
    // empty dir with deleteFiles
[...]
    invokeMethod(job, &IOWorker::slotWorkerFinished);
}

We need to relay event back and force then between the job and its worker object inside its execution thread.

meven created this task.Sep 12 2019, 8:44 AM
meven triaged this task as Normal priority.
meven updated the task description. (Show Details)Oct 12 2019, 8:49 AM
meven updated the task description. (Show Details)Oct 18 2019, 10:33 AM
ognarb added a subscriber: ognarb.Oct 26 2019, 1:40 PM
meven added a comment.Oct 30 2019, 7:08 PM

@dfaure I think that ioslave suffer from the same kind of issue DeleteJob has : it freezes because of I/O.
I believe it is the origin lag in such cases as https://bugs.kde.org/show_bug.cgi?id=342056 especially on slow I/O operation like move/copy.

I don't get though how kded5 can get UIs to be affected.
My two hypothesis is that App makes synchronous calls to KIO::* themselves blocked by kded not responding because of I/O freeze or that Apps are jagged because kded socket because unresponsive.

Anyway I think we should tackle both issue : Apps should not freeze/lag because of kded not being responsive, kded should not freeze because of I/O.

I'm pretty sure there's a tonne of blocking DBus calls being done all over the place during file copy (checking for slaves on hold, updating kuiserver job tracking info, etc) which can cause apps to hang or slow down if whatever they're trying to call into is busy. Best example: SIGSTOP plasmashell, try copying a file in Dolphin, won't work until plasmashell SIGCONTs.

meven added a comment.Jan 27 2020, 9:05 AM

I'm pretty sure there's a tonne of blocking DBus calls being done all over the place during file copy (checking for slaves on hold, updating kuiserver job tracking info, etc) which can cause apps to hang or slow down if whatever they're trying to call into is busy. Best example: SIGSTOP plasmashell, try copying a file in Dolphin, won't work until plasmashell SIGCONTs.

That's interesting, we need to fix that.

About https://bugs.kde.org/show_bug.cgi?id=281270 Inconsistent notifications during/after file operations (copying, moving, deleting, compressing, extracting) depending on amount of data/items involved :

A possible way to fix this would probably use sendfile with O_NONBLOCK + select.
After a non-blocking sendfile we call select/pselect on the dest file descriptor. Then when it returns something other than EWOULDBLOCK, we can deduce the processed bytes from the size sent to copy to dest (i.e sendfile count parameter) or use pselect and its sigset_t parameter. We then can call sendfile again to write next bytes.
On the plus side it would make the code async which would make progress reporting more reactive.

meven added a comment.EditedJan 28 2020, 7:17 PM

I'm pretty sure there's a tonne of blocking DBus calls being done all over the place during file copy (checking for slaves on hold, updating kuiserver job tracking info, etc) which can cause apps to hang or slow down if whatever they're trying to call into is busy. Best example: SIGSTOP plasmashell, try copying a file in Dolphin, won't work until plasmashell SIGCONTs.

That's interesting, we need to fix that.

About https://bugs.kde.org/show_bug.cgi?id=281270 Inconsistent notifications during/after file operations (copying, moving, deleting, compressing, extracting) depending on amount of data/items involved :

A possible way to fix this would probably use sendfile with O_NONBLOCK + select.
After a non-blocking sendfile we call select/pselect on the dest file descriptor. Then when it returns something other than EWOULDBLOCK, we can deduce the processed bytes from the size sent to copy to dest (i.e sendfile count parameter) or use pselect and its sigset_t parameter. We then can call sendfile again to write next bytes.
On the plus side it would make the code async which would make progress reporting more reactive.

This approach does not work unfortunately, select does not take into account filesystem sync.
I tried also with sync_file_range but it does not block when file is syncing to disk.

That being said the select method allows to free some main thread cpu time while the kernel is running sendfile.
Allowing for instance to notify ioslaves users of progress concurrently.

Regarding this bug I have the idea of adding a state "Closing file" or "Writing data to disk" to the CopyJob so users can surface to users that something is going on.

And I want to edit D25079 where the chunk size for sendfile is probably too big.

meven updated the task description. (Show Details)Feb 3 2020, 3:01 AM
ngraham updated the task description. (Show Details)Feb 3 2020, 2:40 PM

You may want to look at the DBus part of this with dfer-monitor, a tool I have written. It is especially useful for looking at latency problems.
https://cgit.kde.org/dferry.git/

meven added a comment.Feb 3 2020, 2:52 PM

You may want to look at the DBus part of this with dfer-monitor, a tool I have written. It is especially useful for looking at latency problems.
https://cgit.kde.org/dferry.git/

It seriously lacks documentation, not even a README.

I have been using Bustle for dbus monitoring.

You may want to look at the DBus part of this with dfer-monitor, a tool I have written. It is especially useful for looking at latency problems.
https://cgit.kde.org/dferry.git/

It seriously lacks documentation, not even a README.

I have been using Bustle for dbus monitoring.

True, but here's the interesting part: You compile and install it, then you get a Qt application called dfer-analyzer (sorry I wrote the wrong thing, dfer-monitor is more like dbus-monitor) that is a nice way to look at DBus traffic.

meven added a comment.Apr 4 2020, 1:31 PM

I have tried using https://www.kdab.com/uiwatchdog-a-keepalive-monitor-for-the-gui-thread/ to see what makes the system stutter during file operation, it seems it is due to Breeze processing with styling loading, and QXcb* events.

meven added a comment.EditedApr 19 2020, 6:54 PM

While copying from local to a nfs mount with the destination opened, you get a dolphin freeze because KCoreDirlister is run in main thread and because of

#0  statx (fd=-100, 
    path=0x5564ae9e5148 "/media/NFS/Myfile", flags=0, mask=4095, buf=0x7ffc8ba20420) at ../sysdeps/unix/sysv/linux/statx.c:29
#1  0x00007f3559719016 in ?? () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#2  0x00007f35596b3984 in QFileInfo::isDir() const () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#3  0x00007f355b1a5913 in ?? () from /usr/lib/x86_64-linux-gnu/libKF5KIOCore.so.5
#4  0x00007f35597a3458 in QMetaObject::activate(QObject*, int, int, void**) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#5  0x00007f355a916fd6 in KDirWatch::dirty(QString const&) () from /usr/lib/x86_64-linux-gnu/libKF5CoreAddons.so.5
#6  0x00007f35597a3d5a in QObject::event(QEvent*) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#7  0x00007f355a268a66 in QApplicationPrivate::notify_helper(QObject*, QEvent*) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#8  0x00007f355a2720f0 in QApplication::notify(QObject*, QEvent*) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#9  0x00007f355977793a in QCoreApplication::notifyInternal2(QObject*, QEvent*) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#10 0x00007f355977a5b8 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#11 0x00007f35597cff67 in ?? () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#12 0x00007f35576c1fbd in g_main_context_dispatch () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#13 0x00007f35576c2240 in ?? () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#14 0x00007f35576c22e3 in g_main_context_iteration () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#15 0x00007f35597cf565 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#16 0x00007f35597764db in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#17 0x00007f355977e246 in QCoreApplication::exec() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#18 0x00007f355b9de8b5 in kdemain () from /usr/lib/x86_64-linux-gnu/libkdeinit5_dolphin.so
#19 0x00007f355b7c70b3 in __libc_start_main (main=0x5564ad350060, argc=1, argv=0x7ffc8ba20f38, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc8ba20f28) at ../csu/libc-start.c:308
#20 0x00005564ad35009e in _start ()

The destination is marked dirty because of the new file being created.
It triggers KDirWatch::dirty which in turn makes KCoreDirListerCache::slotFileDirty trigger a statx on the new file.
Which blocks since the NFS drive is already in use.
And this freezes Dolphin UI since KCoreDirlister is in dolphin main thread.

Two things to improve the situation :

  • run KCoreDirlister outside of dolphin main UI
  • perhaps prevent/ limit the stat calls made to a NFS mount, especially when busy.