Plasmashell freezes when trying to get free space info from mounted remote filesystem after losing connection to it
ClosedPublic

Authored by McPain on Aug 17 2018, 7:49 AM.

Details

Summary

BUG: 397537

Earlier plasmashell assumed that you'll get free space info immediately (which is not true in case of losing connection to server containing a mounted filesystem - statfs will wait for response forever and freeze everything since it's happening in main thread)

I moved obtaining that info into different thread so that case won't freeze anything anymore.
It creates exactly one thread per one path. If a path is already being processed, new thread won't be created.
Also I implemented a timer used to notify about broken connection after 15 seconds.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped
McPain created this revision.Aug 17 2018, 7:49 AM
Restricted Application added a project: Plasma. · View Herald TranscriptAug 17 2018, 7:49 AM
Restricted Application added a subscriber: plasma-devel. · View Herald Transcript
McPain requested review of this revision.Aug 17 2018, 7:49 AM
McPain updated this revision to Diff 39897.Aug 17 2018, 7:58 AM

Removed useless debug string

Thanks for looking into this.

Whilst this looks fine, we have a more standard abstraction layer for async IO via KIO.
There's a FileSystemFreeSpaceJob which should work here.

anthonyfieroni added inline comments.
dataengines/soliddevice/soliddeviceengine.cpp
67

Don't you forget to emit finished?

592–601

Use new connect syntax .

624

It does not need, you connect finished to deleteLater.

McPain updated this revision to Diff 39901.Aug 17 2018, 8:56 AM
McPain marked 3 inline comments as done.

Thanks for looking into this.

Whilst this looks fine, we have a more standard abstraction layer for async IO via KIO.
There's a FileSystemFreeSpaceJob which should work here.

We can always do it later, fixibg bug itself is more important than making it through KIO

davidedmundson requested changes to this revision.Aug 17 2018, 9:46 AM

But the KIO stuff is already there.

We can always do it later

That's never how it works.

This revision now requires changes to proceed.Aug 17 2018, 9:46 AM

Yes, please use KIO::FileSystemFreeSpaceJob which is a lot simpler than all of this manually added threading code.

Also beware that bool SolidDeviceEngine::updateSourceEvent is currently supposed to return true for when any of the sources changed which will no longer work when updateStorageSpace becomes asynchronous. Might not be an issue as you call setData anyway but just something I want you to keep in mind for testing.

Also beware that bool SolidDeviceEngine::updateSourceEvent is currently supposed to return true

Who will use that true?
I didn't found any places where that value is used.

The only place I found ignores it:

void SolidDeviceEngine::deviceChanged(const QString& udi, const QString &property, const QVariant &value)
{
    setData(udi, property, value);
    updateSourceEvent(udi);
}
This comment was removed by McPain.

Might not be an issue as you call setData anyway but just something I want you to keep in mind for testing.

something like

setData(udi, I18N_NOOP("Free Space"), QVariant(0));
setData(udi, I18N_NOOP("Free Space Text"), KFormat().formatByteSize(0));
setData(udi, I18N_NOOP("Size"), QVariant(0));

or what?
And what to do after obtaining real data?

It is just a question. I think since you call setData when the job finishes anyway, it should be fine but I don't really know the architecture of that stuff either, so it might be fine to just trigger a free space job in updateSourceEvent

McPain updated this revision to Diff 39929.Aug 17 2018, 2:40 PM

Used KIO::FileSystemFreeSpaceJob

anthonyfieroni added inline comments.Aug 17 2018, 2:59 PM
dataengines/soliddevice/soliddeviceengine.cpp
564

You can remove it, see below.

574

Capture timer too

575–576

You can omit declaration of map, just

timer->stop();
anthonyfieroni added inline comments.Aug 17 2018, 3:00 PM
dataengines/soliddevice/soliddeviceengine.cpp
588
job->start();

as well.

McPain added inline comments.Aug 17 2018, 3:05 PM
dataengines/soliddevice/soliddeviceengine.cpp
564

What if it creates another job working on the same path?

The purpose is "check if nobody is working on path", not "save the timer pointer"

588

Job starts without it

McPain updated this revision to Diff 39933.Aug 17 2018, 3:13 PM
McPain marked 4 inline comments as done.
McPain updated this revision to Diff 39934.Aug 17 2018, 3:28 PM

Much nicer! I'm definitely happy merging something like this.

dataengines/soliddevice/soliddeviceengine.cpp
555

not sure you need this timer

  1. You should still get the job finishing with error ERR_SERVER_TIMEOUT. (untested, but the protocol manager in the client does have some stuff doing this)
  1. if you don't get a progress bar for a drive that's frozen anyway, do you really care?
560

We don't know which window will be active when this timer fires. Which means the behavior to dismiss is somewhat random.

McPain added inline comments.Aug 20 2018, 6:57 AM
dataengines/soliddevice/soliddeviceengine.cpp
555

There are solutions to notify user about stuck FS without that timer, right?

560

I tested it - works for me.

davidedmundson added inline comments.Aug 20 2018, 8:58 AM
dataengines/soliddevice/soliddeviceengine.cpp
555

I think you can just move the notification to

connect(job, &KIO::FileSystemFreeSpaceJob::result, ...) {

if (job->error() == ERR_SERVER_TIMEOUT) {

....

}
}

but I haven't tested that.

560

Sure, it /might/ work, but you're still fetching a random window N seconds after a user event.

I don't see how we can know that will result in correct behaviour.

McPain planned changes to this revision.Aug 20 2018, 9:51 AM
McPain added inline comments.Aug 24 2018, 9:25 AM
dataengines/soliddevice/soliddeviceengine.cpp
560

Any ideas how to do it correctly?

anthonyfieroni added inline comments.Aug 24 2018, 9:59 AM
dataengines/soliddevice/soliddeviceengine.cpp
560

It has a default parameters just call event with 3 parameters not 5.

McPain added inline comments.Aug 27 2018, 9:15 AM
dataengines/soliddevice/soliddeviceengine.cpp
555

I did. Job will never finish in that case

McPain updated this revision to Diff 40923.Sep 3 2018, 2:50 PM

I'm not a huge fan of having that notification but if you can assure it doesn't spam the user by showing up repeatedly if a mount is blocked indefinitely, it's fine I guess

dataengines/soliddevice/soliddeviceengine.cpp
29

Unused

571

Give connect a context:

connect(job, &KIO::FileSystemFreeSpaceJob::result, this, ...

otherwise it would blow up if we are deleted before the job finishes

dataengines/soliddevice/soliddeviceengine.h
38

Unused?

McPain updated this revision to Diff 40973.Sep 4 2018, 11:41 AM
McPain marked 12 inline comments as done.
McPain marked 2 inline comments as done.
broulik accepted this revision.Sep 4 2018, 11:47 AM

Just in case, I don't have commit access

Does this look good now @davidedmundson?

davidedmundson resigned from this revision.Sep 5 2018, 6:28 AM

I wouldn't have kept the notification myself, but its fine. Ship it.

This revision is now accepted and ready to land.Sep 5 2018, 6:28 AM
ngraham resigned from this revision.Sep 5 2018, 6:37 PM

+1 for the concept. No opinion about the notification, but I think it's fine. Removing myself as a reviewer since I don't feel qualified to offer a code review. But sounds like you've got a shipit! You need someone to land it for you, right?

McPain added a subscriber: ngraham.Sep 6 2018, 6:26 AM

+1 for the concept. No opinion about the notification, but I think it's fine. Removing myself as a reviewer since I don't feel qualified to offer a code review. But sounds like you've got a shipit! You need someone to land it for you, right?

Right. Nobody gave me a commit access yet

This revision was automatically updated to reflect the committed changes.

As described on https://bugs.kde.org/show_bug.cgi?id=401088, this notification is triggering when it should not: during short bursts of high CPU load, instead of when a real issue is ongoing.

Then the handling of the high amount of notifications is extending the CPU load, possibly creating more notifications in a cycle.

Not saying to remove the feature, but I would benefit from having the default timeout of 15 seconds configurable.
If easy, I can contribute the commit, given some guiding examples. Yet I just want this not locking my machine 😬.

dataengines/soliddevice/soliddeviceengine.cpp
583–584

Can we have this number configurable somehow, even if only before startup.

It's been annoying to have notifications when bringing lots of Docker instances at once.

The overhead of _receiving and showing_ the notifications is being longer than the one about bringing the instances up.

meven added a subscriber: meven.Mar 25 2023, 2:02 PM

Feel free to send a MR, and include more context please.

dataengines/soliddevice/soliddeviceengine.cpp
583–584

Better approach would be to make this check ignore docker overlay fs.

I like the idea of making docker imune. May be improved.

dataengines/soliddevice/soliddeviceengine.cpp
583–584

That would fix my issue, sure. Idk if targeting "docker" is better than having it configurable, as other similar solutions will have the same problem.

I liked the idea of making docker imune here. May be even better by having a passlist of what should be imune, allowing the user to address issues for similar solutions

Feel free to send a MR, and include more context please.

Sorry my ignorance. What "MR" stands for, in this context?

meven added a comment.Apr 1 2023, 2:13 PM

Feel free to send a MR, and include more context please.

Sorry my ignorance. What "MR" stands for, in this context?

You are more than welcome to ask that's jargon.
MR stands for Merge Request (Gitlab's PR).