KCrash: DrKonqi cancelled = able to start...
AbandonedPublic

Authored by rjvbb on Aug 4 2018, 3:30 PM.

Details

Reviewers
sitter
Group Reviewers
Frameworks
Summary

This patch addresses a minor detail that has been irking me for a while now: the message that DrKonqi couldn't be started after the user closed that very application without filing a report.

The warning message is wrong, for evident reasons, but I think there is also no need to re-raise the signal for core-dump processing in this case. After all, the user has had all the opportunity to investigate the backtrace or even connect a debugger "manually" to the crashed process.

Test Plan

start any KF5 application and send it signal like SIGSEGV. Close DrKonqi after it appeared.

  • without the patch: KCrash claims it was unable to start DrKonqi and re-raises the signal
  • with the patch: no such claim, and no core-dump (or expensive apport processing on Ubuntu)

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 1513
Build 1531: arc lint + arc unit
rjvbb created this revision.Aug 4 2018, 3:30 PM
Restricted Application added a project: Frameworks. ยท View Herald TranscriptAug 4 2018, 3:30 PM
rjvbb requested review of this revision.Aug 4 2018, 3:30 PM
dfaure added a subscriber: dfaure.Aug 4 2018, 5:48 PM

I can't reproduce what you describe using bin/crashtest in the drkonqi repository (after making sure KDE_DEBUG=1 isn't set, of course).

When I click "Close" in drkonqi (there's no Cancel button), crashtest resumes and exits, without any error message from KCrash.

drkonqi Plasma/5.13 but master doesn't seem very different.

rjvbb added a comment.Aug 4 2018, 7:09 PM
When I click "Close" in drkonqi (there's no Cancel button), crashtest resumes and exits, without any error message from KCrash.

On my system, without the patch:

> killall -SEGV kate
> KCrash: Application 'kate' crashing...
KCrash: Attempting to start /opt/local/lib/x86_64-linux-gnu/libexec/drkonqi from kdeinit
sock_file=/run/user/505/kdeinit5__0

[2]  + Suspended (signal)            kate5
#### here I hit the close button or Escape
> QSocketNotifier: Invalid socket 9 and type 'Read', disabling...
QSocketNotifier: Invalid socket 16 and type 'Read', disabling...
Unable to start Dr. Konqi
Re-raising signal for core dump handling.
drkonqi Plasma/5.13 but master doesn't seem very different.

I just updated drkonqi to master/head, but the error handling is in KCrash.

Is it possible that your drkonqi exits with 0 when you quit it while mine somehow returns a non-zero exit code?
Note: I haven't patched drkonqi to do that - I just checked :)

rjvbb added a comment.Aug 4 2018, 7:13 PM
When I click "Close" in drkonqi (there's no Cancel button), crashtest resumes and exits, without any error message from KCrash.

On my system, without the patch:

> killall -SEGV kate
> KCrash: Application 'kate' crashing...
KCrash: Attempting to start /opt/local/lib/x86_64-linux-gnu/libexec/drkonqi from kdeinit
sock_file=/run/user/505/kdeinit5__0

[2]  + Suspended (signal)            kate5
#### here I hit the close button or Escape
> QSocketNotifier: Invalid socket 9 and type 'Read', disabling...
QSocketNotifier: Invalid socket 16 and type 'Read', disabling...
Unable to start Dr. Konqi
Re-raising signal for core dump handling.
drkonqi Plasma/5.13 but master doesn't seem very different.

I just updated drkonqi to master/head, but the error handling is in KCrash.

Is it possible that your drkonqi exits with 0 when you quit it while mine somehow returns a non-zero exit code?
Note: I haven't patched drkonqi to do that - I just checked :)

dfaure added a comment.Aug 4 2018, 9:17 PM

Tested with your killall -SEGV kate testcase, no bug here. But there's a catch, on my system I get some plasma popup on which I have to click "Report Bug" in order to get the usual drkonqi dialog. Do you get that too?

Yes, I confirm that drkonqi exits normally, says gdb (attached to drkonqi) :
[Inferior 1 (process 22013) exited normally]

rjvbb added a comment.Aug 4 2018, 9:45 PM

But there's a catch, on my system I get some plasma popup on which I have to click "Report Bug" in order to get the usual drkonqi dialog. Do you get that too?

Usually, yes. If you're not quick enough you have to go find DrKonqi in the taskbar or even systray.

Yes, I confirm that drkonqi exits normally, says gdb (attached to drkonqi) :
[Inferior 1 (process 22013) exited normally]

Well, that's a mystery... I'll have to figure out then why I get the false alarm and I don't see any other explanation then DrKonqi returning ~0.

One could say that in absolute terms my patch is correct (if DrKonqi gives an exit code that means it was started ;-)) but that's assuming that Dr.K always starts up completely when called through KCrash. I'll have to try and figure that out.

It does exit with a non-zero return code when called manually without any arguments:

%> /opt/local/lib/x86_64-linux-gnu/libexec/drkonqi
org.kde.kwindowsystem: Loaded plugin "/opt/local/share/qt5/plugins/kf5/org.kde.kwindowsystem.platforms/KF5WindowSystemX11Plugin.so" for platform "xcb"
org.kde.drkonqi: Invalid pid specified
Exit 1
%> Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='ANSI_X3.4-1968'>
BrokenPipeError: [Errno 32] Broken pipe

We can probably ignore that case safely, but note the message about an exception being ignored. It comes after DrKonqi exitted, so who said that?

dfaure added a comment.Aug 4 2018, 9:49 PM
$ /d/kde/inst/kde_frameworks/lib64/libexec/drkonqi
23:47:18.361 drkonqi(23113) org.kde.drkonqi: Invalid pid specified
$ echo $?
1

if DrKonqi gives an exit code that means it was started

It could have crashed though...

rjvbb added a comment.Aug 5 2018, 8:13 AM
> if DrKonqi gives an exit code that means it was started

It could have crashed though...

Yeah, that was meant as a snarky remark - even if it crashed it was still started :)

I think there is no way at this level of interaction to determine if an application crashed after start and distinguish that from something like a "user-cancelled" return code, or is there?

dfaure added a comment.Aug 5 2018, 4:18 PM

I think there is no way at this level of interaction to determine if an application crashed after start and distinguish that from something like a "user-cancelled" return code, or is there?

There are some heuristics in kio's KProcessRunner, but aren't we getting completely sidetracked?
Step 1 figuring out if (and why) you get a non-0 exit code, step 2 fixing the bug, and only then step 3 thinking about hypothetical situations (drkonqi crashing) that we might want to handle better.

rjvbb added a comment.Aug 5 2018, 6:06 PM

Actually, it's even worse than that: I didn't double-check my assumptions about the use of DrKonqi's return/exit code. Looking at the code again there is actually no way that information is even obtained. The startProcess function simply starts DrKonqi and then waits for it to exit if so instructed; the only information obtained about the crashreporter is its PID.

And indeed, DrKonqi exits with 0 for me too when I cancel/close it.

In a way this is worse not just because my brain was cooked yesterday. It means we're getting different behaviour inside KCrash: the value of crashRecursionCounter. And I'm back to not understanding how that's possible. I don't see how crashRecursionCounter could ever be >2 unless you attempt to restart the crashed application. But that also doesn't work for me: it starts a new application = new PID = crashRecursionCounter never increases.

That counter gets set to 2 here:

if (crashRecursionCounter < 2) {
    if (s_emergencySaveFunction) {
        s_emergencySaveFunction(sig);
    }
    if ((s_flags & AutoRestart) && s_autoRestartCommand) {
        QThread::sleep(1);
        startProcess(s_autoRestartArgc, const_cast<const char **>(s_autoRestartCommandLine), false);
    }
    crashRecursionCounter++;
}

A bit strange that the counter would always be increased and not only when there has been an auto-restart attempt, but let's ignore that for now.

I see no other places where the counter is increased, so apparently the code assumes that the function itself can be called multiple times. But how? I see only 2 possibilities:

  • When the application receives an additional signal that is connected to the defaultCrashHandler() function
  • When the application is restarted without changing its PID and without re-initialising all its static variables - is that even possible (after a fatal exception)?

In other words, how do you get the crashRecursionCounter >= 4 required for not printing the unable-to-start warning?

Answer: you never get to that check:

if (!s_coreConfig->isProcess()) {
    // Only exit if we don't forward to core dumps
    _exit(253);
}

On my system:

%> cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c %P

Q.E.D. ...

rjvbb abandoned this revision.Aug 5 2018, 6:19 PM

I'm abandoning this for now because my assumption was wrong even though my patch had the effect I intended. I'll reopen if/when I have a real fix, because there are still 2 issues here:

  • The inappropriate warning. This could be fixed by a version of my current patch or else by rewording the warning.
  • The somewhat unorthodox way of coredumping. /proc/sys/kernel/core_pattern is Linux-specific but the coredumpsize "limit" shell setting exits on other Unix variants too. IOW, an absent or empty core_pattern file doesn't guarantee that a core dump will NOT be created when the caught signal is re-raised. In addition, that core_pattern may be set to a crash-reporter facility (apport in my case) which is completely irrelevant for code that wasn't installed through an official distribution package (the case for all my Qt5/KF5 software).

In short, I think that what's needed here is a configuration variable rather than a check of the core_pattern special file (but that file could be used to provide the initial value of the config setting).

dfaure added a comment.Aug 5 2018, 6:32 PM

The crash recursion counter is increased when a SEGV happens inside the crash handler itself.

rjvbb added a comment.Aug 6 2018, 10:47 AM
The crash recursion counter is increased when a SEGV happens inside the crash handler itself.

Yeah, that would make sense. But then wouldn't the "unable to start" message have to be printed when the recursion counter is larger than 2?

I don't see why. We try to start drkonqi even the very first time.

rjvbb added a comment.Aug 6 2018, 11:22 AM
I don't see why. We try to start drkonqi even the very first time.

Yes, you'd print it if you're sure that first time failed (the start function got pid==0). I don't see the point in stopping to print it if you had a crash or two in the crash handler itself (supposing that crash is ever due to not being able to start DrKonqi of course).

We only stop printing it at the 4th recursion, I guess just in case that printing itself is the reason for the recursive crash...

if (crashRecursionCounter < 4) {
    fprintf(stderr, "Unable to start Dr. Konqi\n");
}
rjvbb added a comment.Aug 6 2018, 2:10 PM

I'd still like that message not to appear if I just dismissed DrKonqi myself.

But if no one else care about it I'll just patch the code for myself ...

dfaure added a comment.Aug 6 2018, 8:59 PM

I never said I don't care, I said: tell me how to reproduce this problem.

rjvbb added a comment.Aug 6 2018, 9:20 PM

OK, if no one else cares :)

I told you how to reproduce the problem: do whatever it takes to make cat /proc/sys/kernel/core_pattern return a pipe command. Without that KCrash appears to think that you won't get a coredump (wrong...) and will exit when DrKonqi exits.

As I said when I abandoned my patch, I think that the content of that (special) file should be used only as the initial value for a config setting that determines whether or not KCrash should re-raise the signal after DrKonqi exited:

  • /proc/sys/kernel/core_pattern is linux-specific (I think) and is likely to refer to the system crash reporter which shouldn't be used for software built locally (e.g. KF5 software).
  • coredumps are possible when that file does not contain a pipe command (in fact, that might be the only way to get them)