Show crash backtraces in the web UI
Open, Needs Triage · Public

Description

Sometimes a unit test crashes on the CI but the developer cannot reproduce it locally. This happens quite often to us in KDE PIM because of the complexity of the test environment we set up and the big difference between the developer environment and the environment on the builder. It would be nice if Jenkins could collect coredumps from the crashes and show *resolved* backtraces in the web UI. It's not as useful as interactive debugging, but even just seeing the backtrace would be extremely useful, as it would let the developer identify the code that crashes and add more debugging output.

Note that it is not enough to just publish the coredump files, as they would not match locally built binaries. The backtrace has to be resolved against the binary built by the CI and then shown.

dvratil created this task. Jul 29 2015, 9:58 AM
dvratil updated the task description.
dvratil raised the priority of this task from to Needs Triage.
dvratil added a project: build.kde.org.
dvratil moved this task to Feature Requests on the build.kde.org board.
dvratil moved this task from Feature Requests to Backlog on the build.kde.org board.
dvratil added a subscriber: dvratil.

Hi, I have not forgotten this. But I admit I am still trying to wrap my head around the request.
I am still relatively new to coding. So you want Jenkins to grab the coredump from the job artifacts and run gdb on it, then somehow output the result to the console? Am I understanding correctly? Thanks for any help you can give me!

Yeah, the idea is that after a program terminates you check for a coredump file and if there's one, you run

gdb -batch -ex "thread apply all backtrace" ${PROGRAM_BINARY} ${COREDUMP_FILE} &> ${PROGRAM_NAME}.crash.log

and then publish the .log file. The tricky part, I guess, is actually finding the binary for a given coredump.
I don't know how the control software on the CI works, so I can't say how to actually integrate this. If you need any help, just tell me.
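
The check described above could be sketched as a small shell function. This is only an illustration: the function name `collect_backtrace` and the `GDB` override variable are made up for this example and are not part of the CI tooling, and the caller is assumed to already know which binary produced the coredump.

```shell
#!/bin/sh
# Sketch only: collect_backtrace and the GDB override are hypothetical
# names, not part of the existing CI scripts.

# collect_backtrace PROGRAM_BINARY COREDUMP_FILE
# Writes PROGRAM_NAME.crash.log into the current directory when the
# coredump exists; returns 1 when there is nothing to collect.
collect_backtrace() {
    program_binary=$1
    coredump_file=$2
    program_name=$(basename "$program_binary")

    # No coredump: the program did not crash, or dumps are disabled.
    [ -f "$coredump_file" ] || return 1

    # Resolve the backtrace against the binary that produced the dump.
    "${GDB:-gdb}" -batch -ex "thread apply all backtrace" \
        "$program_binary" "$coredump_file" \
        > "${program_name}.crash.log" 2>&1
}
```

It would be called right after the test program exits, for example `collect_backtrace ./mytest core` (with `./mytest` standing in for a real test binary). The actual name and location of the coredump depend on the builder's `kernel.core_pattern` setting, and `ulimit -c` has to allow dumps in the first place.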

I know you have so much work on your plate, so many thanks for looking into this (and please note that this is a nice-to-have thing, not a priority at all).

In terms of implementing this, it would be quite difficult because CTest controls the flow of test execution.
Therefore we wouldn't be able to match coredumps against test binaries....

Thoughts?

Restricted Application added a subscriber: sysadmin. Aug 12 2017, 9:24 AM

Ping on the above?

I think my original idea was to get a backtrace from any process that crashes inside the test environment, which is useful when one of the many components of the Akonadi isolated tests crashes. But you are right that matching the coredumps to binaries would be difficult.

I know that QTest will at least dump a backtrace when the QTest-ed process itself crashes, by installing a custom signal handler and attaching GDB to itself. At least this might be interesting to have (unless it's already enabled). All you need is gdb and ptrace permissions.

Aha. Ptrace permissions might be a bit of a pain: they are blocked by limitations in the Jenkins Docker plugin we use (see T5460).
We'll probably need someone comfortable with Java to look into that and get a fix merged upstream before we can provide that, I'm afraid.

Once that is done, assuming QTest automatically enables this functionality, this should "just work" (in theory) as we already install gdb for the use of KDevelop.
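
As a rough aid for checking one of the prerequisites on a builder, the kernel's Yama `ptrace_scope` setting can be read and explained with a few lines of shell. This is a sketch under assumptions: the function name `ptrace_scope_hint` is invented for this example, and it only covers the Yama setting. Restrictions imposed by the container itself (such as the Docker plugin limitation mentioned above) are separate and are not detected by it.

```shell
#!/bin/sh
# Sketch only: ptrace_scope_hint is a hypothetical helper. It reports the
# Yama ptrace_scope value; it does NOT detect container-level restrictions.

# ptrace_scope_hint [SCOPE_FILE]
# SCOPE_FILE defaults to the standard Yama sysctl path.
ptrace_scope_hint() {
    scope_file=${1:-/proc/sys/kernel/yama/ptrace_scope}

    if [ ! -r "$scope_file" ]; then
        echo "unknown: $scope_file not readable (Yama may be disabled)"
        return 0
    fi

    case $(cat "$scope_file") in
        0) echo "0: any process may attach to others with the same uid" ;;
        1) echo "1: only ancestors or declared tracers may attach" ;;
        2) echo "2: attaching requires CAP_SYS_PTRACE" ;;
        3) echo "3: attaching is disabled" ;;
        *) echo "unrecognised value in $scope_file" ;;
    esac
}
```

Running `ptrace_scope_hint` inside the build container, together with `command -v gdb`, would give a first indication of whether a self-attaching debugger has a chance of working there.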

bcooksley changed the visibility from "All Users" to "Public (No Login Required)". Mar 26 2018, 9:32 AM
krop added a subscriber: krop. May 29 2018, 10:17 AM

That's not a solution yet, but exporting CTEST_OUTPUT_ON_FAILURE would give more hints, e.g.:

-yuuko- krop 12:16 /data/kde/build/akonadi # ctest -R akonadi-sqlite-etmpopulationtest
Test project /data/kde/build/akonadi
    Start 158: akonadi-sqlite-etmpopulationtest
1/1 Test #158: akonadi-sqlite-etmpopulationtest ...***Exception: Child aborted  0.20 sec

0% tests passed, 1 tests failed out of 1
-yuuko- krop 12:08 /data/kde/build/akonadi # export CTEST_OUTPUT_ON_FAILURE=1         
-yuuko- krop 12:08 /data/kde/build/akonadi # ctest -R akonadi-sqlite-etmpopulationtest
Test project /data/kde/build/akonadi
    Start 158: akonadi-sqlite-etmpopulationtest
1/1 Test #158: akonadi-sqlite-etmpopulationtest ...***Exception: Child aborted  0.20 sec
12:08:37 - akonaditest(26708) - org.kde.kcrash: KCrash::initialize: KCrash disabled through environment.
12:08:37 - akonadi-TES(26708) -  Config::readConfiguration: Base path "/data/kde/src/akonadi/autotests/libs/unittestenv/"
12:08:37 - akonadi-TES(26708) -  SetupTest::createTempEnvironment: Creating test environment in "/tmp/akonadi_testrunner-26708/"
12:08:37 - akonadi-TES(26708) -  SetupTest::copyXdgDirectory: Copying "/data/kde/src/akonadi/autotests/libs/unittestenv//xdgconfig" to "/tmp/akonadi_testrunner-26708/config"
12:08:37 - akonadi-TES(26708) -  SetupTest::copyXdgDirectory: Copying "/data/kde/src/akonadi/autotests/libs/unittestenv/xdglocal" to "/tmp/akonadi_testrunner-26708/data"
12:08:37 - akonadi-TES(26708) -  SetupTest::writeAkonadiserverrc: Written akonadiserverrc to "/tmp/akonadi_testrunner-26708/config/akonadi/instance/testrunner-26708/akonadiserverrc"
12:08:37 - akonadi-TES(26708) -  SetupTest::SetupTest: Setting environment variable "AKONADI_DISABLE_AGENT_AUTOSTART" = "true"
12:08:37 - akonadi-TES(26708) - org.kde.pim.akonadicore: Akonadi::Firstrun::Firstrun: 
12:08:37 - akonadi-TES(26708) - org.kde.pim.akonadicore: Akonadi::Firstrun::Firstrun: ("/data/kde/inst/share/akonadi/firstrun/birthdaycalendar")
12:08:37 - akonadi-TES(26708) -  : ASSERT: "Akonadi::ServerManager::hasInstanceIdentifier()" in file /data/kde/src/akonadi/autotests/libs/testrunner/setup.cpp, line 40


0% tests passed, 1 tests failed out of 1
krop added a comment. May 30 2018, 6:43 AM

I tested locally with a crashing test (a real crash, not an assert), and the backtrace was displayed in the ctest output.

Is there a project known to be crashy on build.kde.org, so we can see what happens?

Sorry, I'm not aware of any offhand.
Dan, are there some PIM tests which might be relevant here?

krop added a comment. May 30 2018, 10:58 AM

All the Akonadi tests fail because of the assert in my previous comment. The only crash I detected was in kwin, but the CI cannot reproduce it :)