diff --git a/README.md b/README.md
index 89675f0..7811ed9 100644
--- a/README.md
+++ b/README.md
@@ -1,194 +1,194 @@
 # heaptrack - a heap memory profiler for Linux
 
 ![heaptrack_gui summary page](screenshots/gui_summary.png?raw=true "heaptrack_gui summary page")
 
 Heaptrack traces all memory allocations and annotates these events with stack traces.
 Dedicated analysis tools then allow you to interpret the heap memory profile to:
 
 - find hotspots that need to be optimized to reduce the **memory footprint** of your application
 - find **memory leaks**, i.e. locations that allocate memory which is never deallocated
 - find **allocation hotspots**, i.e. code locations that trigger a lot of memory allocation calls
 - find **temporary allocations**, which are allocations that are directly followed by their deallocation
 
 ## Using heaptrack
 
 The recommended way is to launch your application and start tracing from the beginning:
 
     heaptrack <your application and its parameters>
 
     heaptrack output will be written to "/tmp/heaptrack.APP.PID.gz"
     starting application, this might take some time...
 
     ...
 
     heaptrack stats:
         allocations:            65
         leaked allocations:     60
         temporary allocations:  1
 
     Heaptrack finished! Now run the following to investigate the data:
 
         heaptrack_gui "/tmp/heaptrack.APP.PID.gz"
 
 Alternatively, you can attach to an already running process:
 
     heaptrack --pid $(pidof <your application>)
 
     heaptrack output will be written to "/tmp/heaptrack.APP.PID.gz"
     injecting heaptrack into application via GDB, this might take some time...
     injection finished
 
     ...
 
     Heaptrack finished! Now run the following to investigate the data:
 
         heaptrack_gui "/tmp/heaptrack.APP.PID.gz"
 
 ## Building heaptrack
 
 Heaptrack is split into two parts: The data collector, i.e. `heaptrack` itself, and the
 analyzer GUI called `heaptrack_gui`. The following summarizes the dependencies for these
 two parts as they can be built independently. You will find corresponding development
 packages on all major distributions for these dependencies.
 
 On an embedded device or older Linux distribution, you will only want to build `heaptrack`.
 The data can then be analyzed on a different machine with a more modern Linux distribution
 that has access to the required GUI dependencies.
 
 If you need help with building, deploying or using heaptrack, you can contact KDAB for
 commercial support: https://www.kdab.com/software-services/workshops/profiling-workshops/
 
 ### Shared dependencies
 
 Both parts require the following tools and libraries:
 
 - cmake 2.8.9 or higher
 - a C\+\+11 enabled compiler like g\+\+ or clang\+\+
 - zlib
 - libdl
 - pthread
 - libc
 
 ### `heaptrack` dependencies
 
 The heaptrack data collector and the simplistic `heaptrack_print` analyzer depend on the
 following libraries:
 
-- boost 1.41 or higher: iostream, program_options
+- boost 1.41 or higher: iostreams, program_options
 - libunwind
 - elfutils: libdwarf
 
 For runtime-attaching, you will need `gdb` installed.
 
 ### `heaptrack_gui` dependencies
 
 The graphical user interface to interpret and analyze the data collected by heaptrack
 depends on Qt 5 and some KDE libraries:
 
 - extra-cmake-modules
 - Qt 5.2 or higher: Core, Widgets
 - KDE Frameworks 5: CoreAddons, I18n, ItemModels, ThreadWeaver, ConfigWidgets, KIO
 
 When any of these dependencies is missing, `heaptrack_gui` will not be built.
 Optionally, install the following dependencies to get additional features in
 the GUI:
 
 - KDiagram: KChart (for chart visualizations)
 
 ### Compiling
 
 Run the following commands to compile heaptrack. Do pay attention to the output
 of the CMake command, as it will tell you about missing dependencies!
 
     cd heaptrack # i.e. the source folder
     mkdir build
     cd build
     cmake -DCMAKE_BUILD_TYPE=Release .. # look for messages about missing dependencies!
     make -j$(nproc)
 
 ## Interpreting the heap profile
 
 Heaptrack generates data files that are impossible to analyze for a human. Instead, you need
 to use either `heaptrack_print` or `heaptrack_gui` to interpret the results.
 
 ### heaptrack_gui
 
 ![heaptrack_gui flamegraph page](screenshots/gui_flamegraph.png?raw=true "heaptrack_gui flamegraph page")
 
 ![heaptrack_gui allocations chart page](screenshots/gui_allocations_chart.png?raw=true "heaptrack_gui allocations chart page")
 
 The highly recommended way to analyze a heap profile is by using the `heaptrack_gui` tool.
 It depends on Qt 5 and KF 5 to graphically visualize the recorded data. It features:
 
 - a summary page of the data
 - bottom-up and top-down tree views of the code locations that allocated memory with
   their aggregated cost and stack traces
 - flame graph visualization
 - graphs of allocation costs over time
 
 ### heaptrack_print
 
 The `heaptrack_print` tool is a command line application with minimal dependencies. It takes
 the heap profile, analyzes it, and prints the results in ASCII format to the command line.
 
 In its most simple form, you can use it like this:
 
     heaptrack_print heaptrack.APP.PID.gz | less
 
 By default, the report will contain three sections:
 
     MOST CALLS TO ALLOCATION FUNCTIONS
     PEAK MEMORY CONSUMERS
     MOST TEMPORARY ALLOCATIONS
 
 Each section then lists the top ten hotspots, i.e. code locations that triggered e.g.
 the most memory allocations.
 
 Have a look at `heaptrack_print --help` for changing the output format and other options.
 
 Note that you can use this tool to convert a heaptrack data file to the Massif data format.
 You can generate a collapsed stack report for consumption by `flamegraph.pl`.
 
 ## Comparison to Valgrind's massif
 
 The idea to build heaptrack was born out of the pain in working with Valgrind's massif.
 Valgrind comes with a huge overhead in both memory and time, which sometimes prevents you
 from running it on larger real-world applications. Most of what Valgrind does is not
 needed for a simple heap profiler.
 
 ### Advantages of heaptrack over massif
 
 - *speed and memory overhead*
 
   Multi-threaded applications are not serialized when you trace them with heaptrack and
   even for single-threaded applications the overhead in both time and memory is significantly
   lower. Most notably, you only pay a price when you allocate memory -- time-intensive CPU
   calculations are not slowed down at all, contrary to what happens in Valgrind.
 
 - *more data*
 
   Valgrind's massif aggregates data before writing the report. This step loses a lot of
   useful information. Most notably, you are no longer able to find out how often memory
   was allocated, or where temporary allocations are triggered. Heaptrack does not aggregate the
   data until you interpret it, which allows for more useful insights into your allocation patterns.
 
 ### Advantages of massif over heaptrack
 
 - *ability to profile page allocations as heap*
 
   This allows you to heap-profile applications that use pool allocators that circumvent
   malloc & friends. Heaptrack can in principle also profile such applications, but it
   requires code changes to annotate the memory pool implementation.
 
 - *ability to profile stack allocations*
 
   This is inherently impossible to implement efficiently in heaptrack as far as I know.
 
 ## Contributing to heaptrack
 
 As a FOSS project, we welcome contributions of any form. You can help improve the project by:
 
 - submitting bug reports at https://bugs.kde.org/enter_bug.cgi?product=Heaptrack
 - contributing patches via https://phabricator.kde.org/dashboard/view/28/
 - translating the GUI with the help of https://l10n.kde.org/
 - writing documentation on https://userbase.kde.org/Heaptrack
diff --git a/src/analyze/gui/flamegraph.cpp b/src/analyze/gui/flamegraph.cpp
index 55088a5..144b9ed 100644
--- a/src/analyze/gui/flamegraph.cpp
+++ b/src/analyze/gui/flamegraph.cpp
@@ -1,619 +1,621 @@
 /*
  * Copyright 2015-2017 Milian Wolff <mail@milianw.de>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Library General Public License as
  * published by the Free Software Foundation; either version 2 of the
  * License, or (at your option) any later version.
  *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  *
  * You should have received a copy of the GNU General Public
  * License along with this program; if not, write to the
  * Free Software Foundation, Inc.,
  * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  */
 
 #include "flamegraph.h"
 
 #include <cmath>
 
 #include <QAction>
 #include <QCheckBox>
 #include <QComboBox>
 #include <QCursor>
 #include <QDebug>
 #include <QDoubleSpinBox>
 #include <QEvent>
 #include <QGraphicsRectItem>
 #include <QGraphicsScene>
 #include <QGraphicsView>
 #include <QLabel>
 #include <QStyleOption>
 #include <QToolTip>
 #include <QVBoxLayout>
 #include <QWheelEvent>
 
 #include <KColorScheme>
 #include <KLocalizedString>
 #include <KStandardAction>
 #include <ThreadWeaver/ThreadWeaver>
 
 enum CostType
 {
     Allocations,
     Temporary,
     Peak,
     Leaked,
     Allocated
 };
 Q_DECLARE_METATYPE(CostType)
 
 class FrameGraphicsItem : public QGraphicsRectItem
 {
 public:
     FrameGraphicsItem(const qint64 cost, CostType costType, const QString& function,
                       FrameGraphicsItem* parent = nullptr);
     FrameGraphicsItem(const qint64 cost, const QString& function, FrameGraphicsItem* parent);
 
     qint64 cost() const;
     void setCost(qint64 cost);
     QString function() const;
 
     void paint(QPainter* painter, const QStyleOptionGraphicsItem* option, QWidget* widget = nullptr) override;
 
     QString description() const;
 
 protected:
     void hoverEnterEvent(QGraphicsSceneHoverEvent* event) override;
     void hoverLeaveEvent(QGraphicsSceneHoverEvent* event) override;
 
 private:
     qint64 m_cost;
     QString m_function;
     CostType m_costType;
     bool m_isHovered;
 };
 
 Q_DECLARE_METATYPE(FrameGraphicsItem*)
 
 FrameGraphicsItem::FrameGraphicsItem(const qint64 cost, CostType costType, const QString& function,
                                      FrameGraphicsItem* parent)
     : QGraphicsRectItem(parent)
     , m_cost(cost)
     , m_function(function)
     , m_costType(costType)
     , m_isHovered(false)
 {
     setFlag(QGraphicsItem::ItemIsSelectable);
     setAcceptHoverEvents(true);
 }
 
 FrameGraphicsItem::FrameGraphicsItem(const qint64 cost, const QString& function, FrameGraphicsItem* parent)
     : FrameGraphicsItem(cost, parent->m_costType, function, parent)
 {
 }
 
 qint64 FrameGraphicsItem::cost() const
 {
     return m_cost;
 }
 
 void FrameGraphicsItem::setCost(qint64 cost)
 {
     m_cost = cost;
 }
 
 QString FrameGraphicsItem::function() const
 {
     return m_function;
 }
 
 void FrameGraphicsItem::paint(QPainter* painter, const QStyleOptionGraphicsItem* option, QWidget* /*widget*/)
 {
     if (isSelected() || m_isHovered) {
         auto selectedColor = brush().color();
         selectedColor.setAlpha(255);
         painter->fillRect(rect(), selectedColor);
     } else {
         painter->fillRect(rect(), brush());
     }
 
     const QPen oldPen = painter->pen();
     auto pen = oldPen;
     pen.setColor(brush().color());
     if (isSelected()) {
         pen.setWidth(2);
     }
     painter->setPen(pen);
     painter->drawRect(rect());
     painter->setPen(oldPen);
 
     const int margin = 4;
     const int width = rect().width() - 2 * margin;
     if (width < option->fontMetrics.averageCharWidth() * 6) {
         // text is too wide for the current LOD, don't paint it
         return;
     }
 
     const int height = rect().height();
 
     painter->drawText(margin + rect().x(), rect().y(), width, height,
                       Qt::AlignVCenter | Qt::AlignLeft | Qt::TextSingleLine,
                       option->fontMetrics.elidedText(m_function, Qt::ElideRight, width));
 }
 
 void FrameGraphicsItem::hoverEnterEvent(QGraphicsSceneHoverEvent* event)
 {
     QGraphicsRectItem::hoverEnterEvent(event);
     m_isHovered = true;
 }
 
 QString FrameGraphicsItem::description() const
 {
     // we build the tooltip text on demand, which is much faster than doing that
     // for potentially thousands of items when we load the data
     QString tooltip;
     KFormat format;
     qint64 totalCost = 0;
     {
         auto item = this;
         while (item->parentItem()) {
             item = static_cast<const FrameGraphicsItem*>(item->parentItem());
         }
         totalCost = item->cost();
     }
     const auto fraction = QString::number(double(m_cost) * 100. / totalCost, 'g', 3);
     const auto function = QString(QLatin1String("<span style='font-family:monospace'>") + m_function.toHtmlEscaped()
                                   + QLatin1String("</span>"));
     if (!parentItem()) {
         return function;
     }
 
     switch (m_costType) {
     case Allocations:
         tooltip = i18nc("%1: number of allocations, %2: relative number, %3: function label",
                         "%1 (%2%) allocations in %3 and below.", m_cost, fraction, function);
         break;
     case Temporary:
         tooltip = i18nc("%1: number of temporary allocations, %2: relative number, "
                         "%3 function label",
                         "%1 (%2%) temporary allocations in %3 and below.", m_cost, fraction, function);
         break;
     case Peak:
         tooltip =
             i18nc("%1: peak consumption in bytes, %2: relative number, %3: "
                   "function label",
                   "%1 (%2%) peak consumption in %3 and below.", format.formatByteSize(m_cost), fraction, function);
         break;
     case Leaked:
         tooltip = i18nc("%1: leaked bytes, %2: relative number, %3: function label", "%1 (%2%) leaked in %3 and below.",
                         format.formatByteSize(m_cost), fraction, function);
         break;
     case Allocated:
         tooltip = i18nc("%1: allocated bytes, %2: relative number, %3: function label",
                         "%1 (%2%) allocated in %3 and below.", format.formatByteSize(m_cost), fraction, function);
         break;
     }
 
     return tooltip;
 }
 
 void FrameGraphicsItem::hoverLeaveEvent(QGraphicsSceneHoverEvent* event)
 {
     QGraphicsRectItem::hoverLeaveEvent(event);
     m_isHovered = false;
 }
 
 namespace {
 
 /**
  * Generate a brush from the "mem" color space used in upstream FlameGraph.pl
  */
 QBrush brush()
 {
     // intern the brushes, to reuse them across items which can be thousands
     // otherwise we'd end up with dozens of allocations and higher memory
     // consumption
     static const QVector<QBrush> brushes = []() -> QVector<QBrush> {
         QVector<QBrush> brushes;
         std::generate_n(std::back_inserter(brushes), 100, []() {
             return QColor(0, 190 + 50 * qreal(rand()) / RAND_MAX, 210 * qreal(rand()) / RAND_MAX, 125);
         });
         return brushes;
     }();
     return brushes.at(rand() % brushes.size());
 }
 
 /**
  * Layout the flame graph and hide tiny items.
  */
 void layoutItems(FrameGraphicsItem* parent)
 {
     const auto& parentRect = parent->rect();
     const auto pos = parentRect.topLeft();
     const qreal maxWidth = parentRect.width();
     const qreal h = parentRect.height();
     const qreal y_margin = 2.;
     const qreal y = pos.y() - h - y_margin;
     qreal x = pos.x();
 
     foreach (auto child, parent->childItems()) {
         auto frameChild = static_cast<FrameGraphicsItem*>(child);
         const qreal w = maxWidth * double(frameChild->cost()) / parent->cost();
         frameChild->setVisible(w > 1);
         if (frameChild->isVisible()) {
             frameChild->setRect(QRectF(x, y, w, h));
             layoutItems(frameChild);
             x += w;
         }
     }
 }
 
 FrameGraphicsItem* findItemByFunction(const QList<QGraphicsItem*>& items, const QString& function)
 {
     foreach (auto item_, items) {
         auto item = static_cast<FrameGraphicsItem*>(item_);
         if (item->function() == function) {
             return item;
         }
     }
     return nullptr;
 }
 
 /**
  * Convert the top-down graph into a tree of FrameGraphicsItem.
  */
 void toGraphicsItems(const QVector<RowData>& data, FrameGraphicsItem* parent, int64_t AllocationData::*member,
                      const double costThreshold, bool collapseRecursion)
 {
     foreach (const auto& row, data) {
-        if (collapseRecursion && row.location->function == parent->function()) {
+        if (collapseRecursion && row.location->function != unresolvedFunctionName()
+            && row.location->function == parent->function())
+        {
             continue;
         }
         auto item = findItemByFunction(parent->childItems(), row.location->function);
         if (!item) {
             item = new FrameGraphicsItem(row.cost.*member, row.location->function, parent);
             item->setPen(parent->pen());
             item->setBrush(brush());
         } else {
             item->setCost(item->cost() + row.cost.*member);
         }
         if (item->cost() > costThreshold) {
             toGraphicsItems(row.children, item, member, costThreshold, collapseRecursion);
         }
     }
 }
 
 int64_t AllocationData::*memberForType(CostType type)
 {
     switch (type) {
     case Allocations:
         return &AllocationData::allocations;
     case Temporary:
         return &AllocationData::temporary;
     case Peak:
         return &AllocationData::peak;
     case Leaked:
         return &AllocationData::leaked;
     case Allocated:
         return &AllocationData::allocated;
     }
     Q_UNREACHABLE();
 }
 
 FrameGraphicsItem* parseData(const QVector<RowData>& topDownData, CostType type, double costThreshold,
                              bool collapseRecursion)
 {
     auto member = memberForType(type);
 
     double totalCost = 0;
     foreach (const auto& frame, topDownData) {
         totalCost += frame.cost.*member;
     }
 
     KColorScheme scheme(QPalette::Active);
     const QPen pen(scheme.foreground().color());
 
     KFormat format;
     QString label;
     switch (type) {
     case Allocations:
         label = i18n("%1 allocations in total", totalCost);
         break;
     case Temporary:
         label = i18n("%1 temporary allocations in total", totalCost);
         break;
     case Peak:
         label = i18n("%1 peak consumption in total", format.formatByteSize(totalCost));
         break;
     case Leaked:
         label = i18n("%1 leaked in total", format.formatByteSize(totalCost));
         break;
     case Allocated:
         label = i18n("%1 allocated in total", format.formatByteSize(totalCost));
         break;
     }
     auto rootItem = new FrameGraphicsItem(totalCost, type, label);
     rootItem->setBrush(scheme.background());
     rootItem->setPen(pen);
     toGraphicsItems(topDownData, rootItem, member, totalCost * costThreshold / 100., collapseRecursion);
     return rootItem;
 }
 }
 
 FlameGraph::FlameGraph(QWidget* parent, Qt::WindowFlags flags)
     : QWidget(parent, flags)
     , m_costSource(new QComboBox(this))
     , m_scene(new QGraphicsScene(this))
     , m_view(new QGraphicsView(this))
     , m_displayLabel(new QLabel)
 {
     qRegisterMetaType<FrameGraphicsItem*>();
 
     m_costSource->addItem(i18n("Allocations"), QVariant::fromValue(Allocations));
     m_costSource->setItemData(0, i18n("Show a flame graph over the number of allocations triggered by "
                                       "functions in your code."),
                               Qt::ToolTipRole);
     m_costSource->addItem(i18n("Temporary Allocations"), QVariant::fromValue(Temporary));
     m_costSource->setItemData(1, i18n("Show a flame graph over the number of temporary allocations "
                                       "triggered by functions in your code. "
                                       "Allocations are marked as temporary when they are immediately "
                                       "followed by their deallocation."),
                               Qt::ToolTipRole);
     m_costSource->addItem(i18n("Peak Consumption"), QVariant::fromValue(Peak));
     m_costSource->setItemData(2, i18n("Show a flame graph over the peak heap "
                                       "memory consumption of your application."),
                               Qt::ToolTipRole);
     m_costSource->addItem(i18n("Leaked"), QVariant::fromValue(Leaked));
     m_costSource->setItemData(3, i18n("Show a flame graph over the leaked heap memory of your application. "
                                       "Memory is considered to be leaked when it never got deallocated. "),
                               Qt::ToolTipRole);
     m_costSource->addItem(i18n("Allocated"), QVariant::fromValue(Allocated));
     m_costSource->setItemData(4, i18n("Show a flame graph over the total memory allocated by functions in "
                                       "your code. "
                                       "This aggregates all memory allocations and ignores deallocations."),
                               Qt::ToolTipRole);
     connect(m_costSource, static_cast<void (QComboBox::*)(int)>(&QComboBox::currentIndexChanged), this,
             &FlameGraph::showData);
     m_costSource->setToolTip(i18n("Select the data source that should be visualized in the flame graph."));
 
     m_scene->setItemIndexMethod(QGraphicsScene::NoIndex);
     m_view->setScene(m_scene);
     m_view->viewport()->installEventFilter(this);
     m_view->viewport()->setMouseTracking(true);
     m_view->setFont(QFont(QStringLiteral("monospace")));
 
     auto bottomUpCheckbox = new QCheckBox(i18n("Bottom-Down View"), this);
     bottomUpCheckbox->setToolTip(i18n("Enable the bottom-down flame graph view. When this is unchecked, "
                                       "the top-down view is enabled by default."));
     connect(bottomUpCheckbox, &QCheckBox::toggled, this, [this, bottomUpCheckbox] {
         m_showBottomUpData = bottomUpCheckbox->isChecked();
         showData();
     });
 
     auto collapseRecursionCheckbox = new QCheckBox(i18n("Collapse Recursion"), this);
     collapseRecursionCheckbox->setChecked(m_collapseRecursion);
     collapseRecursionCheckbox->setToolTip(i18n("Collapse stack frames for functions calling themselves. "
                                                "When this is unchecked, recursive frames will be visualized "
                                                "separately."));
     connect(collapseRecursionCheckbox, &QCheckBox::toggled, this, [this, collapseRecursionCheckbox] {
         m_collapseRecursion = collapseRecursionCheckbox->isChecked();
         showData();
     });
 
     auto costThreshold = new QDoubleSpinBox(this);
     costThreshold->setDecimals(2);
     costThreshold->setMinimum(0);
     costThreshold->setMaximum(99.90);
     costThreshold->setPrefix(i18n("Cost Threshold: "));
     costThreshold->setSuffix(QStringLiteral("%"));
     costThreshold->setValue(m_costThreshold);
     costThreshold->setSingleStep(0.01);
     costThreshold->setToolTip(i18n("<qt>The cost threshold defines a fractional cut-off value. "
                                    "Items with a relative cost below this value will not be shown in "
                                    "the flame graph. This is done as an optimization to quickly generate "
                                    "graphs for large data sets with low memory overhead. If you need more "
                                    "details, decrease the threshold value, or set it to zero.</qt>"));
     connect(costThreshold, static_cast<void (QDoubleSpinBox::*)(double)>(&QDoubleSpinBox::valueChanged), this,
             [this](double threshold) {
                 m_costThreshold = threshold;
                 showData();
             });
 
     m_displayLabel->setWordWrap(true);
     m_displayLabel->setTextInteractionFlags(m_displayLabel->textInteractionFlags() | Qt::TextSelectableByMouse);
 
     auto controls = new QWidget(this);
     controls->setLayout(new QHBoxLayout);
     controls->layout()->addWidget(m_costSource);
     controls->layout()->addWidget(bottomUpCheckbox);
     controls->layout()->addWidget(collapseRecursionCheckbox);
     controls->layout()->addWidget(costThreshold);
 
     setLayout(new QVBoxLayout);
     layout()->addWidget(controls);
     layout()->addWidget(m_view);
     layout()->addWidget(m_displayLabel);
 
     addAction(KStandardAction::back(this, SLOT(navigateBack()), this));
     addAction(KStandardAction::forward(this, SLOT(navigateForward()), this));
     setContextMenuPolicy(Qt::ActionsContextMenu);
 }
 
 FlameGraph::~FlameGraph() = default;
 
 bool FlameGraph::eventFilter(QObject* object, QEvent* event)
 {
     bool ret = QObject::eventFilter(object, event);
 
     if (event->type() == QEvent::MouseButtonRelease) {
         QMouseEvent* mouseEvent = static_cast<QMouseEvent*>(event);
         if (mouseEvent->button() == Qt::LeftButton) {
             auto item = static_cast<FrameGraphicsItem*>(m_view->itemAt(mouseEvent->pos()));
             if (item && item != m_selectionHistory.at(m_selectedItem)) {
                 selectItem(item);
                 if (m_selectedItem != m_selectionHistory.size() - 1) {
                     m_selectionHistory.remove(m_selectedItem + 1, m_selectionHistory.size() - m_selectedItem - 1);
                 }
                 m_selectedItem = m_selectionHistory.size();
                 m_selectionHistory.push_back(item);
             }
         }
     } else if (event->type() == QEvent::MouseMove) {
         QMouseEvent* mouseEvent = static_cast<QMouseEvent*>(event);
         auto item = static_cast<FrameGraphicsItem*>(m_view->itemAt(mouseEvent->pos()));
         setTooltipItem(item);
     } else if (event->type() == QEvent::Leave) {
         setTooltipItem(nullptr);
     } else if (event->type() == QEvent::Resize || event->type() == QEvent::Show) {
         if (!m_rootItem) {
             if (!m_buildingScene) {
                 showData();
             }
         } else {
             selectItem(m_selectionHistory.at(m_selectedItem));
         }
         updateTooltip();
     } else if (event->type() == QEvent::Hide) {
         setData(nullptr);
     }
     return ret;
 }
 
 void FlameGraph::setTopDownData(const TreeData& topDownData)
 {
     m_topDownData = topDownData;
 
     if (isVisible()) {
         showData();
     }
 }
 
 void FlameGraph::setBottomUpData(const TreeData& bottomUpData)
 {
     m_bottomUpData = bottomUpData;
 }
 
 void FlameGraph::clearData()
 {
     m_topDownData = {};
     m_bottomUpData = {};
 
     setData(nullptr);
 }
 
 void FlameGraph::showData()
 {
     setData(nullptr);
 
     m_buildingScene = true;
     using namespace ThreadWeaver;
     auto data = m_showBottomUpData ? m_bottomUpData : m_topDownData;
     bool collapseRecursion = m_collapseRecursion;
     auto source = m_costSource->currentData().value<CostType>();
     auto threshold = m_costThreshold;
     stream() << make_job([data, source, threshold, collapseRecursion, this]() {
         auto parsedData = parseData(data, source, threshold, collapseRecursion);
         QMetaObject::invokeMethod(this, "setData", Qt::QueuedConnection, Q_ARG(FrameGraphicsItem*, parsedData));
     });
 }
 
 void FlameGraph::setTooltipItem(const FrameGraphicsItem* item)
 {
     if (!item && m_selectedItem != -1 && m_selectionHistory.at(m_selectedItem)) {
         item = m_selectionHistory.at(m_selectedItem);
         m_view->setCursor(Qt::ArrowCursor);
     } else {
         m_view->setCursor(Qt::PointingHandCursor);
     }
     m_tooltipItem = item;
     updateTooltip();
 }
 
 void FlameGraph::updateTooltip()
 {
     const auto text = m_tooltipItem ? m_tooltipItem->description() : QString();
     m_displayLabel->setToolTip(text);
     const auto metrics = m_displayLabel->fontMetrics();
     // FIXME: the HTML text has tons of stuff that is not printed,
     //        which lets the text get cut-off too soon...
     m_displayLabel->setText(metrics.elidedText(text, Qt::ElideRight, m_displayLabel->width()));
 }
 
 void FlameGraph::setData(FrameGraphicsItem* rootItem)
 {
     m_scene->clear();
     m_buildingScene = false;
     m_rootItem = rootItem;
     m_selectionHistory.clear();
     m_selectionHistory.push_back(rootItem);
     m_selectedItem = 0;
     if (!rootItem) {
         auto text = m_scene->addText(i18n("generating flame graph..."));
         m_view->centerOn(text);
         m_view->setCursor(Qt::BusyCursor);
         return;
     }
 
     m_view->setCursor(Qt::ArrowCursor);
     // layouting needs a root item with a given height, the rest will be
     // overwritten later
     rootItem->setRect(0, 0, 800, m_view->fontMetrics().height() + 4);
     m_scene->addItem(rootItem);
 
     if (isVisible()) {
         selectItem(m_rootItem);
     }
 }
 
 void FlameGraph::selectItem(FrameGraphicsItem* item)
 {
     if (!item) {
         return;
     }
 
     // scale item and its parents to the maximum available width
     // also hide all siblings of the parent items
     const auto rootWidth = m_view->viewport()->width() - 40;
     auto parent = item;
     while (parent) {
         auto rect = parent->rect();
         rect.setLeft(0);
         rect.setWidth(rootWidth);
         parent->setRect(rect);
         if (parent->parentItem()) {
             foreach (auto sibling, parent->parentItem()->childItems()) {
                 sibling->setVisible(sibling == parent);
             }
         }
         parent = static_cast<FrameGraphicsItem*>(parent->parentItem());
     }
 
     // then layout all items below the selected one
     layoutItems(item);
 
     // and make sure it's visible
     m_view->centerOn(item);
 
     setTooltipItem(item);
 }
 
 void FlameGraph::navigateBack()
 {
     if (m_selectedItem > 0) {
         --m_selectedItem;
     }
     selectItem(m_selectionHistory.at(m_selectedItem));
 }
 
 void FlameGraph::navigateForward()
 {
     if ((m_selectedItem + 1) < m_selectionHistory.size()) {
         ++m_selectedItem;
     }
     selectItem(m_selectionHistory.at(m_selectedItem));
 }
diff --git a/src/analyze/gui/locationdata.h b/src/analyze/gui/locationdata.h
index d9d3ac5..235fc37 100644
--- a/src/analyze/gui/locationdata.h
+++ b/src/analyze/gui/locationdata.h
@@ -1,81 +1,88 @@
 /*
  * Copyright 2016-2017 Milian Wolff <mail@milianw.de>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Library General Public License as
  * published by the Free Software Foundation; either version 2 of the
  * License, or (at your option) any later version.
  *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  *
  * You should have received a copy of the GNU General Public
  * License along with this program; if not, write to the
  * Free Software Foundation, Inc.,
  * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  */
 
 #ifndef LOCATIONDATA_H
 #define LOCATIONDATA_H
 
 #include <QString>
 
 #include <memory>
 
 #include <boost/functional/hash.hpp>
 
+#include <KLocalizedString>
+
 struct LocationData
 {
     using Ptr = std::shared_ptr<LocationData>;
 
     QString function;
     QString file;
     QString module;
     int line;
 
     bool operator==(const LocationData& rhs) const
     {
         return function == rhs.function && file == rhs.file && module == rhs.module && line == rhs.line;
     }
 
     bool operator<(const LocationData& rhs) const
     {
         int i = function.compare(rhs.function);
         if (!i) {
             i = file.compare(rhs.file);
         }
         if (!i) {
             i = line < rhs.line ? -1 : (line > rhs.line);
         }
         if (!i) {
             i = module.compare(rhs.module);
         }
         return i < 0;
     }
 };
 Q_DECLARE_TYPEINFO(LocationData, Q_MOVABLE_TYPE);
 Q_DECLARE_METATYPE(LocationData::Ptr)
 
+inline QString unresolvedFunctionName()
+{
+    return i18n("<unresolved function>");
+}
+
 inline bool operator<(const LocationData::Ptr& lhs, const LocationData& rhs)
 {
     return *lhs < rhs;
 }
 
 inline uint qHash(const LocationData& location, uint seed_ = 0)
 {
     size_t seed = seed_;
     boost::hash_combine(seed, qHash(location.function));
     boost::hash_combine(seed, qHash(location.file));
     boost::hash_combine(seed, qHash(location.module));
     boost::hash_combine(seed, location.line);
     return seed;
 }
 
 inline uint qHash(const LocationData::Ptr& location, uint seed = 0)
 {
     return location ? qHash(*location, seed) : seed;
 }
 
 #endif // LOCATIONDATA_H
diff --git a/src/analyze/gui/parser.cpp b/src/analyze/gui/parser.cpp
index f582490..a8f1f5d 100644
--- a/src/analyze/gui/parser.cpp
+++ b/src/analyze/gui/parser.cpp
@@ -1,634 +1,601 @@
 /*
  * Copyright 2015-2017 Milian Wolff <mail@milianw.de>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Library General Public License as
  * published by the Free Software Foundation; either version 2 of the
  * License, or (at your option) any later version.
  *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  *
  * You should have received a copy of the GNU General Public
  * License along with this program; if not, write to the
  * Free Software Foundation, Inc.,
  * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  */
 
 #include "parser.h"
 
 #include <KLocalizedString>
 #include <ThreadWeaver/ThreadWeaver>
 
 #include <QDebug>
 
 #include "analyze/accumulatedtracedata.h"
 
 #include <future>
 #include <tuple>
 #include <vector>
 
 using namespace std;
 
 namespace {
 
 // TODO: use QString directly
 struct StringCache
 {
     QString func(const InstructionPointer& ip) const
     {
         if (ip.functionIndex) {
             // TODO: support removal of template arguments
             return stringify(ip.functionIndex);
         } else {
-            return i18n("<unresolved function>");
+            return unresolvedFunctionName();
         }
     }
 
     QString file(const InstructionPointer& ip) const
     {
         if (ip.fileIndex) {
             return stringify(ip.fileIndex);
         } else {
             return {};
         }
     }
 
     QString module(const InstructionPointer& ip) const
     {
         return stringify(ip.moduleIndex);
     }
 
     QString stringify(const StringIndex index) const
     {
         if (!index || index.index > m_strings.size()) {
             return {};
         } else {
             return m_strings.at(index.index - 1);
         }
     }
 
     LocationData::Ptr location(const IpIndex& index, const InstructionPointer& ip) const
     {
         // first try a fast index-based lookup
         auto& location = m_locationsMap[index];
         if (!location) {
             // slow-path, look for interned location
             // note that we can get the same location for different IPs
             LocationData data = {func(ip), file(ip), module(ip), ip.line};
             auto it = lower_bound(m_locations.begin(), m_locations.end(), data);
             if (it != m_locations.end() && **it == data) {
                 // we got the location already from a different ip, cache it
                 location = *it;
             } else {
                 // completely new location, cache it in both containers
                 auto interned = make_shared<LocationData>(data);
                 m_locations.insert(it, interned);
                 location = interned;
             }
         }
         return location;
     }
 
     void update(const vector<string>& strings)
     {
         transform(strings.begin() + m_strings.size(), strings.end(), back_inserter(m_strings),
                   [](const string& str) { return QString::fromStdString(str); });
     }
 
     vector<QString> m_strings;
     mutable vector<LocationData::Ptr> m_locations;
     mutable QHash<IpIndex, LocationData::Ptr> m_locationsMap;
 
     bool diffMode = false;
 };
 
 struct ChartMergeData
 {
     IpIndex ip;
     qint64 consumed;
     qint64 allocations;
     qint64 allocated;
     qint64 temporary;
     bool operator<(const IpIndex rhs) const
     {
         return ip < rhs;
     }
 };
 
 const uint64_t MAX_CHART_DATAPOINTS = 500; // TODO: make this configurable via the GUI
 
 struct ParserData final : public AccumulatedTraceData
 {
     ParserData()
     {
     }
 
     void updateStringCache()
     {
         stringCache.update(strings);
     }
 
     void prepareBuildCharts()
     {
         if (stringCache.diffMode) {
             return;
         }
         consumedChartData.rows.reserve(MAX_CHART_DATAPOINTS);
         allocatedChartData.rows.reserve(MAX_CHART_DATAPOINTS);
         allocationsChartData.rows.reserve(MAX_CHART_DATAPOINTS);
         temporaryChartData.rows.reserve(MAX_CHART_DATAPOINTS);
         // start off with null data at the origin
         consumedChartData.rows.push_back({});
         allocatedChartData.rows.push_back({});
         allocationsChartData.rows.push_back({});
         temporaryChartData.rows.push_back({});
         // index 0 indicates the total row
         consumedChartData.labels[0] = i18n("total");
         allocatedChartData.labels[0] = i18n("total");
         allocationsChartData.labels[0] = i18n("total");
         temporaryChartData.labels[0] = i18n("total");
 
         buildCharts = true;
         maxConsumedSinceLastTimeStamp = 0;
         vector<ChartMergeData> merged;
         merged.reserve(instructionPointers.size());
         // merge the allocation cost by instruction pointer
         // TODO: aggregate by function instead?
         // TODO: traverse the merged call stack up until the first fork
         for (const auto& alloc : allocations) {
             const auto ip = findTrace(alloc.traceIndex).ipIndex;
             auto it = lower_bound(merged.begin(), merged.end(), ip);
             if (it == merged.end() || it->ip != ip) {
                 it = merged.insert(it, {ip, 0, 0, 0, 0});
             }
             it->consumed += alloc.peak; // we want to track the top peaks in the chart
             it->allocated += alloc.allocated;
             it->allocations += alloc.allocations;
             it->temporary += alloc.temporary;
         }
         // find the top hot spots for the individual data members and remember their
         // IP and store the label
         auto findTopChartEntries = [&](qint64 ChartMergeData::*member, int LabelIds::*label, ChartData* data) {
             sort(merged.begin(), merged.end(),
                  [=](const ChartMergeData& left, const ChartMergeData& right) { return left.*member > right.*member; });
             for (size_t i = 0; i < min(size_t(ChartRows::MAX_NUM_COST - 1), merged.size()); ++i) {
                 const auto& alloc = merged[i];
                 if (!(alloc.*member)) {
                     break;
                 }
                 const auto ip = alloc.ip;
                 (labelIds[ip].*label) = i + 1;
                 const auto function = stringCache.func(findIp(ip));
                 data->labels[i + 1] = function;
             }
         };
         findTopChartEntries(&ChartMergeData::consumed, &LabelIds::consumed, &consumedChartData);
         findTopChartEntries(&ChartMergeData::allocated, &LabelIds::allocated, &allocatedChartData);
         findTopChartEntries(&ChartMergeData::allocations, &LabelIds::allocations, &allocationsChartData);
         findTopChartEntries(&ChartMergeData::temporary, &LabelIds::temporary, &temporaryChartData);
     }
 
     void handleTimeStamp(int64_t /*oldStamp*/, int64_t newStamp)
     {
         if (!buildCharts || stringCache.diffMode) {
             return;
         }
         maxConsumedSinceLastTimeStamp = max(maxConsumedSinceLastTimeStamp, totalCost.leaked);
         const int64_t diffBetweenTimeStamps = totalTime / MAX_CHART_DATAPOINTS;
         if (newStamp != totalTime && newStamp - lastTimeStamp < diffBetweenTimeStamps) {
             return;
         }
         const auto nowConsumed = maxConsumedSinceLastTimeStamp;
         maxConsumedSinceLastTimeStamp = 0;
         lastTimeStamp = newStamp;
 
         // create the rows
         auto createRow = [](int64_t timeStamp, int64_t totalCost) {
             ChartRows row;
             row.timeStamp = timeStamp;
             row.cost[0] = totalCost;
             return row;
         };
         auto consumed = createRow(newStamp, nowConsumed);
         auto allocated = createRow(newStamp, totalCost.allocated);
         auto allocs = createRow(newStamp, totalCost.allocations);
         auto temporary = createRow(newStamp, totalCost.temporary);
 
         // if the cost is non-zero and the ip corresponds to a hotspot function
         // selected in the labels,
         // we add the cost to the row's column
         auto addDataToRow = [](int64_t cost, int labelId, ChartRows* rows) {
             if (!cost || labelId == -1) {
                 return;
             }
             rows->cost[labelId] += cost;
         };
         for (const auto& alloc : allocations) {
             const auto ip = findTrace(alloc.traceIndex).ipIndex;
             auto it = labelIds.constFind(ip);
             if (it == labelIds.constEnd()) {
                 continue;
             }
             const auto& labelIds = *it;
             addDataToRow(alloc.leaked, labelIds.consumed, &consumed);
             addDataToRow(alloc.allocated, labelIds.allocated, &allocated);
             addDataToRow(alloc.allocations, labelIds.allocations, &allocs);
             addDataToRow(alloc.temporary, labelIds.temporary, &temporary);
         }
         // add the rows for this time stamp
         consumedChartData.rows << consumed;
         allocatedChartData.rows << allocated;
         allocationsChartData.rows << allocs;
         temporaryChartData.rows << temporary;
     }
 
     void handleAllocation(const AllocationInfo& info, const AllocationIndex index)
     {
         maxConsumedSinceLastTimeStamp = max(maxConsumedSinceLastTimeStamp, totalCost.leaked);
 
         if (index.index == allocationInfoCounter.size()) {
             allocationInfoCounter.push_back({info, 1});
         } else {
             ++allocationInfoCounter[index.index].allocations;
         }
     }
 
     void handleDebuggee(const char* command)
     {
         debuggee = command;
     }
 
     string debuggee;
 
     struct CountedAllocationInfo
     {
         AllocationInfo info;
         int64_t allocations;
         bool operator<(const CountedAllocationInfo& rhs) const
         {
             return tie(info.size, allocations) < tie(rhs.info.size, rhs.allocations);
         }
     };
     vector<CountedAllocationInfo> allocationInfoCounter;
 
     ChartData consumedChartData;
     ChartData allocationsChartData;
     ChartData allocatedChartData;
     ChartData temporaryChartData;
     // here we store the indices into ChartRows::cost for those IpIndices that
     // are within the top hotspots. This way, we can do one hash lookup in the
     // handleTimeStamp function instead of four if we stored this data
     // in a per-ChartData hash.
     struct LabelIds
     {
         int consumed = -1;
         int allocations = -1;
         int allocated = -1;
         int temporary = -1;
     };
     QHash<IpIndex, LabelIds> labelIds;
     int64_t maxConsumedSinceLastTimeStamp = 0;
     int64_t lastTimeStamp = 0;
 
     StringCache stringCache;
 
     bool buildCharts = false;
 };
 
 void setParents(QVector<RowData>& children, const RowData* parent)
 {
     for (auto& row : children) {
         row.parent = parent;
         setParents(row.children, &row);
     }
 }
 
 TreeData mergeAllocations(const ParserData& data)
 {
     TreeData topRows;
     // merge allocations, leave parent pointers invalid (their location may change)
     for (const auto& allocation : data.allocations) {
         auto traceIndex = allocation.traceIndex;
         auto rows = &topRows;
         while (traceIndex) {
             const auto& trace = data.findTrace(traceIndex);
             const auto& ip = data.findIp(trace.ipIndex);
             auto location = data.stringCache.location(trace.ipIndex, ip);
 
             auto it = lower_bound(rows->begin(), rows->end(), location);
             if (it != rows->end() && it->location == location) {
                 it->cost += allocation;
             } else {
                 it = rows->insert(it, {allocation, location, nullptr, {}});
             }
             if (data.isStopIndex(ip.functionIndex)) {
                 break;
             }
             traceIndex = trace.parentIndex;
             rows = &it->children;
         }
     }
     // now set the parents, the data is constant from here on
     setParents(topRows, nullptr);
 
     return topRows;
 }
 
 RowData* findByLocation(const RowData& row, QVector<RowData>* data)
 {
     for (int i = 0; i < data->size(); ++i) {
         if (data->at(i).location == row.location) {
             return data->data() + i;
         }
     }
     return nullptr;
 }
 
 AllocationData buildTopDown(const TreeData& bottomUpData, TreeData* topDownData)
 {
     AllocationData totalCost;
     for (const auto& row : bottomUpData) {
         // recurse and find the cost attributed to children
         const auto childCost = buildTopDown(row.children, topDownData);
         if (childCost != row.cost) {
             // this row is (partially) a leaf
             const auto cost = row.cost - childCost;
 
             // bubble up the parent chain to build a top-down tree
             auto node = &row;
             auto stack = topDownData;
             while (node) {
                 auto data = findByLocation(*node, stack);
                 if (!data) {
                     // create an empty top-down item for this bottom-up node
                     *stack << RowData{{}, node->location, nullptr, {}};
                     data = &stack->back();
                 }
                 // always use the leaf node's cost and propagate that one up the chain
                 // otherwise we'd count the cost of some nodes multiple times
                 data->cost += cost;
                 stack = &data->children;
                 node = node->parent;
             }
         }
         totalCost += row.cost;
     }
     return totalCost;
 }
 
 QVector<RowData> toTopDownData(const QVector<RowData>& bottomUpData)
 {
     QVector<RowData> topRows;
     buildTopDown(bottomUpData, &topRows);
     // now set the parents, the data is constant from here on
     setParents(topRows, nullptr);
     return topRows;
 }
 
-void buildCallerCallee2(const TreeData& bottomUpData, CallerCalleeRows* callerCalleeData)
-{
-    foreach (const auto& row, bottomUpData) {
-        if (row.children.isEmpty()) {
-            // leaf node found, bubble up the parent chain to add cost for all frames
-            // to the caller/callee data. this is done top-down since we must not count
-            // locations more than once in the caller-callee data
-            QSet<LocationData::Ptr> recursionGuard;
-
-            auto node = &row;
-            while (node) {
-                const auto& location = node->location;
-                if (!recursionGuard.contains(location)) { // aggregate caller-callee data
-                    auto it = lower_bound(callerCalleeData->begin(), callerCalleeData->end(), location,
-                        [](const CallerCalleeData& lhs, const LocationData::Ptr& rhs) { return lhs.location < rhs; });
-                    if (it == callerCalleeData->end() || it->location != location) {
-                        it = callerCalleeData->insert(it, {{}, {}, location});
-                    }
-                    it->inclusiveCost += row.cost;
-                    if (!node->parent) {
-                        it->selfCost += row.cost;
-                    }
-                    recursionGuard.insert(location);
-                }
-                node = node->parent;
-            }
-        } else {
-            // recurse to find a leaf
-            buildCallerCallee2(row.children, callerCalleeData);
-        }
-    }
-}
-
 AllocationData buildCallerCallee(const TreeData& bottomUpData, CallerCalleeRows* callerCalleeData)
 {
     AllocationData totalCost;
     for (const auto& row : bottomUpData) {
         // recurse to find a leaf
         const auto childCost = buildCallerCallee(row.children, callerCalleeData);
         if (childCost != row.cost) {
             // this row is (partially) a leaf
             const auto cost = row.cost - childCost;
 
             // leaf node found, bubble up the parent chain to add cost for all frames
             // to the caller/callee data. this is done top-down since we must not count
             // symbols more than once in the caller-callee data
             QSet<LocationData::Ptr> recursionGuard;
 
             auto node = &row;
             while (node) {
                 const auto& location = node->location;
                 if (!recursionGuard.contains(location)) { // aggregate caller-callee data
                     auto it = lower_bound(callerCalleeData->begin(), callerCalleeData->end(), location,
                         [](const CallerCalleeData& lhs, const LocationData::Ptr& rhs) { return lhs.location < rhs; });
                     if (it == callerCalleeData->end() || it->location != location) {
                         it = callerCalleeData->insert(it, {{}, {}, location});
                     }
                     it->inclusiveCost += cost;
                     if (!node->parent) {
                         it->selfCost += cost;
                     }
                     recursionGuard.insert(location);
                 }
                 node = node->parent;
             }
         }
         totalCost += row.cost;
     }
     return totalCost;
 }
 
 CallerCalleeRows toCallerCalleeData(const QVector<RowData>& bottomUpData, bool diffMode)
 {
     CallerCalleeRows callerCalleeRows;
 
     buildCallerCallee(bottomUpData, &callerCalleeRows);
 
     if (diffMode) {
         // remove rows without cost
         callerCalleeRows.erase(remove_if(callerCalleeRows.begin(), callerCalleeRows.end(),
                                          [](const CallerCalleeData& data) -> bool {
                                              return data.inclusiveCost == AllocationData()
                                                  && data.selfCost == AllocationData();
                                          }),
                                callerCalleeRows.end());
     }
 
     return callerCalleeRows;
 }
 
 struct MergedHistogramColumnData
 {
     LocationData::Ptr location;
     int64_t allocations;
     bool operator<(const LocationData::Ptr& rhs) const
     {
         return location < rhs;
     }
 };
 
 HistogramData buildSizeHistogram(ParserData& data)
 {
     HistogramData ret;
     if (data.allocationInfoCounter.empty()) {
         return ret;
     }
     sort(data.allocationInfoCounter.begin(), data.allocationInfoCounter.end());
     const auto totalLabel = i18n("total");
     HistogramRow row;
     const pair<uint64_t, QString> buckets[] = {{8, i18n("0B to 8B")},
                                                {16, i18n("9B to 16B")},
                                                {32, i18n("17B to 32B")},
                                                {64, i18n("33B to 64B")},
                                                {128, i18n("65B to 128B")},
                                                {256, i18n("129B to 256B")},
                                                {512, i18n("257B to 512B")},
                                                {1024, i18n("513B to 1KB")},
                                                {numeric_limits<uint64_t>::max(), i18n("more than 1KB")}};
     uint bucketIndex = 0;
     row.size = buckets[bucketIndex].first;
     row.sizeLabel = buckets[bucketIndex].second;
     vector<MergedHistogramColumnData> columnData;
     columnData.reserve(128);
     auto insertColumns = [&]() {
         sort(columnData.begin(), columnData.end(),
              [](const MergedHistogramColumnData& lhs, const MergedHistogramColumnData& rhs) {
                  return lhs.allocations > rhs.allocations;
              });
         // -1 to account for total row
         for (size_t i = 0; i < min(columnData.size(), size_t(HistogramRow::NUM_COLUMNS - 1)); ++i) {
             const auto& column = columnData[i];
             row.columns[i + 1] = {column.allocations, column.location};
         }
     };
     for (const auto& info : data.allocationInfoCounter) {
         if (info.info.size > row.size) {
             insertColumns();
             columnData.clear();
             ret << row;
             ++bucketIndex;
             row.size = buckets[bucketIndex].first;
             row.sizeLabel = buckets[bucketIndex].second;
             row.columns[0] = {info.allocations, {}};
         } else {
             row.columns[0].allocations += info.allocations;
         }
         const auto ipIndex = data.findTrace(info.info.traceIndex).ipIndex;
         const auto ip = data.findIp(ipIndex);
         const auto location = data.stringCache.location(ipIndex, ip);
         auto it = lower_bound(columnData.begin(), columnData.end(), location);
         if (it == columnData.end() || it->location != location) {
             columnData.insert(it, {location, info.allocations});
         } else {
             it->allocations += info.allocations;
         }
     }
     insertColumns();
     ret << row;
     return ret;
 }
 }
 
 Parser::Parser(QObject* parent)
     : QObject(parent)
 {
     qRegisterMetaType<SummaryData>();
 }
 
 Parser::~Parser() = default;
 
 void Parser::parse(const QString& path, const QString& diffBase)
 {
     using namespace ThreadWeaver;
     stream() << make_job([this, path, diffBase]() {
         const auto stdPath = path.toStdString();
         auto data = make_shared<ParserData>();
         emit progressMessageAvailable(i18n("parsing data..."));
 
         if (!diffBase.isEmpty()) {
             ParserData diffData;
             auto readBase =
                 async(launch::async, [&diffData, diffBase]() { return diffData.read(diffBase.toStdString()); });
             if (!data->read(stdPath)) {
                 emit failedToOpen(path);
                 return;
             }
             if (!readBase.get()) {
                 emit failedToOpen(diffBase);
                 return;
             }
             data->diff(diffData);
             data->stringCache.diffMode = true;
         } else if (!data->read(stdPath)) {
             emit failedToOpen(path);
             return;
         }
 
         data->updateStringCache();
 
         emit summaryAvailable({QString::fromStdString(data->debuggee), data->totalCost, data->totalTime, data->peakTime,
                                data->peakRSS * data->systemInfo.pageSize,
                                data->systemInfo.pages * data->systemInfo.pageSize, data->fromAttached});
 
         emit progressMessageAvailable(i18n("merging allocations..."));
         // merge allocations before modifying the data again
         const auto mergedAllocations = mergeAllocations(*data);
         emit bottomUpDataAvailable(mergedAllocations);
 
         // also calculate the size histogram
         emit progressMessageAvailable(i18n("building size histogram..."));
         const auto sizeHistogram = buildSizeHistogram(*data);
         emit sizeHistogramDataAvailable(sizeHistogram);
         // now data can be modified again for the chart data evaluation
 
         const auto diffMode = data->stringCache.diffMode;
         emit progressMessageAvailable(i18n("building charts..."));
         auto parallel = new Collection;
         *parallel << make_job([this, mergedAllocations]() {
             const auto topDownData = toTopDownData(mergedAllocations);
             emit topDownDataAvailable(topDownData);
         }) << make_job([this, mergedAllocations, diffMode]() {
             const auto callerCalleeData = toCallerCalleeData(mergedAllocations, diffMode);
             emit callerCalleeDataAvailable(callerCalleeData);
         });
         if (!data->stringCache.diffMode) {
             // only build charts when we are not diffing
             *parallel << make_job([this, data, stdPath]() {
                 // this mutates data, and thus anything running in parallel must
                 // not access data
                 data->prepareBuildCharts();
                 data->read(stdPath);
                 emit consumedChartDataAvailable(data->consumedChartData);
                 emit allocationsChartDataAvailable(data->allocationsChartData);
                 emit allocatedChartDataAvailable(data->allocatedChartData);
                 emit temporaryChartDataAvailable(data->temporaryChartData);
             });
         }
 
         auto sequential = new Sequence;
         *sequential << parallel << make_job([this]() { emit finished(); });
 
         stream() << sequential;
     });
 }
diff --git a/src/interpret/heaptrack_interpret.cpp b/src/interpret/heaptrack_interpret.cpp
index 33e4c33..98477ce 100644
--- a/src/interpret/heaptrack_interpret.cpp
+++ b/src/interpret/heaptrack_interpret.cpp
@@ -1,430 +1,430 @@
 /*
  * Copyright 2014-2017 Milian Wolff <mail@milianw.de>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Library General Public License as
  * published by the Free Software Foundation; either version 2 of the
  * License, or (at your option) any later version.
  *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  *
  * You should have received a copy of the GNU General Public
  * License along with this program; if not, write to the
  * Free Software Foundation, Inc.,
  * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  */
 
 /**
  * @file heaptrack_interpret.cpp
  *
  * @brief Interpret raw heaptrack data and add DWARF-based debug information.
  */
 
 #include <algorithm>
 #include <cinttypes>
 #include <iostream>
 #include <sstream>
 #include <stdio_ext.h>
 #include <tuple>
 #include <unordered_map>
 #include <vector>
 
 #include <cxxabi.h>
 
 #include <boost/algorithm/string/predicate.hpp>
 
 #include "libbacktrace/backtrace.h"
 #include "libbacktrace/internal.h"
 #include "util/linereader.h"
 #include "util/pointermap.h"
 
 using namespace std;
 
 namespace {
 
 string demangle(const char* function)
 {
     if (!function) {
         return {};
     } else if (function[0] != '_' || function[1] != 'Z') {
         return {function};
     }
 
     string ret;
     int status = 0;
     char* demangled = abi::__cxa_demangle(function, 0, 0, &status);
     if (demangled) {
         ret = demangled;
         free(demangled);
     }
     return ret;
 }
 
 struct AddressInformation
 {
     string function;
     string file;
     int line = 0;
 };
 
 struct Module
 {
     Module(uintptr_t addressStart, uintptr_t addressEnd, backtrace_state* backtraceState, size_t moduleIndex)
         : addressStart(addressStart)
         , addressEnd(addressEnd)
         , moduleIndex(moduleIndex)
         , backtraceState(backtraceState)
     {
     }
 
     AddressInformation resolveAddress(uintptr_t address) const
     {
         AddressInformation info;
         if (!backtraceState) {
             return info;
         }
 
         backtrace_pcinfo(backtraceState, address,
                          [](void* data, uintptr_t /*addr*/, const char* file, int line, const char* function) -> int {
                              auto info = reinterpret_cast<AddressInformation*>(data);
                              info->function = demangle(function);
                              info->file = file ? file : "";
                              info->line = line;
                              return 0;
                          },
                          [](void* /*data*/, const char* /*msg*/, int /*errnum*/) {}, &info);
 
         if (info.function.empty()) {
             backtrace_syminfo(
                 backtraceState, address,
                 [](void* data, uintptr_t /*pc*/, const char* symname, uintptr_t /*symval*/, uintptr_t /*symsize*/) {
                     if (symname) {
                         reinterpret_cast<AddressInformation*>(data)->function = demangle(symname);
                     }
                 },
                 [](void* /*data*/, const char* msg, int errnum) {
                     cerr << "Module backtrace error (code " << errnum << "): " << msg << endl;
                 },
                 &info);
         }
 
         return info;
     }
 
     bool operator<(const Module& module) const
     {
         return tie(addressStart, addressEnd, moduleIndex)
             < tie(module.addressStart, module.addressEnd, module.moduleIndex);
     }
 
     bool operator!=(const Module& module) const
     {
         return tie(addressStart, addressEnd, moduleIndex)
             != tie(module.addressStart, module.addressEnd, module.moduleIndex);
     }
 
     uintptr_t addressStart;
     uintptr_t addressEnd;
     size_t moduleIndex;
     backtrace_state* backtraceState;
 };
 
 struct ResolvedIP
 {
     size_t moduleIndex = 0;
     size_t fileIndex = 0;
     size_t functionIndex = 0;
     int line = 0;
 };
 
 struct AccumulatedTraceData
 {
     AccumulatedTraceData()
     {
         m_modules.reserve(256);
         m_backtraceStates.reserve(64);
         m_internedData.reserve(4096);
         m_encounteredIps.reserve(32768);
     }
 
     ~AccumulatedTraceData()
     {
         fprintf(stdout, "# strings: %zu\n# ips: %zu\n", m_internedData.size(), m_encounteredIps.size());
     }
 
     ResolvedIP resolve(const uintptr_t ip)
     {
         if (m_modulesDirty) {
             // sort by addresses, required for binary search below
             sort(m_modules.begin(), m_modules.end());
 
 #ifndef NDEBUG
             for (size_t i = 0; i < m_modules.size(); ++i) {
                 const auto& m1 = m_modules[i];
                 for (size_t j = i + 1; j < m_modules.size(); ++j) {
                     if (i == j) {
                         continue;
                     }
                     const auto& m2 = m_modules[j];
                     if ((m1.addressStart <= m2.addressStart && m1.addressEnd > m2.addressStart)
                         || (m1.addressStart < m2.addressEnd && m1.addressEnd >= m2.addressEnd)) {
                         cerr << "OVERLAPPING MODULES: " << hex << m1.moduleIndex << " (" << m1.addressStart << " to "
                              << m1.addressEnd << ") and " << m2.moduleIndex << " (" << m2.addressStart << " to "
                              << m2.addressEnd << ")\n"
                              << dec;
                     } else if (m2.addressStart >= m1.addressEnd) {
                         break;
                     }
                 }
             }
 #endif
 
             m_modulesDirty = false;
         }
 
         ResolvedIP data;
         // find module for this instruction pointer
         auto module =
             lower_bound(m_modules.begin(), m_modules.end(), ip,
                         [](const Module& module, const uintptr_t ip) -> bool { return module.addressEnd < ip; });
         if (module != m_modules.end() && module->addressStart <= ip && module->addressEnd >= ip) {
             data.moduleIndex = module->moduleIndex;
             const auto info = module->resolveAddress(ip);
             data.fileIndex = intern(info.file);
             data.functionIndex = intern(info.function);
             data.line = info.line;
         }
         return data;
     }
 
     size_t intern(const string& str, const char** internedString = nullptr)
     {
         if (str.empty()) {
             return 0;
         }
 
         auto it = m_internedData.find(str);
         if (it != m_internedData.end()) {
             if (internedString) {
                 *internedString = it->first.c_str();
             }
             return it->second;
         }
         const size_t id = m_internedData.size() + 1;
         it = m_internedData.insert(it, make_pair(str, id));
         if (internedString) {
             *internedString = it->first.c_str();
         }
         fprintf(stdout, "s %s\n", str.c_str());
         return id;
     }
 
     void addModule(backtrace_state* backtraceState, const size_t moduleIndex, const uintptr_t addressStart,
                    const uintptr_t addressEnd)
     {
         m_modules.emplace_back(addressStart, addressEnd, backtraceState, moduleIndex);
         m_modulesDirty = true;
     }
 
     void clearModules()
     {
         // TODO: optimize this, reuse modules that are still valid
         m_modules.clear();
         m_modulesDirty = true;
     }
 
     size_t addIp(const uintptr_t instructionPointer)
     {
         if (!instructionPointer) {
             return 0;
         }
 
         auto it = m_encounteredIps.find(instructionPointer);
         if (it != m_encounteredIps.end()) {
             return it->second;
         }
 
         const size_t ipId = m_encounteredIps.size() + 1;
         m_encounteredIps.insert(it, make_pair(instructionPointer, ipId));
 
         const auto ip = resolve(instructionPointer);
         fprintf(stdout, "i %zx %zx", instructionPointer, ip.moduleIndex);
         if (ip.functionIndex || ip.fileIndex) {
             fprintf(stdout, " %zx", ip.functionIndex);
             if (ip.fileIndex) {
                 fprintf(stdout, " %zx %x", ip.fileIndex, ip.line);
             }
         }
         fputc('\n', stdout);
         return ipId;
     }
 
     /**
      * Prevent the same file from being initialized multiple times.
      * This drastically cuts the memory consumption down
      */
     backtrace_state* findBacktraceState(const char* fileName, uintptr_t addressStart)
     {
         if (boost::algorithm::starts_with(fileName, "linux-vdso.so")) {
             // prevent warning, since this will always fail
             return nullptr;
         }
 
         auto it = m_backtraceStates.find(fileName);
         if (it != m_backtraceStates.end()) {
             return it->second;
         }
 
         struct CallbackData
         {
             const char* fileName;
         };
         CallbackData data = {fileName};
 
         auto errorHandler = [](void* rawData, const char* msg, int errnum) {
             auto data = reinterpret_cast<const CallbackData*>(rawData);
             cerr << "Failed to create backtrace state for module " << data->fileName << ": " << msg << " / "
                  << strerror(errnum) << " (error code " << errnum << ")" << endl;
         };
 
         auto state = backtrace_create_state(fileName, /* we are single threaded, so: not thread safe */ false,
                                             errorHandler, &data);
 
         if (state) {
             const int descriptor = backtrace_open(fileName, errorHandler, &data, nullptr);
             if (descriptor >= 1) {
                 int foundSym = 0;
                 int foundDwarf = 0;
                 auto ret = elf_add(state, descriptor, addressStart, errorHandler, &data, &state->fileline_fn, &foundSym,
                                    &foundDwarf, false);
                 if (ret && foundSym) {
                     state->syminfo_fn = &elf_syminfo;
                 }
             }
         }
 
         m_backtraceStates.insert(it, make_pair(fileName, state));
 
         return state;
     }
 
 private:
     vector<Module> m_modules;
     unordered_map<const char*, backtrace_state*> m_backtraceStates;
     bool m_modulesDirty = false;
 
     unordered_map<string, size_t> m_internedData;
     unordered_map<uintptr_t, size_t> m_encounteredIps;
 };
 }
 
 int main(int /*argc*/, char** /*argv*/)
 {
     // optimize: we only have a single thread
     ios_base::sync_with_stdio(false);
     __fsetlocking(stdout, FSETLOCKING_BYCALLER);
     __fsetlocking(stdin, FSETLOCKING_BYCALLER);
 
     AccumulatedTraceData data;
 
     LineReader reader;
 
     string exe;
 
     PointerMap ptrToIndex;
     uint64_t lastPtr = 0;
     AllocationInfoSet allocationInfos;
 
     uint64_t allocations = 0;
     uint64_t leakedAllocations = 0;
     uint64_t temporaryAllocations = 0;
 
     while (reader.getLine(cin)) {
         if (reader.mode() == 'x') {
             reader >> exe;
         } else if (reader.mode() == 'm') {
             string fileName;
             reader >> fileName;
             if (fileName == "-") {
                 data.clearModules();
             } else {
                 if (fileName == "x") {
                     fileName = exe;
                 }
                 const char* internedString = nullptr;
                 const auto moduleIndex = data.intern(fileName, &internedString);
                 uintptr_t addressStart = 0;
                 if (!(reader >> addressStart)) {
                     cerr << "failed to parse line: " << reader.line() << endl;
                     return 1;
                 }
                 auto state = data.findBacktraceState(internedString, addressStart);
                 uintptr_t vAddr = 0;
                 uintptr_t memSize = 0;
                 while ((reader >> vAddr) && (reader >> memSize)) {
                     data.addModule(state, moduleIndex, addressStart + vAddr, addressStart + vAddr + memSize);
                 }
             }
         } else if (reader.mode() == 't') {
             uintptr_t instructionPointer = 0;
             size_t parentIndex = 0;
             if (!(reader >> instructionPointer) || !(reader >> parentIndex)) {
                 cerr << "failed to parse line: " << reader.line() << endl;
                 return 1;
             }
             // ensure ip is encountered
             const auto ipId = data.addIp(instructionPointer);
             // trace point, map current output index to parent index
             fprintf(stdout, "t %zx %zx\n", ipId, parentIndex);
         } else if (reader.mode() == '+') {
             ++allocations;
             ++leakedAllocations;
             uint64_t size = 0;
             TraceIndex traceId;
             uint64_t ptr = 0;
             if (!(reader >> size) || !(reader >> traceId.index) || !(reader >> ptr)) {
                 cerr << "failed to parse line: " << reader.line() << endl;
                 continue;
             }
 
             AllocationIndex index;
             if (allocationInfos.add(size, traceId, &index)) {
                 fprintf(stdout, "a %" PRIx64 " %x\n", size, traceId.index);
             }
             ptrToIndex.addPointer(ptr, index);
             lastPtr = ptr;
             fprintf(stdout, "+ %x\n", index.index);
         } else if (reader.mode() == '-') {
-            --leakedAllocations;
             uint64_t ptr = 0;
             if (!(reader >> ptr)) {
                 cerr << "failed to parse line: " << reader.line() << endl;
                 continue;
             }
             bool temporary = lastPtr == ptr;
             lastPtr = 0;
             auto allocation = ptrToIndex.takePointer(ptr);
             if (!allocation.second) {
                 continue;
             }
             fprintf(stdout, "- %x\n", allocation.first.index);
             if (temporary) {
                 ++temporaryAllocations;
             }
+            --leakedAllocations;
         } else {
             fputs(reader.line().c_str(), stdout);
             fputc('\n', stdout);
         }
     }
 
     fprintf(stderr, "heaptrack stats:\n"
                     "\tallocations:          \t%" PRIu64 "\n"
                     "\tleaked allocations:   \t%" PRIu64 "\n"
                     "\ttemporary allocations:\t%" PRIu64 "\n",
             allocations, leakedAllocations, temporaryAllocations);
 
     return 0;
 }
diff --git a/src/track/heaptrack.sh.cmake b/src/track/heaptrack.sh.cmake
index e015a92..e4b61de 100755
--- a/src/track/heaptrack.sh.cmake
+++ b/src/track/heaptrack.sh.cmake
@@ -1,187 +1,199 @@
 #!/bin/sh
 
 #
 # Copyright 2014-2017 Milian Wolff <mail@milianw.de>
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU Library General Public License as
 # published by the Free Software Foundation; either version 2 of the
 # License, or (at your option) any later version.
 #
 # This program is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public
 # License along with this program; if not, write to the
 # Free Software Foundation, Inc.,
 # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 #
 
 usage() {
     echo "Usage: $0 [--debug|-d] DEBUGGEE [ARGUMENT]..."
     echo "or:    $0 [--debug|-d] -p PID"
     echo
     echo "A heap memory usage profiler. It uses LD_PRELOAD to track all"
     echo "calls to the core memory allocation functions and logs these"
     echo "occurrences. Additionally, backtraces are obtained and logged."
     echo "Combined this can give interesting answers to questions such as:"
     echo
     echo "  * How much heap memory is my application using?"
     echo "  * Where is heap memory being allocated, and how often?"
     echo "  * How much space are individual heap allocations requesting?"
     echo
     echo "To evaluate the generated heaptrack data, use heaptrack_print or heaptrack_gui."
     echo
     echo "Mandatory arguments to heaptrack:"
     echo "  DEBUGGEE       The name or path to the application that should"
     echo "                 be run with heaptrack analysis enabled."
     echo
     echo "Alternatively, to attach to a running process:"
     echo "  -p, --pid PID  The process ID of a running process into which"
     echo "                 heaptrack will be injected. This only works with"
     echo "                 applications that already link against libdl."
+    echo "  WARNING: Runtime-attaching heaptrack is UNSTABLE and can lead to CRASHES"
+    echo "           in your application, especially after you detach heaptrack again."
+    echo "           You are hereby warned, use it at your own risk!"
     echo
     echo "Optional arguments to heaptrack:"
     echo "  -d, --debug    Run the debuggee in GDB and heaptrack."
     echo "  ARGUMENT       Any number of arguments that will be passed verbatim"
     echo "                 to the debuggee."
     echo "  -h, --help     Show this help message and exit."
     echo "  -v, --version  Displays version information."
     echo
     exit 0
 }
 
 debug=
 pid=
 client=
 
 while true; do
     case "$1" in
         "-d" | "--debug")
             debug=1
             shift 1
             ;;
         "-h" | "--help")
             usage
             exit 0
             ;;
         "-p" | "--pid")
             pid=$2
             if [ -z "$pid" ]; then
                 echo "Missing PID argument."
                 exit 1
             fi
             client=$(cat /proc/$pid/comm)
             if [ -z "$client" ]; then
                 echo "Cannot attach to unknown process with PID $pid."
                 exit 1
             fi
             shift 2
             echo $@
             if [ "$#" -gt 0 ]; then
                 echo "You cannot specify a debuggee and a pid at the same time."
                 exit 1
             fi
             break
             ;;
         "-v" | "--version")
             echo "heaptrack @HEAPTRACK_VERSION_MAJOR@.@HEAPTRACK_VERSION_MINOR@.@HEAPTRACK_VERSION_PATCH@"
             exit 0
             ;;
         *)
             if [ "$1" = "--" ]; then
                 shift 1
             fi
             if [ ! -x "$(which "$1" 2> /dev/null)" ]; then
                 echo "Error: Debuggee \"$1\" is not an executable."
                 echo
                 echo "Usage: $0 [--debug|-d] [--help|-h] DEBUGGEE [ARGS...]"
                 exit 1
             fi
             client="$1"
             shift 1
             break
             ;;
     esac
 done
 
 # put output into current pwd
 output=$(pwd)/heaptrack.$(basename "$client").$$
 
 # find preload library and interpreter executable using relative paths
 EXE_PATH=$(readlink -f $(dirname $(readlink -f $0)))
 LIB_REL_PATH="@LIB_REL_PATH@"
 LIBEXEC_REL_PATH="@LIBEXEC_REL_PATH@"
 
 INTERPRETER="$EXE_PATH/$LIBEXEC_REL_PATH/heaptrack_interpret"
 if [ ! -f "$INTERPRETER" ]; then
     echo "Could not find heaptrack interpreter executable: $INTERPRETER"
     exit 1
 fi
 INTERPRETER=$(readlink -f "$INTERPRETER")
 
 LIBHEAPTRACK_PRELOAD="$EXE_PATH/$LIB_REL_PATH/libheaptrack_preload.so"
 if [ ! -f "$LIBHEAPTRACK_PRELOAD" ]; then
     echo "Could not find heaptrack preload library $LIBHEAPTRACK_PRELOAD"
     exit 1
 fi
 LIBHEAPTRACK_PRELOAD=$(readlink -f "$LIBHEAPTRACK_PRELOAD")
 
 LIBHEAPTRACK_INJECT="$EXE_PATH/$LIB_REL_PATH/libheaptrack_inject.so"
 if [ ! -f "$LIBHEAPTRACK_INJECT" ]; then
     echo "Could not find heaptrack inject library $LIBHEAPTRACK_INJECT"
     exit 1
 fi
 LIBHEAPTRACK_INJECT=$(readlink -f "$LIBHEAPTRACK_INJECT")
 
 # setup named pipe to read data from
 pipe=/tmp/heaptrack_fifo$$
 mkfifo $pipe
 
 # interpret the data and compress the output on the fly
 output="$output.gz"
 "$INTERPRETER" < $pipe | gzip -c > "$output" &
 debuggee=$!
 
 cleanup() {
+    if [ ! -z "$pid" ]; then
+        echo "removing heaptrack injection via GDB, this might take some time..."
+        gdb --batch-silent -n -iex="set auto-solib-add off" -p $pid \
+            --eval-command="sharedlibrary libheaptrack_inject" \
+            --eval-command="call heaptrack_stop()" \
+            --eval-command="detach"
+        # NOTE: we do not call dlclose here, as that has the tendency to trigger
+        #       crashes in the debuggee. So instead, we keep heaptrack loaded.
+    fi
     rm -f "$pipe"
     kill "$debuggee" 2> /dev/null
 
     echo "Heaptrack finished! Now run the following to investigate the data:"
     echo
     if [ "$(which heaptrack_gui 2> /dev/null)" != "" ]; then
         echo "  heaptrack_gui \"$output\""
     else
         echo "  heaptrack_print \"$output\" | less"
     fi
 }
 trap cleanup EXIT
 
 echo "heaptrack output will be written to \"$output\""
 
 if [ -z "$debug" ] && [ -z "$pid" ]; then
   echo "starting application, this might take some time..."
   LD_PRELOAD=$LIBHEAPTRACK_PRELOAD${LD_PRELOAD:+:$LD_PRELOAD} DUMP_HEAPTRACK_OUTPUT="$pipe" "$client" "$@"
 else
   if [ -z "$pid" ]; then
     echo "starting application in GDB, this might take some time..."
     gdb --eval-command="set environment LD_PRELOAD=$LIBHEAPTRACK_PRELOAD" \
         --eval-command="set environment DUMP_HEAPTRACK_OUTPUT=$pipe" \
         --eval-command="run" --args "$client" "$@"
   else
     echo "injecting heaptrack into application via GDB, this might take some time..."
     gdb --batch-silent -n -iex="set auto-solib-add off" -p $pid \
         --eval-command="sharedlibrary libc.so" \
         --eval-command="call (void) __libc_dlopen_mode(\"$LIBHEAPTRACK_INJECT\", 0x80000000 | 0x002)" \
         --eval-command="sharedlibrary libheaptrack_inject" \
         --eval-command="call (void) heaptrack_inject(\"$pipe\")" \
         --eval-command="detach"
     echo "injection finished"
   fi
 fi
 
 wait $debuggee
 
 # kate: hl Bash
diff --git a/src/track/libheaptrack.cpp b/src/track/libheaptrack.cpp
index 20e4383..b21b00b 100644
--- a/src/track/libheaptrack.cpp
+++ b/src/track/libheaptrack.cpp
@@ -1,637 +1,651 @@
 /*
  * Copyright 2014-2017 Milian Wolff <mail@milianw.de>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Library General Public License as
  * published by the Free Software Foundation; either version 2 of the
  * License, or (at your option) any later version.
  *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  *
  * You should have received a copy of the GNU General Public
  * License along with this program; if not, write to the
  * Free Software Foundation, Inc.,
  * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  */
 
 /**
  * @file libheaptrack.cpp
  *
  * @brief Collect raw heaptrack data by overloading heap allocation functions.
  */
 
 #include "libheaptrack.h"
 
 #include <cstdio>
 #include <cstdlib>
 #include <fcntl.h>
 #include <link.h>
 #include <stdio_ext.h>
 
 #include <atomic>
 #include <cinttypes>
 #include <memory>
 #include <mutex>
 #include <string>
 #include <thread>
 #include <unordered_set>
 
 #include <boost/algorithm/string/replace.hpp>
 
 #include "tracetree.h"
 #include "util/config.h"
 #include "util/libunwind_config.h"
 
 /**
  * uncomment this to get extended debug code for known pointers
  * there are still some malloc functions I'm missing apparently,
  * related to TLS and such I guess
  */
 // #define DEBUG_MALLOC_PTRS
 
 using namespace std;
 
 namespace {
 
 enum DebugVerbosity
 {
     NoDebugOutput,
     MinimalOutput,
     VerboseOutput,
     VeryVerboseOutput,
 };
 
 // change this to add more debug output to stderr
 constexpr const DebugVerbosity s_debugVerbosity = NoDebugOutput;
 
 /**
  * Call this to optionally show debug information but give the compiler
  * a hand in removing it all if debug output is disabled.
  */
 template <DebugVerbosity debugLevel, typename... Args>
 inline void debugLog(const char fmt[], Args... args)
 {
     if (debugLevel <= s_debugVerbosity) {
         flockfile(stderr);
         fprintf(stderr, "heaptrack debug [%d]: ", static_cast<int>(debugLevel));
         fprintf(stderr, fmt, args...);
         fputc('\n', stderr);
         funlockfile(stderr);
     }
 }
 
 /**
  * Set to true in an atexit handler. In such conditions, the stop callback
  * will not be called.
  */
 atomic<bool> s_atexit{false};
 
+/**
+ * Set to true in heaptrack_stop, when s_atexit was not yet set. In such conditions,
+ * we always fully unload and clean up after ourselves
+ */
+atomic<bool> s_forceCleanup{false};
+
 /**
  * A per-thread handle guard to prevent infinite recursion, which should be
  * acquired before doing any special symbol handling.
  */
 struct RecursionGuard
 {
     RecursionGuard()
         : wasLocked(isActive)
     {
         isActive = true;
     }
 
     ~RecursionGuard()
     {
         isActive = wasLocked;
     }
 
     const bool wasLocked;
     static thread_local bool isActive;
 };
 
 thread_local bool RecursionGuard::isActive = false;
 
 void writeVersion(FILE* out)
 {
     fprintf(out, "v %x %x\n", HEAPTRACK_VERSION, HEAPTRACK_FILE_FORMAT_VERSION);
 }
 
 void writeExe(FILE* out)
 {
     const int BUF_SIZE = 1023;
     char buf[BUF_SIZE + 1];
     ssize_t size = readlink("/proc/self/exe", buf, BUF_SIZE);
     if (size > 0 && size < BUF_SIZE) {
         buf[size] = 0;
         fprintf(out, "x %s\n", buf);
     }
 }
 
 void writeCommandLine(FILE* out)
 {
     fputc('X', out);
     const int BUF_SIZE = 4096;
     char buf[BUF_SIZE + 1];
     auto fd = open("/proc/self/cmdline", O_RDONLY);
     int bytesRead = read(fd, buf, BUF_SIZE);
     char* end = buf + bytesRead;
     for (char* p = buf; p < end;) {
         fputc(' ', out);
         fputs(p, out);
         while (*p++)
             ; // skip until start of next 0-terminated section
     }
 
     close(fd);
     fputc('\n', out);
 }
 
 void writeSystemInfo(FILE* out)
 {
     fprintf(out, "I %lx %lx\n", sysconf(_SC_PAGESIZE), sysconf(_SC_PHYS_PAGES));
 }
 
 FILE* createFile(const char* fileName)
 {
     string outputFileName;
     if (fileName) {
         outputFileName.assign(fileName);
     }
 
     if (outputFileName == "-" || outputFileName == "stdout") {
         debugLog<VerboseOutput>("%s", "will write to stdout");
         return stdout;
     } else if (outputFileName == "stderr") {
         debugLog<VerboseOutput>("%s", "will write to stderr");
         return stderr;
     }
 
     if (outputFileName.empty()) {
         // env var might not be set when linked directly into an executable
         outputFileName = "heaptrack.$$";
     }
 
     boost::replace_all(outputFileName, "$$", to_string(getpid()));
 
     auto out = fopen(outputFileName.c_str(), "w");
     debugLog<VerboseOutput>("will write to %s/%p\n", outputFileName.c_str(), out);
     // we do our own locking, this speeds up the writing significantly
     __fsetlocking(out, FSETLOCKING_BYCALLER);
     return out;
 }
 
 /**
  * Thread-Safe heaptrack API
  *
  * The only critical section in libheaptrack is the output of the data,
  * dl_iterate_phdr
  * calls, as well as initialization and shutdown.
  *
  * This uses a spinlock, instead of a std::mutex, as the latter can lead to
  * deadlocks
  * on destruction. The spinlock is "simple", and OK to only guard the small
  * sections.
  */
 class HeapTrack
 {
 public:
     HeapTrack(const RecursionGuard& /*recursionGuard*/)
         : HeapTrack([] { return true; })
     {
     }
 
     ~HeapTrack()
     {
         debugLog<VeryVerboseOutput>("%s", "releasing lock");
         s_locked.store(false, memory_order_release);
     }
 
     void initialize(const char* fileName, heaptrack_callback_t initBeforeCallback,
                     heaptrack_callback_initialized_t initAfterCallback, heaptrack_callback_t stopCallback)
     {
         debugLog<MinimalOutput>("initializing: %s", fileName);
         if (s_data) {
             debugLog<MinimalOutput>("%s", "already initialized");
             return;
         }
 
         if (initBeforeCallback) {
             debugLog<MinimalOutput>("%s", "calling initBeforeCallback");
             initBeforeCallback();
             debugLog<MinimalOutput>("%s", "done calling initBeforeCallback");
         }
 
         // do some once-only initializations
         static once_flag once;
         call_once(once, [] {
             debugLog<MinimalOutput>("%s", "doing once-only initialization");
             // configure libunwind for better speed
             if (unw_set_caching_policy(unw_local_addr_space, UNW_CACHE_PER_THREAD)) {
                 fprintf(stderr, "WARNING: Failed to enable per-thread libunwind caching.\n");
             }
 #ifdef unw_set_cache_size
             if (unw_set_cache_size(unw_local_addr_space, 1024, 0)) {
                 fprintf(stderr, "WARNING: Failed to set libunwind cache size.\n");
             }
 #endif
 
             // do not trace forked child processes
             // TODO: make this configurable
             pthread_atfork(&prepare_fork, &parent_fork, &child_fork);
 
             atexit([]() {
+                if (s_forceCleanup) {
+                    return;
+                }
                 debugLog<MinimalOutput>("%s", "atexit()");
                 s_atexit.store(true);
                 heaptrack_stop();
             });
         });
 
         FILE* out = createFile(fileName);
 
         if (!out) {
             fprintf(stderr, "ERROR: Failed to open heaptrack output file: %s\n", fileName);
             if (stopCallback) {
                 stopCallback();
             }
             return;
         }
 
         writeVersion(out);
         writeExe(out);
         writeCommandLine(out);
         writeSystemInfo(out);
 
         s_data = new LockedData(out, stopCallback);
 
         if (initAfterCallback) {
             debugLog<MinimalOutput>("%s", "calling initAfterCallback");
             initAfterCallback(out);
             debugLog<MinimalOutput>("%s", "calling initAfterCallback done");
         }
 
         debugLog<MinimalOutput>("%s", "initialization done");
     }
 
     void shutdown()
     {
         if (!s_data) {
             return;
         }
 
         debugLog<MinimalOutput>("%s", "shutdown()");
 
         writeTimestamp();
         writeRSS();
 
         // NOTE: we leak heaptrack data on exit, intentionally
         // This way, we can be sure to get all static deallocations.
-        if (!s_atexit) {
+        if (!s_atexit || s_forceCleanup) {
             delete s_data;
             s_data = nullptr;
         }
 
         debugLog<MinimalOutput>("%s", "shutdown() done");
     }
 
     void invalidateModuleCache()
     {
         if (!s_data) {
             return;
         }
         s_data->moduleCacheDirty = true;
     }
 
     void writeTimestamp()
     {
         if (!s_data || !s_data->out) {
             return;
         }
 
         auto elapsed = chrono::duration_cast<chrono::milliseconds>(clock::now() - s_data->start);
 
         debugLog<VeryVerboseOutput>("writeTimestamp(%" PRIx64 ")", elapsed.count());
 
         if (fprintf(s_data->out, "c %" PRIx64 "\n", elapsed.count()) < 0) {
             writeError();
             return;
         }
     }
 
     void writeRSS()
     {
         if (!s_data || !s_data->out || !s_data->procStatm) {
             return;
         }
 
         // read RSS in pages from statm, then rewind for next read
         size_t rss = 0;
         fscanf(s_data->procStatm, "%*x %zx", &rss);
         rewind(s_data->procStatm);
         // TODO: compare to rusage.ru_maxrss (getrusage) to find "real" peak?
         // TODO: use custom allocators with known page sizes to prevent tainting
         //       the RSS numbers with heaptrack-internal data
 
         if (fprintf(s_data->out, "R %zx\n", rss) < 0) {
             writeError();
             return;
         }
     }
 
     void handleMalloc(void* ptr, size_t size, const Trace& trace)
     {
         if (!s_data || !s_data->out) {
             return;
         }
         updateModuleCache();
         const auto index = s_data->traceTree.index(trace, s_data->out);
 
 #ifdef DEBUG_MALLOC_PTRS
         auto it = s_data->known.find(ptr);
         assert(it == s_data->known.end());
         s_data->known.insert(ptr);
 #endif
 
         if (fprintf(s_data->out, "+ %zx %x %" PRIxPTR "\n", size, index, reinterpret_cast<uintptr_t>(ptr)) < 0) {
             writeError();
             return;
         }
     }
 
     void handleFree(void* ptr)
     {
         if (!s_data || !s_data->out) {
             return;
         }
 
 #ifdef DEBUG_MALLOC_PTRS
         auto it = s_data->known.find(ptr);
         assert(it != s_data->known.end());
         s_data->known.erase(it);
 #endif
 
         if (fprintf(s_data->out, "- %" PRIxPTR "\n", reinterpret_cast<uintptr_t>(ptr)) < 0) {
             writeError();
             return;
         }
     }
 
 private:
     static int dl_iterate_phdr_callback(struct dl_phdr_info* info, size_t /*size*/, void* data)
     {
         auto heaptrack = reinterpret_cast<HeapTrack*>(data);
         const char* fileName = info->dlpi_name;
         if (!fileName || !fileName[0]) {
             fileName = "x";
         }
 
         debugLog<VerboseOutput>("dlopen_notify_callback: %s %zx", fileName, info->dlpi_addr);
 
         if (fprintf(heaptrack->s_data->out, "m %s %zx", fileName, info->dlpi_addr) < 0) {
             heaptrack->writeError();
             return 1;
         }
 
         for (int i = 0; i < info->dlpi_phnum; i++) {
             const auto& phdr = info->dlpi_phdr[i];
             if (phdr.p_type == PT_LOAD) {
                 if (fprintf(heaptrack->s_data->out, " %zx %zx", phdr.p_vaddr, phdr.p_memsz) < 0) {
                     heaptrack->writeError();
                     return 1;
                 }
             }
         }
 
         if (fputc('\n', heaptrack->s_data->out) == EOF) {
             heaptrack->writeError();
             return 1;
         }
 
         return 0;
     }
 
     static void prepare_fork()
     {
         debugLog<MinimalOutput>("%s", "prepare_fork()");
         // don't do any custom malloc handling while inside fork
         RecursionGuard::isActive = true;
     }
 
     static void parent_fork()
     {
         debugLog<MinimalOutput>("%s", "parent_fork()");
         // the parent process can now continue its custom malloc tracking
         RecursionGuard::isActive = false;
     }
 
     static void child_fork()
     {
         debugLog<MinimalOutput>("%s", "child_fork()");
         // but the forked child process cleans up itself
         // this is important to prevent two processes writing to the same file
         s_data = nullptr;
         RecursionGuard::isActive = true;
     }
 
     void updateModuleCache()
     {
         if (!s_data || !s_data->out || !s_data->moduleCacheDirty) {
             return;
         }
         debugLog<MinimalOutput>("%s", "updateModuleCache()");
         if (fputs("m -\n", s_data->out) == EOF) {
             writeError();
             return;
         }
         dl_iterate_phdr(&dl_iterate_phdr_callback, this);
         s_data->moduleCacheDirty = false;
     }
 
     void writeError()
     {
         debugLog<MinimalOutput>("write error %d/%s", errno, strerror(errno));
         s_data->out = nullptr;
         shutdown();
     }
 
     template <typename AdditionalLockCheck>
     HeapTrack(AdditionalLockCheck lockCheck)
     {
         debugLog<VeryVerboseOutput>("%s", "acquiring lock");
         while (s_locked.exchange(true, memory_order_acquire) && lockCheck()) {
             this_thread::sleep_for(chrono::microseconds(1));
         }
         debugLog<VeryVerboseOutput>("%s", "lock acquired");
     }
 
     using clock = chrono::steady_clock;
 
     struct LockedData
     {
         LockedData(FILE* out, heaptrack_callback_t stopCallback)
             : out(out)
             , stopCallback(stopCallback)
         {
             debugLog<MinimalOutput>("%s", "constructing LockedData");
             procStatm = fopen("/proc/self/statm", "r");
             if (!procStatm) {
                 fprintf(stderr, "WARNING: Failed to open /proc/self/statm for reading.\n");
             }
             timerThread = thread([&]() {
                 RecursionGuard::isActive = true;
                 debugLog<MinimalOutput>("%s", "timer thread started");
                 while (!stopTimerThread) {
                     // TODO: make interval customizable
                     this_thread::sleep_for(chrono::milliseconds(10));
 
                     HeapTrack heaptrack([&] { return !stopTimerThread.load(); });
                     if (!stopTimerThread) {
                         heaptrack.writeTimestamp();
                         heaptrack.writeRSS();
                     }
                 }
             });
         }
 
         ~LockedData()
         {
             debugLog<MinimalOutput>("%s", "destroying LockedData");
             stopTimerThread = true;
             if (timerThread.joinable()) {
                 try {
                     timerThread.join();
                 } catch (const std::system_error&) {
                 }
             }
 
             if (out) {
                 fclose(out);
             }
 
             if (procStatm) {
                 fclose(procStatm);
             }
 
-            if (stopCallback && !s_atexit) {
+            if (stopCallback && (!s_atexit || s_forceCleanup)) {
                 stopCallback();
             }
             debugLog<MinimalOutput>("%s", "done destroying LockedData");
         }
 
         /**
          * Note: We use the C stdio API here for performance reasons.
          *       Esp. in multi-threaded environments this is much faster
          *       to produce non-per-line-interleaved output.
          */
         FILE* out = nullptr;
 
         /// /proc/self/statm file stream to read RSS value from
         FILE* procStatm = nullptr;
 
         /**
          * Calls to dlopen/dlclose mark the cache as dirty.
          * When this happened, all modules and their section addresses
          * must be found again via dl_iterate_phdr before we output the
          * next instruction pointer. Otherwise, heaptrack_interpret might
          * encounter IPs of an unknown/invalid module.
          */
         bool moduleCacheDirty = true;
 
         TraceTree traceTree;
 
         const chrono::time_point<clock> start = clock::now();
         atomic<bool> stopTimerThread{false};
         thread timerThread;
 
         heaptrack_callback_t stopCallback = nullptr;
 
 #ifdef DEBUG_MALLOC_PTRS
         unordered_set<void*> known;
 #endif
     };
 
     static atomic<bool> s_locked;
     static LockedData* s_data;
 };
 
 atomic<bool> HeapTrack::s_locked{false};
 HeapTrack::LockedData* HeapTrack::s_data{nullptr};
 }
 extern "C" {
 
 void heaptrack_init(const char* outputFileName, heaptrack_callback_t initBeforeCallback,
                     heaptrack_callback_initialized_t initAfterCallback, heaptrack_callback_t stopCallback)
 {
     RecursionGuard guard;
 
     debugLog<MinimalOutput>("heaptrack_init(%s)", outputFileName);
 
     HeapTrack heaptrack(guard);
     heaptrack.initialize(outputFileName, initBeforeCallback, initAfterCallback, stopCallback);
 }
 
 void heaptrack_stop()
 {
     RecursionGuard guard;
 
     debugLog<MinimalOutput>("%s", "heaptrack_stop()");
 
     HeapTrack heaptrack(guard);
+    // explicit stop outside of atexit: ensure ~LockedData still runs the stop callback
+    if (!s_atexit) {
+        s_forceCleanup.store(true);
+    }
+
     heaptrack.shutdown();
 }
 
 void heaptrack_malloc(void* ptr, size_t size)
 {
     if (ptr && !RecursionGuard::isActive) {
         RecursionGuard guard;
 
         debugLog<VeryVerboseOutput>("heaptrack_malloc(%p, %zu)", ptr, size);
 
         Trace trace;
         trace.fill(2 + HEAPTRACK_DEBUG_BUILD);
 
         HeapTrack heaptrack(guard);
         heaptrack.handleMalloc(ptr, size, trace);
     }
 }
 
 void heaptrack_free(void* ptr)
 {
     if (ptr && !RecursionGuard::isActive) {
         RecursionGuard guard;
 
         debugLog<VeryVerboseOutput>("heaptrack_free(%p)", ptr);
 
         HeapTrack heaptrack(guard);
         heaptrack.handleFree(ptr);
     }
 }
 
 void heaptrack_realloc(void* ptr_in, size_t size, void* ptr_out)
 {
     if (ptr_out && !RecursionGuard::isActive) {
         RecursionGuard guard;
 
         debugLog<VeryVerboseOutput>("heaptrack_realloc(%p, %zu, %p)", ptr_in, size, ptr_out);
 
         Trace trace;
         trace.fill(2 + HEAPTRACK_DEBUG_BUILD);
 
         HeapTrack heaptrack(guard);
         if (ptr_in) {
             heaptrack.handleFree(ptr_in);
         }
         heaptrack.handleMalloc(ptr_out, size, trace);
     }
 }
 
 void heaptrack_invalidate_module_cache()
 {
     RecursionGuard guard;
 
     debugLog<VerboseOutput>("%s", "heaptrack_invalidate_module_cache()");
 
     HeapTrack heaptrack(guard);
     heaptrack.invalidateModuleCache();
 }
 }