diff --git a/README.md b/README.md index 89675f0..7811ed9 100644 --- a/README.md +++ b/README.md @@ -1,194 +1,194 @@ # heaptrack - a heap memory profiler for Linux ![heaptrack_gui summary page](screenshots/gui_summary.png?raw=true "heaptrack_gui summary page") Heaptrack traces all memory allocations and annotates these events with stack traces. Dedicated analysis tools then allow you to interpret the heap memory profile to: - find hotspots that need to be optimized to reduce the **memory footprint** of your application - find **memory leaks**, i.e. locations that allocate memory which is never deallocated - find **allocation hotspots**, i.e. code locations that trigger a lot of memory allocation calls - find **temporary allocations**, which are allocations that are directly followed by their deallocation ## Using heaptrack The recommended way is to launch your application and start tracing from the beginning: heaptrack heaptrack output will be written to "/tmp/heaptrack.APP.PID.gz" starting application, this might take some time... ... heaptrack stats: allocations: 65 leaked allocations: 60 temporary allocations: 1 Heaptrack finished! Now run the following to investigate the data: heaptrack_gui "/tmp/heaptrack.APP.PID.gz" Alternatively, you can attach to an already running process: heaptrack --pid $(pidof ) heaptrack output will be written to "/tmp/heaptrack.APP.PID.gz" injecting heaptrack into application via GDB, this might take some time... injection finished ... Heaptrack finished! Now run the following to investigate the data: heaptrack_gui "/tmp/heaptrack.APP.PID.gz" ## Building heaptrack Heaptrack is split into two parts: The data collector, i.e. `heaptrack` itself, and the analyzer GUI called `heaptrack_gui`. The following summarizes the dependencies for these two parts as they can be build independently. You will find corresponding development packages on all major distributions for these dependencies. 
On an embedded device or older Linux distribution, you will only want to build `heaptrack`. The data can then be analyzed on a different machine with a more modern Linux distribution that has access to the required GUI dependencies. If you need help with building, deploying or using heaptrack, you can contact KDAB for commercial support: https://www.kdab.com/software-services/workshops/profiling-workshops/ ### Shared dependencies Both parts require the following tools and libraries: - cmake 2.8.9 or higher - a C\+\+11 enabled compiler like g\+\+ or clang\+\+ - zlib - libdl - pthread - libc ### `heaptrack` dependencies The heaptrack data collector and the simplistic `heaptrack_print` analyzer depend on the following libraries: -- boost 1.41 or higher: iostream, program_options +- boost 1.41 or higher: iostreams, program_options - libunwind - elfutils: libdwarf For runtime-attaching, you will need `gdb` installed. ### `heaptrack_gui` dependencies The graphical user interface to interpret and analyze the data collected by heaptrack depends on Qt 5 and some KDE libraries: - extra-cmake-modules - Qt 5.2 or higher: Core, Widgets - KDE Frameworks 5: CoreAddons, I18n, ItemModels, ThreadWeaver, ConfigWidgets, KIO When any of these dependencies is missing, `heaptrack_gui` will not be build. Optionally, install the following dependencies to get additional features in the GUI: - KDiagram: KChart (for chart visualizations) ### Compiling Run the following commands to compile heaptrack. Do pay attention to the output of the CMake command, as it will tell you about missing dependencies! cd heaptrack # i.e. the source folder mkdir build cd build cmake -DCMAKE_BUILD_TYPE=Release .. # look for messages about missing dependencies! make -j$(nproc) ## Interpreting the heap profile Heaptrack generates data files that are impossible to analyze for a human. Instead, you need to use either `heaptrack_print` or `heaptrack_gui` to interpret the results. 
### heaptrack_gui ![heaptrack_gui flamegraph page](screenshots/gui_flamegraph.png?raw=true "heaptrack_gui flamegraph page") ![heaptrack_gui allocations chart page](screenshots/gui_allocations_chart.png?raw=true "heaptrack_gui allocations chart page") The highly recommended way to analyze a heap profile is by using the `heaptrack_gui` tool. It depends on Qt 5 and KF 5 to graphically visualize the recorded data. It features: - a summary page of the data - bottom-up and top-down tree views of the code locations that allocated memory with their aggregated cost and stack traces - flame graph visualization - graphs of allocation costs over time ### heaptrack_print The `heaptrack_print` tool is a command line application with minimal dependencies. It takes the heap profile, analyzes it, and prints the results in ASCII format to the command line. In its most simple form, you can use it like this: heaptrack_print heaptrack.APP.PID.gz | less By default, the report will contain three sections: MOST CALLS TO ALLOCATION FUNCTIONS PEAK MEMORY CONSUMERS MOST TEMPORARY ALLOCATIONS Each section then lists the top ten hotspots, i.e. code locations that triggered e.g. the most memory allocations. Have a look at `heaptrack_print --help` for changing the output format and other options. Note that you can use this tool to convert a heaptrack data file to the Massif data format. You can generate a collapsed stack report for consumption by `flamegraph.pl`. ## Comparison to Valgrind's massif The idea to build heaptrack was born out of the pain in working with Valgrind's massif. Valgrind comes with a huge overhead in both memory and time, which sometimes prevent you from running it on larger real-world applications. Most of what Valgrind does is not needed for a simple heap profiler. 
### Advantages of heaptrack over massif - *speed and memory overhead* Multi-threaded applications are not serialized when you trace them with heaptrack and even for single-threaded applications the overhead in both time and memory is significantly lower. Most notably, you only pay a price when you allocate memory -- time-intensive CPU calculations are not slowed down at all, contrary to what happens in Valgrind. - *more data* Valgrind's massif aggregates data before writing the report. This step loses a lot of useful information. Most notably, you are not longer able to find out how often memory was allocated, or where temporary allocations are triggered. Heaptrack does not aggregate the data until you interpret it, which allows for more useful insights into your allocation patterns. ### Advantages of massif over heaptrack - *ability to profile page allocations as heap* This allows you to heap-profile applications that use pool allocators that circumvent malloc & friends. Heaptrack can in principle also profile such applications, but it requires code changes to annotate the memory pool implementation. - *ability to profile stack allocations* This is inherently impossible to implement efficiently in heaptrack as far as I know. ## Contributing to heaptrack As a FOSS project, we welcome contributions of any form. 
You can help improve the project by: - submitting bug reports at https://bugs.kde.org/enter_bug.cgi?product=Heaptrack - contributing patches via https://phabricator.kde.org/dashboard/view/28/ - translating the GUI with the help of https://l10n.kde.org/ - writing documentation on https://userbase.kde.org/Heaptrack diff --git a/src/analyze/gui/flamegraph.cpp b/src/analyze/gui/flamegraph.cpp index 55088a5..144b9ed 100644 --- a/src/analyze/gui/flamegraph.cpp +++ b/src/analyze/gui/flamegraph.cpp @@ -1,619 +1,621 @@ /* * Copyright 2015-2017 Milian Wolff * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU Library General Public License as * published by the Free Software Foundation; either version 2 of the * License, or (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public * License along with this program; if not, write to the * Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 
*/ #include "flamegraph.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include enum CostType { Allocations, Temporary, Peak, Leaked, Allocated }; Q_DECLARE_METATYPE(CostType) class FrameGraphicsItem : public QGraphicsRectItem { public: FrameGraphicsItem(const qint64 cost, CostType costType, const QString& function, FrameGraphicsItem* parent = nullptr); FrameGraphicsItem(const qint64 cost, const QString& function, FrameGraphicsItem* parent); qint64 cost() const; void setCost(qint64 cost); QString function() const; void paint(QPainter* painter, const QStyleOptionGraphicsItem* option, QWidget* widget = nullptr) override; QString description() const; protected: void hoverEnterEvent(QGraphicsSceneHoverEvent* event) override; void hoverLeaveEvent(QGraphicsSceneHoverEvent* event) override; private: qint64 m_cost; QString m_function; CostType m_costType; bool m_isHovered; }; Q_DECLARE_METATYPE(FrameGraphicsItem*) FrameGraphicsItem::FrameGraphicsItem(const qint64 cost, CostType costType, const QString& function, FrameGraphicsItem* parent) : QGraphicsRectItem(parent) , m_cost(cost) , m_function(function) , m_costType(costType) , m_isHovered(false) { setFlag(QGraphicsItem::ItemIsSelectable); setAcceptHoverEvents(true); } FrameGraphicsItem::FrameGraphicsItem(const qint64 cost, const QString& function, FrameGraphicsItem* parent) : FrameGraphicsItem(cost, parent->m_costType, function, parent) { } qint64 FrameGraphicsItem::cost() const { return m_cost; } void FrameGraphicsItem::setCost(qint64 cost) { m_cost = cost; } QString FrameGraphicsItem::function() const { return m_function; } void FrameGraphicsItem::paint(QPainter* painter, const QStyleOptionGraphicsItem* option, QWidget* /*widget*/) { if (isSelected() || m_isHovered) { auto selectedColor = brush().color(); selectedColor.setAlpha(255); painter->fillRect(rect(), selectedColor); } else { 
painter->fillRect(rect(), brush()); } const QPen oldPen = painter->pen(); auto pen = oldPen; pen.setColor(brush().color()); if (isSelected()) { pen.setWidth(2); } painter->setPen(pen); painter->drawRect(rect()); painter->setPen(oldPen); const int margin = 4; const int width = rect().width() - 2 * margin; if (width < option->fontMetrics.averageCharWidth() * 6) { // text is too wide for the current LOD, don't paint it return; } const int height = rect().height(); painter->drawText(margin + rect().x(), rect().y(), width, height, Qt::AlignVCenter | Qt::AlignLeft | Qt::TextSingleLine, option->fontMetrics.elidedText(m_function, Qt::ElideRight, width)); } void FrameGraphicsItem::hoverEnterEvent(QGraphicsSceneHoverEvent* event) { QGraphicsRectItem::hoverEnterEvent(event); m_isHovered = true; } QString FrameGraphicsItem::description() const { // we build the tooltip text on demand, which is much faster than doing that // for potentially thousands of items when we load the data QString tooltip; KFormat format; qint64 totalCost = 0; { auto item = this; while (item->parentItem()) { item = static_cast(item->parentItem()); } totalCost = item->cost(); } const auto fraction = QString::number(double(m_cost) * 100. 
/ totalCost, 'g', 3); const auto function = QString(QLatin1String("") + m_function.toHtmlEscaped() + QLatin1String("")); if (!parentItem()) { return function; } switch (m_costType) { case Allocations: tooltip = i18nc("%1: number of allocations, %2: relative number, %3: function label", "%1 (%2%) allocations in %3 and below.", m_cost, fraction, function); break; case Temporary: tooltip = i18nc("%1: number of temporary allocations, %2: relative number, " "%3 function label", "%1 (%2%) temporary allocations in %3 and below.", m_cost, fraction, function); break; case Peak: tooltip = i18nc("%1: peak consumption in bytes, %2: relative number, %3: " "function label", "%1 (%2%) peak consumption in %3 and below.", format.formatByteSize(m_cost), fraction, function); break; case Leaked: tooltip = i18nc("%1: leaked bytes, %2: relative number, %3: function label", "%1 (%2%) leaked in %3 and below.", format.formatByteSize(m_cost), fraction, function); break; case Allocated: tooltip = i18nc("%1: allocated bytes, %2: relative number, %3: function label", "%1 (%2%) allocated in %3 and below.", format.formatByteSize(m_cost), fraction, function); break; } return tooltip; } void FrameGraphicsItem::hoverLeaveEvent(QGraphicsSceneHoverEvent* event) { QGraphicsRectItem::hoverLeaveEvent(event); m_isHovered = false; } namespace { /** * Generate a brush from the "mem" color space used in upstream FlameGraph.pl */ QBrush brush() { // intern the brushes, to reuse them across items which can be thousands // otherwise we'd end up with dozens of allocations and higher memory // consumption static const QVector brushes = []() -> QVector { QVector brushes; std::generate_n(std::back_inserter(brushes), 100, []() { return QColor(0, 190 + 50 * qreal(rand()) / RAND_MAX, 210 * qreal(rand()) / RAND_MAX, 125); }); return brushes; }(); return brushes.at(rand() % brushes.size()); } /** * Layout the flame graph and hide tiny items. 
*/ void layoutItems(FrameGraphicsItem* parent) { const auto& parentRect = parent->rect(); const auto pos = parentRect.topLeft(); const qreal maxWidth = parentRect.width(); const qreal h = parentRect.height(); const qreal y_margin = 2.; const qreal y = pos.y() - h - y_margin; qreal x = pos.x(); foreach (auto child, parent->childItems()) { auto frameChild = static_cast(child); const qreal w = maxWidth * double(frameChild->cost()) / parent->cost(); frameChild->setVisible(w > 1); if (frameChild->isVisible()) { frameChild->setRect(QRectF(x, y, w, h)); layoutItems(frameChild); x += w; } } } FrameGraphicsItem* findItemByFunction(const QList& items, const QString& function) { foreach (auto item_, items) { auto item = static_cast(item_); if (item->function() == function) { return item; } } return nullptr; } /** * Convert the top-down graph into a tree of FrameGraphicsItem. */ void toGraphicsItems(const QVector& data, FrameGraphicsItem* parent, int64_t AllocationData::*member, const double costThreshold, bool collapseRecursion) { foreach (const auto& row, data) { - if (collapseRecursion && row.location->function == parent->function()) { + if (collapseRecursion && row.location->function != unresolvedFunctionName() + && row.location->function == parent->function()) + { continue; } auto item = findItemByFunction(parent->childItems(), row.location->function); if (!item) { item = new FrameGraphicsItem(row.cost.*member, row.location->function, parent); item->setPen(parent->pen()); item->setBrush(brush()); } else { item->setCost(item->cost() + row.cost.*member); } if (item->cost() > costThreshold) { toGraphicsItems(row.children, item, member, costThreshold, collapseRecursion); } } } int64_t AllocationData::*memberForType(CostType type) { switch (type) { case Allocations: return &AllocationData::allocations; case Temporary: return &AllocationData::temporary; case Peak: return &AllocationData::peak; case Leaked: return &AllocationData::leaked; case Allocated: return 
&AllocationData::allocated; } Q_UNREACHABLE(); } FrameGraphicsItem* parseData(const QVector& topDownData, CostType type, double costThreshold, bool collapseRecursion) { auto member = memberForType(type); double totalCost = 0; foreach (const auto& frame, topDownData) { totalCost += frame.cost.*member; } KColorScheme scheme(QPalette::Active); const QPen pen(scheme.foreground().color()); KFormat format; QString label; switch (type) { case Allocations: label = i18n("%1 allocations in total", totalCost); break; case Temporary: label = i18n("%1 temporary allocations in total", totalCost); break; case Peak: label = i18n("%1 peak consumption in total", format.formatByteSize(totalCost)); break; case Leaked: label = i18n("%1 leaked in total", format.formatByteSize(totalCost)); break; case Allocated: label = i18n("%1 allocated in total", format.formatByteSize(totalCost)); break; } auto rootItem = new FrameGraphicsItem(totalCost, type, label); rootItem->setBrush(scheme.background()); rootItem->setPen(pen); toGraphicsItems(topDownData, rootItem, member, totalCost * costThreshold / 100., collapseRecursion); return rootItem; } } FlameGraph::FlameGraph(QWidget* parent, Qt::WindowFlags flags) : QWidget(parent, flags) , m_costSource(new QComboBox(this)) , m_scene(new QGraphicsScene(this)) , m_view(new QGraphicsView(this)) , m_displayLabel(new QLabel) { qRegisterMetaType(); m_costSource->addItem(i18n("Allocations"), QVariant::fromValue(Allocations)); m_costSource->setItemData(0, i18n("Show a flame graph over the number of allocations triggered by " "functions in your code."), Qt::ToolTipRole); m_costSource->addItem(i18n("Temporary Allocations"), QVariant::fromValue(Temporary)); m_costSource->setItemData(1, i18n("Show a flame graph over the number of temporary allocations " "triggered by functions in your code. 
" "Allocations are marked as temporary when they are immediately " "followed by their deallocation."), Qt::ToolTipRole); m_costSource->addItem(i18n("Peak Consumption"), QVariant::fromValue(Peak)); m_costSource->setItemData(2, i18n("Show a flame graph over the peak heap " "memory consumption of your application."), Qt::ToolTipRole); m_costSource->addItem(i18n("Leaked"), QVariant::fromValue(Leaked)); m_costSource->setItemData(3, i18n("Show a flame graph over the leaked heap memory of your application. " "Memory is considered to be leaked when it never got deallocated. "), Qt::ToolTipRole); m_costSource->addItem(i18n("Allocated"), QVariant::fromValue(Allocated)); m_costSource->setItemData(4, i18n("Show a flame graph over the total memory allocated by functions in " "your code. " "This aggregates all memory allocations and ignores deallocations."), Qt::ToolTipRole); connect(m_costSource, static_cast(&QComboBox::currentIndexChanged), this, &FlameGraph::showData); m_costSource->setToolTip(i18n("Select the data source that should be visualized in the flame graph.")); m_scene->setItemIndexMethod(QGraphicsScene::NoIndex); m_view->setScene(m_scene); m_view->viewport()->installEventFilter(this); m_view->viewport()->setMouseTracking(true); m_view->setFont(QFont(QStringLiteral("monospace"))); auto bottomUpCheckbox = new QCheckBox(i18n("Bottom-Down View"), this); bottomUpCheckbox->setToolTip(i18n("Enable the bottom-down flame graph view. When this is unchecked, " "the top-down view is enabled by default.")); connect(bottomUpCheckbox, &QCheckBox::toggled, this, [this, bottomUpCheckbox] { m_showBottomUpData = bottomUpCheckbox->isChecked(); showData(); }); auto collapseRecursionCheckbox = new QCheckBox(i18n("Collapse Recursion"), this); collapseRecursionCheckbox->setChecked(m_collapseRecursion); collapseRecursionCheckbox->setToolTip(i18n("Collapse stack frames for functions calling themselves. 
" "When this is unchecked, recursive frames will be visualized " "separately.")); connect(collapseRecursionCheckbox, &QCheckBox::toggled, this, [this, collapseRecursionCheckbox] { m_collapseRecursion = collapseRecursionCheckbox->isChecked(); showData(); }); auto costThreshold = new QDoubleSpinBox(this); costThreshold->setDecimals(2); costThreshold->setMinimum(0); costThreshold->setMaximum(99.90); costThreshold->setPrefix(i18n("Cost Threshold: ")); costThreshold->setSuffix(QStringLiteral("%")); costThreshold->setValue(m_costThreshold); costThreshold->setSingleStep(0.01); costThreshold->setToolTip(i18n("The cost threshold defines a fractional cut-off value. " "Items with a relative cost below this value will not be shown in " "the flame graph. This is done as an optimization to quickly generate " "graphs for large data sets with low memory overhead. If you need more " "details, decrease the threshold value, or set it to zero.")); connect(costThreshold, static_cast(&QDoubleSpinBox::valueChanged), this, [this](double threshold) { m_costThreshold = threshold; showData(); }); m_displayLabel->setWordWrap(true); m_displayLabel->setTextInteractionFlags(m_displayLabel->textInteractionFlags() | Qt::TextSelectableByMouse); auto controls = new QWidget(this); controls->setLayout(new QHBoxLayout); controls->layout()->addWidget(m_costSource); controls->layout()->addWidget(bottomUpCheckbox); controls->layout()->addWidget(collapseRecursionCheckbox); controls->layout()->addWidget(costThreshold); setLayout(new QVBoxLayout); layout()->addWidget(controls); layout()->addWidget(m_view); layout()->addWidget(m_displayLabel); addAction(KStandardAction::back(this, SLOT(navigateBack()), this)); addAction(KStandardAction::forward(this, SLOT(navigateForward()), this)); setContextMenuPolicy(Qt::ActionsContextMenu); } FlameGraph::~FlameGraph() = default; bool FlameGraph::eventFilter(QObject* object, QEvent* event) { bool ret = QObject::eventFilter(object, event); if (event->type() == 
QEvent::MouseButtonRelease) { QMouseEvent* mouseEvent = static_cast(event); if (mouseEvent->button() == Qt::LeftButton) { auto item = static_cast(m_view->itemAt(mouseEvent->pos())); if (item && item != m_selectionHistory.at(m_selectedItem)) { selectItem(item); if (m_selectedItem != m_selectionHistory.size() - 1) { m_selectionHistory.remove(m_selectedItem + 1, m_selectionHistory.size() - m_selectedItem - 1); } m_selectedItem = m_selectionHistory.size(); m_selectionHistory.push_back(item); } } } else if (event->type() == QEvent::MouseMove) { QMouseEvent* mouseEvent = static_cast(event); auto item = static_cast(m_view->itemAt(mouseEvent->pos())); setTooltipItem(item); } else if (event->type() == QEvent::Leave) { setTooltipItem(nullptr); } else if (event->type() == QEvent::Resize || event->type() == QEvent::Show) { if (!m_rootItem) { if (!m_buildingScene) { showData(); } } else { selectItem(m_selectionHistory.at(m_selectedItem)); } updateTooltip(); } else if (event->type() == QEvent::Hide) { setData(nullptr); } return ret; } void FlameGraph::setTopDownData(const TreeData& topDownData) { m_topDownData = topDownData; if (isVisible()) { showData(); } } void FlameGraph::setBottomUpData(const TreeData& bottomUpData) { m_bottomUpData = bottomUpData; } void FlameGraph::clearData() { m_topDownData = {}; m_bottomUpData = {}; setData(nullptr); } void FlameGraph::showData() { setData(nullptr); m_buildingScene = true; using namespace ThreadWeaver; auto data = m_showBottomUpData ? 
m_bottomUpData : m_topDownData; bool collapseRecursion = m_collapseRecursion; auto source = m_costSource->currentData().value(); auto threshold = m_costThreshold; stream() << make_job([data, source, threshold, collapseRecursion, this]() { auto parsedData = parseData(data, source, threshold, collapseRecursion); QMetaObject::invokeMethod(this, "setData", Qt::QueuedConnection, Q_ARG(FrameGraphicsItem*, parsedData)); }); } void FlameGraph::setTooltipItem(const FrameGraphicsItem* item) { if (!item && m_selectedItem != -1 && m_selectionHistory.at(m_selectedItem)) { item = m_selectionHistory.at(m_selectedItem); m_view->setCursor(Qt::ArrowCursor); } else { m_view->setCursor(Qt::PointingHandCursor); } m_tooltipItem = item; updateTooltip(); } void FlameGraph::updateTooltip() { const auto text = m_tooltipItem ? m_tooltipItem->description() : QString(); m_displayLabel->setToolTip(text); const auto metrics = m_displayLabel->fontMetrics(); // FIXME: the HTML text has tons of stuff that is not printed, // which lets the text get cut-off too soon... 
m_displayLabel->setText(metrics.elidedText(text, Qt::ElideRight, m_displayLabel->width())); } void FlameGraph::setData(FrameGraphicsItem* rootItem) { m_scene->clear(); m_buildingScene = false; m_rootItem = rootItem; m_selectionHistory.clear(); m_selectionHistory.push_back(rootItem); m_selectedItem = 0; if (!rootItem) { auto text = m_scene->addText(i18n("generating flame graph...")); m_view->centerOn(text); m_view->setCursor(Qt::BusyCursor); return; } m_view->setCursor(Qt::ArrowCursor); // layouting needs a root item with a given height, the rest will be // overwritten later rootItem->setRect(0, 0, 800, m_view->fontMetrics().height() + 4); m_scene->addItem(rootItem); if (isVisible()) { selectItem(m_rootItem); } } void FlameGraph::selectItem(FrameGraphicsItem* item) { if (!item) { return; } // scale item and its parents to the maximum available width // also hide all siblings of the parent items const auto rootWidth = m_view->viewport()->width() - 40; auto parent = item; while (parent) { auto rect = parent->rect(); rect.setLeft(0); rect.setWidth(rootWidth); parent->setRect(rect); if (parent->parentItem()) { foreach (auto sibling, parent->parentItem()->childItems()) { sibling->setVisible(sibling == parent); } } parent = static_cast(parent->parentItem()); } // then layout all items below the selected on layoutItems(item); // and make sure it's visible m_view->centerOn(item); setTooltipItem(item); } void FlameGraph::navigateBack() { if (m_selectedItem > 0) { --m_selectedItem; } selectItem(m_selectionHistory.at(m_selectedItem)); } void FlameGraph::navigateForward() { if ((m_selectedItem + 1) < m_selectionHistory.size()) { ++m_selectedItem; } selectItem(m_selectionHistory.at(m_selectedItem)); } diff --git a/src/analyze/gui/locationdata.h b/src/analyze/gui/locationdata.h index d9d3ac5..235fc37 100644 --- a/src/analyze/gui/locationdata.h +++ b/src/analyze/gui/locationdata.h @@ -1,81 +1,88 @@ /* * Copyright 2016-2017 Milian Wolff * * This program is free software; you can 
redistribute it and/or modify * it under the terms of the GNU Library General Public License as * published by the Free Software Foundation; either version 2 of the * License, or (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public * License along with this program; if not, write to the * Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ #ifndef LOCATIONDATA_H #define LOCATIONDATA_H #include #include #include +#include + struct LocationData { using Ptr = std::shared_ptr; QString function; QString file; QString module; int line; bool operator==(const LocationData& rhs) const { return function == rhs.function && file == rhs.file && module == rhs.module && line == rhs.line; } bool operator<(const LocationData& rhs) const { int i = function.compare(rhs.function); if (!i) { i = file.compare(rhs.file); } if (!i) { i = line < rhs.line ? -1 : (line > rhs.line); } if (!i) { i = module.compare(rhs.module); } return i < 0; } }; Q_DECLARE_TYPEINFO(LocationData, Q_MOVABLE_TYPE); Q_DECLARE_METATYPE(LocationData::Ptr) +inline QString unresolvedFunctionName() +{ + return i18n(""); +} + inline bool operator<(const LocationData::Ptr& lhs, const LocationData& rhs) { return *lhs < rhs; } inline uint qHash(const LocationData& location, uint seed_ = 0) { size_t seed = seed_; boost::hash_combine(seed, qHash(location.function)); boost::hash_combine(seed, qHash(location.file)); boost::hash_combine(seed, qHash(location.module)); boost::hash_combine(seed, location.line); return seed; } inline uint qHash(const LocationData::Ptr& location, uint seed = 0) { return location ? 
qHash(*location, seed) : seed; } #endif // LOCATIONDATA_H diff --git a/src/analyze/gui/parser.cpp b/src/analyze/gui/parser.cpp index f582490..a8f1f5d 100644 --- a/src/analyze/gui/parser.cpp +++ b/src/analyze/gui/parser.cpp @@ -1,634 +1,601 @@ /* * Copyright 2015-2017 Milian Wolff * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU Library General Public License as * published by the Free Software Foundation; either version 2 of the * License, or (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public * License along with this program; if not, write to the * Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ #include "parser.h" #include #include #include #include "analyze/accumulatedtracedata.h" #include #include #include using namespace std; namespace { // TODO: use QString directly struct StringCache { QString func(const InstructionPointer& ip) const { if (ip.functionIndex) { // TODO: support removal of template arguments return stringify(ip.functionIndex); } else { - return i18n(""); + return unresolvedFunctionName(); } } QString file(const InstructionPointer& ip) const { if (ip.fileIndex) { return stringify(ip.fileIndex); } else { return {}; } } QString module(const InstructionPointer& ip) const { return stringify(ip.moduleIndex); } QString stringify(const StringIndex index) const { if (!index || index.index > m_strings.size()) { return {}; } else { return m_strings.at(index.index - 1); } } LocationData::Ptr location(const IpIndex& index, const InstructionPointer& ip) const { // first try a fast index-based lookup auto& location = m_locationsMap[index]; if (!location) { // slow-path, 
look for interned location // note that we can get the same locatoin for different IPs LocationData data = {func(ip), file(ip), module(ip), ip.line}; auto it = lower_bound(m_locations.begin(), m_locations.end(), data); if (it != m_locations.end() && **it == data) { // we got the location already from a different ip, cache it location = *it; } else { // completely new location, cache it in both containers auto interned = make_shared(data); m_locations.insert(it, interned); location = interned; } } return location; } void update(const vector& strings) { transform(strings.begin() + m_strings.size(), strings.end(), back_inserter(m_strings), [](const string& str) { return QString::fromStdString(str); }); } vector m_strings; mutable vector m_locations; mutable QHash m_locationsMap; bool diffMode = false; }; struct ChartMergeData { IpIndex ip; qint64 consumed; qint64 allocations; qint64 allocated; qint64 temporary; bool operator<(const IpIndex rhs) const { return ip < rhs; } }; const uint64_t MAX_CHART_DATAPOINTS = 500; // TODO: make this configurable via the GUI struct ParserData final : public AccumulatedTraceData { ParserData() { } void updateStringCache() { stringCache.update(strings); } void prepareBuildCharts() { if (stringCache.diffMode) { return; } consumedChartData.rows.reserve(MAX_CHART_DATAPOINTS); allocatedChartData.rows.reserve(MAX_CHART_DATAPOINTS); allocationsChartData.rows.reserve(MAX_CHART_DATAPOINTS); temporaryChartData.rows.reserve(MAX_CHART_DATAPOINTS); // start off with null data at the origin consumedChartData.rows.push_back({}); allocatedChartData.rows.push_back({}); allocationsChartData.rows.push_back({}); temporaryChartData.rows.push_back({}); // index 0 indicates the total row consumedChartData.labels[0] = i18n("total"); allocatedChartData.labels[0] = i18n("total"); allocationsChartData.labels[0] = i18n("total"); temporaryChartData.labels[0] = i18n("total"); buildCharts = true; maxConsumedSinceLastTimeStamp = 0; vector merged; 
merged.reserve(instructionPointers.size()); // merge the allocation cost by instruction pointer // TODO: aggregate by function instead? // TODO: traverse the merged call stack up until the first fork for (const auto& alloc : allocations) { const auto ip = findTrace(alloc.traceIndex).ipIndex; auto it = lower_bound(merged.begin(), merged.end(), ip); if (it == merged.end() || it->ip != ip) { it = merged.insert(it, {ip, 0, 0, 0, 0}); } it->consumed += alloc.peak; // we want to track the top peaks in the chart it->allocated += alloc.allocated; it->allocations += alloc.allocations; it->temporary += alloc.temporary; } // find the top hot spots for the individual data members and remember their // IP and store the label auto findTopChartEntries = [&](qint64 ChartMergeData::*member, int LabelIds::*label, ChartData* data) { sort(merged.begin(), merged.end(), [=](const ChartMergeData& left, const ChartMergeData& right) { return left.*member > right.*member; }); for (size_t i = 0; i < min(size_t(ChartRows::MAX_NUM_COST - 1), merged.size()); ++i) { const auto& alloc = merged[i]; if (!(alloc.*member)) { break; } const auto ip = alloc.ip; (labelIds[ip].*label) = i + 1; const auto function = stringCache.func(findIp(ip)); data->labels[i + 1] = function; } }; findTopChartEntries(&ChartMergeData::consumed, &LabelIds::consumed, &consumedChartData); findTopChartEntries(&ChartMergeData::allocated, &LabelIds::allocated, &allocatedChartData); findTopChartEntries(&ChartMergeData::allocations, &LabelIds::allocations, &allocationsChartData); findTopChartEntries(&ChartMergeData::temporary, &LabelIds::temporary, &temporaryChartData); } void handleTimeStamp(int64_t /*oldStamp*/, int64_t newStamp) { if (!buildCharts || stringCache.diffMode) { return; } maxConsumedSinceLastTimeStamp = max(maxConsumedSinceLastTimeStamp, totalCost.leaked); const int64_t diffBetweenTimeStamps = totalTime / MAX_CHART_DATAPOINTS; if (newStamp != totalTime && newStamp - lastTimeStamp < diffBetweenTimeStamps) { return; 
         }
         const auto nowConsumed = maxConsumedSinceLastTimeStamp;
         maxConsumedSinceLastTimeStamp = 0;
         lastTimeStamp = newStamp;
         // create the rows
         auto createRow = [](int64_t timeStamp, int64_t totalCost) {
             ChartRows row;
             row.timeStamp = timeStamp;
             row.cost[0] = totalCost;
             return row;
         };
         auto consumed = createRow(newStamp, nowConsumed);
         auto allocated = createRow(newStamp, totalCost.allocated);
         auto allocs = createRow(newStamp, totalCost.allocations);
         auto temporary = createRow(newStamp, totalCost.temporary);
         // if the cost is non-zero and the ip corresponds to a hotspot function
         // selected in the labels,
         // we add the cost to the rows column
         auto addDataToRow = [](int64_t cost, int labelId, ChartRows* rows) {
             if (!cost || labelId == -1) {
                 return;
             }
             rows->cost[labelId] += cost;
         };
         for (const auto& alloc : allocations) {
             const auto ip = findTrace(alloc.traceIndex).ipIndex;
             auto it = labelIds.constFind(ip);
             if (it == labelIds.constEnd()) {
                 continue;
             }
             const auto& labelIds = *it;
             addDataToRow(alloc.leaked, labelIds.consumed, &consumed);
             addDataToRow(alloc.allocated, labelIds.allocated, &allocated);
             addDataToRow(alloc.allocations, labelIds.allocations, &allocs);
             addDataToRow(alloc.temporary, labelIds.temporary, &temporary);
         }
         // add the rows for this time stamp
         consumedChartData.rows << consumed;
         allocatedChartData.rows << allocated;
         allocationsChartData.rows << allocs;
         temporaryChartData.rows << temporary;
     }

     void handleAllocation(const AllocationInfo& info, const AllocationIndex index)
     {
         maxConsumedSinceLastTimeStamp = max(maxConsumedSinceLastTimeStamp, totalCost.leaked);
         if (index.index == allocationInfoCounter.size()) {
             allocationInfoCounter.push_back({info, 1});
         } else {
             ++allocationInfoCounter[index.index].allocations;
         }
     }

     void handleDebuggee(const char* command)
     {
         debuggee = command;
     }

     string debuggee;

     struct CountedAllocationInfo
     {
         AllocationInfo info;
         int64_t allocations;
         bool operator<(const CountedAllocationInfo& rhs) const
         {
             return tie(info.size, allocations) <
                 tie(rhs.info.size, rhs.allocations);
         }
     };
     vector allocationInfoCounter;

     ChartData consumedChartData;
     ChartData allocationsChartData;
     ChartData allocatedChartData;
     ChartData temporaryChartData;
     // here we store the indices into ChartRows::cost for those IpIndices that
     // are within the top hotspots. This way, we can do one hash lookup in the
     // handleTimeStamp function instead of three when we'd store this data
     // in a per-ChartData hash.
     struct LabelIds
     {
         int consumed = -1;
         int allocations = -1;
         int allocated = -1;
         int temporary = -1;
     };
     QHash labelIds;
     int64_t maxConsumedSinceLastTimeStamp = 0;
     int64_t lastTimeStamp = 0;

     StringCache stringCache;

     bool buildCharts = false;
 };

 void setParents(QVector& children, const RowData* parent)
 {
     for (auto& row : children) {
         row.parent = parent;
         setParents(row.children, &row);
     }
 }

 TreeData mergeAllocations(const ParserData& data)
 {
     TreeData topRows;
     // merge allocations, leave parent pointers invalid (their location may change)
     for (const auto& allocation : data.allocations) {
         auto traceIndex = allocation.traceIndex;
         auto rows = &topRows;
         while (traceIndex) {
             const auto& trace = data.findTrace(traceIndex);
             const auto& ip = data.findIp(trace.ipIndex);
             auto location = data.stringCache.location(trace.ipIndex, ip);
             auto it = lower_bound(rows->begin(), rows->end(), location);
             if (it != rows->end() && it->location == location) {
                 it->cost += allocation;
             } else {
                 it = rows->insert(it, {allocation, location, nullptr, {}});
             }
             if (data.isStopIndex(ip.functionIndex)) {
                 break;
             }
             traceIndex = trace.parentIndex;
             rows = &it->children;
         }
     }
     // now set the parents, the data is constant from here on
     setParents(topRows, nullptr);
     return topRows;
 }

 RowData* findByLocation(const RowData& row, QVector* data)
 {
     for (int i = 0; i < data->size(); ++i) {
         if (data->at(i).location == row.location) {
             return data->data() + i;
         }
     }
     return nullptr;
 }

 AllocationData buildTopDown(const TreeData& bottomUpData, TreeData* topDownData)
 {
     AllocationData totalCost;
     for (const auto& row : bottomUpData) {
         // recurse and find the cost attributed to children
         const auto childCost = buildTopDown(row.children, topDownData);
         if (childCost != row.cost) {
             // this row is (partially) a leaf
             const auto cost = row.cost - childCost;
             // bubble up the parent chain to build a top-down tree
             auto node = &row;
             auto stack = topDownData;
             while (node) {
                 auto data = findByLocation(*node, stack);
                 if (!data) {
                     // create an empty top-down item for this bottom-up node
                     *stack << RowData{{}, node->location, nullptr, {}};
                     data = &stack->back();
                 }
                 // always use the leaf node's cost and propagate that one up the chain
                 // otherwise we'd count the cost of some nodes multiple times
                 data->cost += cost;
                 stack = &data->children;
                 node = node->parent;
             }
         }
         totalCost += row.cost;
     }
     return totalCost;
 }

 QVector toTopDownData(const QVector& bottomUpData)
 {
     QVector topRows;
     buildTopDown(bottomUpData, &topRows);
     // now set the parents, the data is constant from here on
     setParents(topRows, nullptr);
     return topRows;
 }

-void buildCallerCallee2(const TreeData& bottomUpData, CallerCalleeRows* callerCalleeData)
-{
-    foreach (const auto& row, bottomUpData) {
-        if (row.children.isEmpty()) {
-            // leaf node found, bubble up the parent chain to add cost for all frames
-            // to the caller/callee data. this is done top-down since we must not count
-            // locations more than once in the caller-callee data
-            QSet recursionGuard;
-
-            auto node = &row;
-            while (node) {
-                const auto& location = node->location;
-                if (!recursionGuard.contains(location)) { // aggregate caller-callee data
-                    auto it = lower_bound(callerCalleeData->begin(), callerCalleeData->end(), location,
-                        [](const CallerCalleeData& lhs, const LocationData::Ptr& rhs) { return lhs.location < rhs; });
-                    if (it == callerCalleeData->end() || it->location != location) {
-                        it = callerCalleeData->insert(it, {{}, {}, location});
-                    }
-                    it->inclusiveCost += row.cost;
-                    if (!node->parent) {
-                        it->selfCost += row.cost;
-                    }
-                    recursionGuard.insert(location);
-                }
-                node = node->parent;
-            }
-        } else {
-            // recurse to find a leaf
-            buildCallerCallee2(row.children, callerCalleeData);
-        }
-    }
-}
-
 AllocationData buildCallerCallee(const TreeData& bottomUpData, CallerCalleeRows* callerCalleeData)
 {
     AllocationData totalCost;
     for (const auto& row : bottomUpData) {
         // recurse to find a leaf
         const auto childCost = buildCallerCallee(row.children, callerCalleeData);
         if (childCost != row.cost) {
             // this row is (partially) a leaf
             const auto cost = row.cost - childCost;
             // leaf node found, bubble up the parent chain to add cost for all frames
             // to the caller/callee data. this is done top-down since we must not count
             // symbols more than once in the caller-callee data
             QSet recursionGuard;
             auto node = &row;
             while (node) {
                 const auto& location = node->location;
                 if (!recursionGuard.contains(location)) { // aggregate caller-callee data
                     auto it = lower_bound(callerCalleeData->begin(), callerCalleeData->end(), location,
                         [](const CallerCalleeData& lhs, const LocationData::Ptr& rhs) { return lhs.location < rhs; });
                     if (it == callerCalleeData->end() || it->location != location) {
                         it = callerCalleeData->insert(it, {{}, {}, location});
                     }
                     it->inclusiveCost += cost;
                     if (!node->parent) {
                         it->selfCost += cost;
                     }
                     recursionGuard.insert(location);
                 }
                 node = node->parent;
             }
         }
         totalCost += row.cost;
     }
     return totalCost;
 }

 CallerCalleeRows toCallerCalleeData(const QVector& bottomUpData, bool diffMode)
 {
     CallerCalleeRows callerCalleeRows;
     buildCallerCallee(bottomUpData, &callerCalleeRows);
     if (diffMode) {
         // remove rows without cost
         callerCalleeRows.erase(remove_if(callerCalleeRows.begin(), callerCalleeRows.end(),
                                          [](const CallerCalleeData& data) -> bool {
                                              return data.inclusiveCost == AllocationData()
                                                  && data.selfCost == AllocationData();
                                          }),
                                callerCalleeRows.end());
     }
     return callerCalleeRows;
 }

 struct MergedHistogramColumnData
 {
     LocationData::Ptr location;
     int64_t allocations;
     bool operator<(const LocationData::Ptr& rhs) const
     {
         return location < rhs;
     }
 };

 HistogramData buildSizeHistogram(ParserData& data)
 {
     HistogramData ret;
     if (data.allocationInfoCounter.empty()) {
         return ret;
     }
     sort(data.allocationInfoCounter.begin(), data.allocationInfoCounter.end());
     const auto totalLabel = i18n("total");
     HistogramRow row;
     const pair buckets[] = {{8, i18n("0B to 8B")},
                             {16, i18n("9B to 16B")},
                             {32, i18n("17B to 32B")},
                             {64, i18n("33B to 64B")},
                             {128, i18n("65B to 128B")},
                             {256, i18n("129B to 256B")},
                             {512, i18n("257B to 512B")},
                             {1024, i18n("512B to 1KB")},
                             {numeric_limits::max(), i18n("more than 1KB")}};
     uint bucketIndex = 0;
     row.size = buckets[bucketIndex].first;
     row.sizeLabel = buckets[bucketIndex].second;
     vector columnData;
     columnData.reserve(128);
     auto insertColumns = [&]() {
         sort(columnData.begin(), columnData.end(),
              [](const MergedHistogramColumnData& lhs, const MergedHistogramColumnData& rhs) {
                  return lhs.allocations > rhs.allocations;
              });
         // -1 to account for total row
         for (size_t i = 0; i < min(columnData.size(), size_t(HistogramRow::NUM_COLUMNS - 1)); ++i) {
             const auto& column = columnData[i];
             row.columns[i + 1] = {column.allocations, column.location};
         }
     };
     for (const auto& info : data.allocationInfoCounter) {
         if (info.info.size > row.size) {
             insertColumns();
             columnData.clear();
             ret << row;
             ++bucketIndex;
             row.size = buckets[bucketIndex].first;
             row.sizeLabel = buckets[bucketIndex].second;
             row.columns[0] = {info.allocations, {}};
         } else {
             row.columns[0].allocations += info.allocations;
         }
         const auto ipIndex = data.findTrace(info.info.traceIndex).ipIndex;
         const auto ip = data.findIp(ipIndex);
         const auto location = data.stringCache.location(ipIndex, ip);
         auto it = lower_bound(columnData.begin(), columnData.end(), location);
         if (it == columnData.end() || it->location != location) {
             columnData.insert(it, {location, info.allocations});
         } else {
             it->allocations += info.allocations;
         }
     }
     insertColumns();
     ret << row;
     return ret;
 }
 }

 Parser::Parser(QObject* parent)
     : QObject(parent)
 {
     qRegisterMetaType();
 }

 Parser::~Parser() = default;

 void Parser::parse(const QString& path, const QString& diffBase)
 {
     using namespace ThreadWeaver;
     stream() << make_job([this, path, diffBase]() {
         const auto stdPath = path.toStdString();
         auto data = make_shared();
         emit progressMessageAvailable(i18n("parsing data..."));
         if (!diffBase.isEmpty()) {
             ParserData diffData;
             auto readBase =
                 async(launch::async, [&diffData, diffBase]() { return diffData.read(diffBase.toStdString()); });
             if (!data->read(stdPath)) {
                 emit failedToOpen(path);
                 return;
             }
             if (!readBase.get()) {
                 emit failedToOpen(diffBase);
                 return;
             }
             data->diff(diffData);
             data->stringCache.diffMode = true;
         } else if (!data->read(stdPath)) {
             emit failedToOpen(path);
             return;
         }
         data->updateStringCache();
         emit summaryAvailable({QString::fromStdString(data->debuggee), data->totalCost, data->totalTime,
                                data->peakTime, data->peakRSS * data->systemInfo.pageSize,
                                data->systemInfo.pages * data->systemInfo.pageSize, data->fromAttached});
         emit progressMessageAvailable(i18n("merging allocations..."));
         // merge allocations before modifying the data again
         const auto mergedAllocations = mergeAllocations(*data);
         emit bottomUpDataAvailable(mergedAllocations);
         // also calculate the size histogram
         emit progressMessageAvailable(i18n("building size histogram..."));
         const auto sizeHistogram = buildSizeHistogram(*data);
         emit sizeHistogramDataAvailable(sizeHistogram);
         // now data can be modified again for the chart data evaluation
         const auto diffMode = data->stringCache.diffMode;
         emit progressMessageAvailable(i18n("building charts..."));
         auto parallel = new Collection;
         *parallel << make_job([this, mergedAllocations]() {
             const auto topDownData = toTopDownData(mergedAllocations);
             emit topDownDataAvailable(topDownData);
         }) << make_job([this, mergedAllocations, diffMode]() {
             const auto callerCalleeData = toCallerCalleeData(mergedAllocations, diffMode);
             emit callerCalleeDataAvailable(callerCalleeData);
         });
         if (!data->stringCache.diffMode) {
             // only build charts when we are not diffing
             *parallel << make_job([this, data, stdPath]() {
                 // this mutates data, and thus anything running in parallel must
                 // not access data
                 data->prepareBuildCharts();
                 data->read(stdPath);
                 emit consumedChartDataAvailable(data->consumedChartData);
                 emit allocationsChartDataAvailable(data->allocationsChartData);
                 emit allocatedChartDataAvailable(data->allocatedChartData);
                 emit temporaryChartDataAvailable(data->temporaryChartData);
             });
         }
         auto sequential = new Sequence;
         *sequential << parallel << make_job([this]() { emit finished(); });
         stream() << sequential;
     });
 }
diff --git a/src/interpret/heaptrack_interpret.cpp b/src/interpret/heaptrack_interpret.cpp
index 33e4c33..98477ce 100644
--- a/src/interpret/heaptrack_interpret.cpp
+++ b/src/interpret/heaptrack_interpret.cpp
@@ -1,430 +1,430 @@
 /*
  * Copyright 2014-2017 Milian Wolff
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Library General Public License as
  * published by the Free Software Foundation; either version 2 of the
  * License, or (at your option) any later version.
  *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  * GNU General Public License for more details.
  *
  * You should have received a copy of the GNU General Public
  * License along with this program; if not, write to the
  * Free Software Foundation, Inc.,
  * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  */

 /**
  * @file heaptrack_interpret.cpp
  *
  * @brief Interpret raw heaptrack data and add Dwarf based debug information.
  */

 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include

 #include "libbacktrace/backtrace.h"
 #include "libbacktrace/internal.h"
 #include "util/linereader.h"
 #include "util/pointermap.h"

 using namespace std;

 namespace {

 string demangle(const char* function)
 {
     if (!function) {
         return {};
     } else if (function[0] != '_' || function[1] != 'Z') {
         return {function};
     }
     string ret;
     int status = 0;
     char* demangled = abi::__cxa_demangle(function, 0, 0, &status);
     if (demangled) {
         ret = demangled;
         free(demangled);
     }
     return ret;
 }

 struct AddressInformation
 {
     string function;
     string file;
     int line = 0;
 };

 struct Module
 {
     Module(uintptr_t addressStart, uintptr_t addressEnd, backtrace_state* backtraceState, size_t moduleIndex)
         : addressStart(addressStart)
         , addressEnd(addressEnd)
         , moduleIndex(moduleIndex)
         , backtraceState(backtraceState)
     {
     }

     AddressInformation resolveAddress(uintptr_t address) const
     {
         AddressInformation info;
         if (!backtraceState) {
             return info;
         }
         backtrace_pcinfo(backtraceState, address,
                          [](void* data, uintptr_t /*addr*/, const char* file, int line, const char* function) -> int {
                              auto info = reinterpret_cast(data);
                              info->function = demangle(function);
                              info->file = file ? file : "";
                              info->line = line;
                              return 0;
                          },
                          [](void* /*data*/, const char* /*msg*/, int /*errnum*/) {}, &info);
         if (info.function.empty()) {
             backtrace_syminfo(
                 backtraceState, address,
                 [](void* data, uintptr_t /*pc*/, const char* symname, uintptr_t /*symval*/, uintptr_t /*symsize*/) {
                     if (symname) {
                         reinterpret_cast(data)->function = demangle(symname);
                     }
                 },
                 [](void* /*data*/, const char* msg, int errnum) {
                     cerr << "Module backtrace error (code " << errnum << "): " << msg << endl;
                 },
                 &info);
         }
         return info;
     }

     bool operator<(const Module& module) const
     {
         return tie(addressStart, addressEnd, moduleIndex)
             < tie(module.addressStart, module.addressEnd, module.moduleIndex);
     }

     bool operator!=(const Module& module) const
     {
         return tie(addressStart, addressEnd, moduleIndex)
             != tie(module.addressStart, module.addressEnd, module.moduleIndex);
     }

     uintptr_t addressStart;
     uintptr_t addressEnd;
     size_t moduleIndex;
     backtrace_state* backtraceState;
 };

 struct ResolvedIP
 {
     size_t moduleIndex = 0;
     size_t fileIndex = 0;
     size_t functionIndex = 0;
     int line = 0;
 };

 struct AccumulatedTraceData
 {
     AccumulatedTraceData()
     {
         m_modules.reserve(256);
         m_backtraceStates.reserve(64);
         m_internedData.reserve(4096);
         m_encounteredIps.reserve(32768);
     }

     ~AccumulatedTraceData()
     {
         fprintf(stdout, "# strings: %zu\n# ips: %zu\n", m_internedData.size(), m_encounteredIps.size());
     }

     ResolvedIP resolve(const uintptr_t ip)
     {
         if (m_modulesDirty) {
             // sort by addresses, required for binary search below
             sort(m_modules.begin(), m_modules.end());
 #ifndef NDEBUG
             for (size_t i = 0; i < m_modules.size(); ++i) {
                 const auto& m1 = m_modules[i];
                 for (size_t j = i + 1; j < m_modules.size(); ++j) {
                     if (i == j) {
                         continue;
                     }
                     const auto& m2 = m_modules[j];
                     if ((m1.addressStart <= m2.addressStart && m1.addressEnd > m2.addressStart)
                         || (m1.addressStart < m2.addressEnd && m1.addressEnd >= m2.addressEnd)) {
                         cerr << "OVERLAPPING MODULES: " << hex << m1.moduleIndex << " (" << m1.addressStart << " to "
                              << m1.addressEnd << ") and " << m1.moduleIndex << " (" << m2.addressStart << " to "
                              << m2.addressEnd << ")\n"
                              << dec;
                     } else if (m2.addressStart >= m1.addressEnd) {
                         break;
                     }
                 }
             }
 #endif
             m_modulesDirty = false;
         }
         ResolvedIP data;
         // find module for this instruction pointer
         auto module =
             lower_bound(m_modules.begin(), m_modules.end(), ip,
                         [](const Module& module, const uintptr_t ip) -> bool { return module.addressEnd < ip; });
         if (module != m_modules.end() && module->addressStart <= ip && module->addressEnd >= ip) {
             data.moduleIndex = module->moduleIndex;
             const auto info = module->resolveAddress(ip);
             data.fileIndex = intern(info.file);
             data.functionIndex = intern(info.function);
             data.line = info.line;
         }
         return data;
     }

     size_t intern(const string& str, const char** internedString = nullptr)
     {
         if (str.empty()) {
             return 0;
         }
         auto it = m_internedData.find(str);
         if (it != m_internedData.end()) {
             if (internedString) {
                 *internedString = it->first.c_str();
             }
             return it->second;
         }
         const size_t id = m_internedData.size() + 1;
         it = m_internedData.insert(it, make_pair(str, id));
         if (internedString) {
             *internedString = it->first.c_str();
         }
         fprintf(stdout, "s %s\n", str.c_str());
         return id;
     }

     void addModule(backtrace_state* backtraceState, const size_t moduleIndex, const uintptr_t addressStart,
                    const uintptr_t addressEnd)
     {
         m_modules.emplace_back(addressStart, addressEnd, backtraceState, moduleIndex);
         m_modulesDirty = true;
     }

     void clearModules()
     {
         // TODO: optimize this, reuse modules that are still valid
         m_modules.clear();
         m_modulesDirty = true;
     }

     size_t addIp(const uintptr_t instructionPointer)
     {
         if (!instructionPointer) {
             return 0;
         }
         auto it = m_encounteredIps.find(instructionPointer);
         if (it != m_encounteredIps.end()) {
             return it->second;
         }
         const size_t ipId = m_encounteredIps.size() + 1;
         m_encounteredIps.insert(it, make_pair(instructionPointer, ipId));
         const auto ip = resolve(instructionPointer);
         fprintf(stdout, "i %zx %zx", instructionPointer, ip.moduleIndex);
         if (ip.functionIndex || ip.fileIndex) {
             fprintf(stdout, " %zx", ip.functionIndex);
             if (ip.fileIndex) {
                 fprintf(stdout, " %zx %x", ip.fileIndex, ip.line);
             }
         }
         fputc('\n', stdout);
         return ipId;
     }

     /**
      * Prevent the same file from being initialized multiple times.
      * This drastically cuts the memory consumption down
      */
     backtrace_state* findBacktraceState(const char* fileName, uintptr_t addressStart)
     {
         if (boost::algorithm::starts_with(fileName, "linux-vdso.so")) {
             // prevent warning, since this will always fail
             return nullptr;
         }
         auto it = m_backtraceStates.find(fileName);
         if (it != m_backtraceStates.end()) {
             return it->second;
         }
         struct CallbackData
         {
             const char* fileName;
         };
         CallbackData data = {fileName};
         auto errorHandler = [](void* rawData, const char* msg, int errnum) {
             auto data = reinterpret_cast(rawData);
             cerr << "Failed to create backtrace state for module " << data->fileName << ": " << msg << " / "
                  << strerror(errnum) << " (error code " << errnum << ")" << endl;
         };
         auto state = backtrace_create_state(fileName, /* we are single threaded, so: not thread safe */ false,
                                             errorHandler, &data);
         if (state) {
             const int descriptor = backtrace_open(fileName, errorHandler, &data, nullptr);
             if (descriptor >= 1) {
                 int foundSym = 0;
                 int foundDwarf = 0;
                 auto ret = elf_add(state, descriptor, addressStart, errorHandler, &data, &state->fileline_fn,
                                    &foundSym, &foundDwarf, false);
                 if (ret && foundSym) {
                     state->syminfo_fn = &elf_syminfo;
                 }
             }
         }
         m_backtraceStates.insert(it, make_pair(fileName, state));
         return state;
     }

 private:
     vector m_modules;
     unordered_map m_backtraceStates;
     bool m_modulesDirty = false;
     unordered_map m_internedData;
     unordered_map m_encounteredIps;
 };
 }

 int main(int /*argc*/, char** /*argv*/)
 {
     // optimize: we only have a single thread
     ios_base::sync_with_stdio(false);
     __fsetlocking(stdout, FSETLOCKING_BYCALLER);
     __fsetlocking(stdin, FSETLOCKING_BYCALLER);
     AccumulatedTraceData data;
     LineReader reader;
     string exe;
     PointerMap ptrToIndex;
     uint64_t lastPtr = 0;
     AllocationInfoSet allocationInfos;
     uint64_t allocations = 0;
     uint64_t leakedAllocations = 0;
     uint64_t temporaryAllocations = 0;
     while (reader.getLine(cin)) {
         if (reader.mode() == 'x') {
             reader >> exe;
         } else if (reader.mode() == 'm') {
             string fileName;
             reader >> fileName;
             if (fileName == "-") {
                 data.clearModules();
             } else {
                 if (fileName == "x") {
                     fileName = exe;
                 }
                 const char* internedString = nullptr;
                 const auto moduleIndex = data.intern(fileName, &internedString);
                 uintptr_t addressStart = 0;
                 if (!(reader >> addressStart)) {
                     cerr << "failed to parse line: " << reader.line() << endl;
                     return 1;
                 }
                 auto state = data.findBacktraceState(internedString, addressStart);
                 uintptr_t vAddr = 0;
                 uintptr_t memSize = 0;
                 while ((reader >> vAddr) && (reader >> memSize)) {
                     data.addModule(state, moduleIndex, addressStart + vAddr, addressStart + vAddr + memSize);
                 }
             }
         } else if (reader.mode() == 't') {
             uintptr_t instructionPointer = 0;
             size_t parentIndex = 0;
             if (!(reader >> instructionPointer) || !(reader >> parentIndex)) {
                 cerr << "failed to parse line: " << reader.line() << endl;
                 return 1;
             }
             // ensure ip is encountered
             const auto ipId = data.addIp(instructionPointer);
             // trace point, map current output index to parent index
             fprintf(stdout, "t %zx %zx\n", ipId, parentIndex);
         } else if (reader.mode() == '+') {
             ++allocations;
             ++leakedAllocations;
             uint64_t size = 0;
             TraceIndex traceId;
             uint64_t ptr = 0;
             if (!(reader >> size) || !(reader >> traceId.index) || !(reader >> ptr)) {
                 cerr << "failed to parse line: " << reader.line() << endl;
                 continue;
             }
             AllocationIndex index;
             if (allocationInfos.add(size, traceId, &index)) {
                 fprintf(stdout, "a %" PRIx64 " %x\n", size, traceId.index);
             }
             ptrToIndex.addPointer(ptr, index);
             lastPtr = ptr;
             fprintf(stdout, "+ %x\n", index.index);
         } else if (reader.mode() == '-') {
-            --leakedAllocations;
             uint64_t ptr = 0;
             if (!(reader >> ptr)) {
                 cerr << "failed to parse line: " << reader.line() << endl;
                 continue;
             }
             bool temporary = lastPtr == ptr;
             lastPtr = 0;
             auto allocation = ptrToIndex.takePointer(ptr);
             if (!allocation.second) {
                 continue;
             }
             fprintf(stdout, "- %x\n", allocation.first.index);
             if (temporary) {
                 ++temporaryAllocations;
             }
+            --leakedAllocations;
         } else {
             fputs(reader.line().c_str(), stdout);
             fputc('\n', stdout);
         }
     }
     fprintf(stderr, "heaptrack stats:\n"
                     "\tallocations: \t%" PRIu64 "\n"
                     "\tleaked allocations: \t%" PRIu64 "\n"
                     "\ttemporary allocations:\t%" PRIu64 "\n",
             allocations, leakedAllocations, temporaryAllocations);
     return 0;
 }
diff --git a/src/track/heaptrack.sh.cmake b/src/track/heaptrack.sh.cmake
index e015a92..e4b61de 100755
--- a/src/track/heaptrack.sh.cmake
+++ b/src/track/heaptrack.sh.cmake
@@ -1,187 +1,199 @@
 #!/bin/sh
 #
 # Copyright 2014-2017 Milian Wolff
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU Library General Public License as
 # published by the Free Software Foundation; either version 2 of the
 # License, or (at your option) any later version.
 #
 # This program is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public
 # License along with this program; if not, write to the
 # Free Software Foundation, Inc.,
 # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 #

 usage() {
     echo "Usage: $0 [--debug|-d] DEBUGGEE [ARGUMENT]..."
     echo "or: $0 [--debug|-d] -p PID"
     echo
     echo "A heap memory usage profiler. It uses LD_PRELOAD to track all"
     echo "calls to the core memory allocation functions and logs these"
     echo "occurrances. Additionally, backtraces are obtained and logged."
     echo "Combined this can give interesting answers to questions such as:"
     echo
     echo " * How much heap memory is my application using?"
     echo " * Where is heap memory being allocated, and how often?"
     echo " * How much space are heap individual allocations requesting?"
     echo
     echo "To evaluate the generated heaptrack data, use heaptrack_print or heaptrack_gui."
     echo
     echo "Mandatory arguments to heaptrack:"
     echo " DEBUGGEE The name or path to the application that should"
     echo " be run with heaptrack analyzation enabled."
     echo
     echo "Alternatively, to attach to a running process:"
     echo " -p, --pid PID The process ID of a running process into which"
     echo " heaptrack will be injected. This only works with"
     echo " applications that already link against libdl."
+    echo " WARNING: Runtime-attaching heaptrack is UNSTABLE and can lead to CRASHES"
+    echo " in your application, especially after you detach heaptrack again."
+    echo " You are hereby warned, use it at your own risk!"
     echo
     echo "Optional arguments to heaptrack:"
     echo " -d, --debug Run the debuggee in GDB and heaptrack."
     echo " ARGUMENT Any number of arguments that will be passed verbatim"
     echo " to the debuggee."
     echo " -h, --help Show this help message and exit."
     echo " -v, --version Displays version information."
     echo
     exit 0
 }

 debug=
 pid=
 client=

 while true; do
     case "$1" in
         "-d" | "--debug")
             debug=1
             shift 1
             ;;
         "-h" | "--help")
             usage
             exit 0
             ;;
         "-p" | "--pid")
             pid=$2
             if [ -z "$pid" ]; then
                 echo "Missing PID argument."
                 exit 1
             fi
             client=$(cat /proc/$pid/comm)
             if [ -z "$client" ]; then
                 echo "Cannot attach to unknown process with PID $pid."
                 exit 1
             fi
             shift 2
             echo $@
             if [ ! -z "$@" ]; then
                 echo "You cannot specify a debuggee and a pid at the same time."
                 exit 1
             fi
             break
             ;;
         "-v" | "--version")
             echo "heaptrack @HEAPTRACK_VERSION_MAJOR@.@HEAPTRACK_VERSION_MINOR@.@HEAPTRACK_VERSION_PATCH@"
             exit 0
             ;;
         *)
             if [ "$1" = "--" ]; then
                 shift 1
             fi
             if [ ! -x "$(which "$1" 2> /dev/null)" ]; then
                 echo "Error: Debuggee \"$1\" is not an executable."
                 echo
                 echo "Usage: $0 [--debug|-d] [--help|-h] DEBUGGEE [ARGS...]"
                 exit 1
             fi
             client="$1"
             shift 1
             break
             ;;
     esac
 done

 # put output into current pwd
 output=$(pwd)/heaptrack.$(basename "$client").$$

 # find preload library and interpreter executable using relative paths
 EXE_PATH=$(readlink -f $(dirname $(readlink -f $0)))
 LIB_REL_PATH="@LIB_REL_PATH@"
 LIBEXEC_REL_PATH="@LIBEXEC_REL_PATH@"

 INTERPRETER="$EXE_PATH/$LIBEXEC_REL_PATH/heaptrack_interpret"
 if [ ! -f "$INTERPRETER" ]; then
     echo "Could not find heaptrack interpreter executable: $INTERPRETER"
     exit 1
 fi
 INTERPRETER=$(readlink -f "$INTERPRETER")

 LIBHEAPTRACK_PRELOAD="$EXE_PATH/$LIB_REL_PATH/libheaptrack_preload.so"
 if [ ! -f "$LIBHEAPTRACK_PRELOAD" ]; then
     echo "Could not find heaptrack preload library $LIBHEAPTRACK_PRELOAD"
     exit 1
 fi
 LIBHEAPTRACK_PRELOAD=$(readlink -f "$LIBHEAPTRACK_PRELOAD")

 LIBHEAPTRACK_INJECT="$EXE_PATH/$LIB_REL_PATH/libheaptrack_inject.so"
 if [ ! -f "$LIBHEAPTRACK_INJECT" ]; then
     echo "Could not find heaptrack inject library $LIBHEAPTRACK_INJECT"
     exit 1
 fi
 LIBHEAPTRACK_INJECT=$(readlink -f "$LIBHEAPTRACK_INJECT")

 # setup named pipe to read data from
 pipe=/tmp/heaptrack_fifo$$
 mkfifo $pipe

 # interpret the data and compress the output on the fly
 output="$output.gz"
 "$INTERPRETER" < $pipe | gzip -c > "$output" &
 debuggee=$!

 cleanup() {
+    if [ ! -z "$pid" ]; then
+        echo "removing heaptrack injection via GDB, this might take some time..."
+        gdb --batch-silent -n -iex="set auto-solib-add off" -p $pid \
+            --eval-command="sharedlibrary libheaptrack_inject" \
+            --eval-command="call heaptrack_stop()" \
+            --eval-command="detach"
+        # NOTE: we do not call dlclose here, as that has the tendency to trigger
+        # crashes in the debuggee. So instead, we keep heaptrack loaded.
+    fi
     rm -f "$pipe"
     kill "$debuggee" 2> /dev/null
     echo "Heaptrack finished! Now run the following to investigate the data:"
     echo
     if [ "$(which heaptrack_gui 2> /dev/null)" != "" ]; then
         echo " heaptrack_gui \"$output\""
     else
         echo " heaptrack_print \"$output\" | less"
     fi
 }
 trap cleanup EXIT

 echo "heaptrack output will be written to \"$output\""

 if [ -z "$debug" ] && [ -z "$pid" ]; then
     echo "starting application, this might take some time..."
     LD_PRELOAD=$LIBHEAPTRACK_PRELOAD${LD_PRELOAD:+:$LD_PRELOAD} DUMP_HEAPTRACK_OUTPUT="$pipe" "$client" "$@"
 else
     if [ -z "$pid" ]; then
         echo "starting application in GDB, this might take some time..."
         gdb --eval-command="set environment LD_PRELOAD=$LIBHEAPTRACK_PRELOAD" \
             --eval-command="set environment DUMP_HEAPTRACK_OUTPUT=$pipe" \
             --eval-command="run" --args "$client" "$@"
     else
         echo "injecting heaptrack into application via GDB, this might take some time..."
         gdb --batch-silent -n -iex="set auto-solib-add off" -p $pid \
             --eval-command="sharedlibrary libc.so" \
             --eval-command="call (void) __libc_dlopen_mode(\"$LIBHEAPTRACK_INJECT\", 0x80000000 | 0x002)" \
             --eval-command="sharedlibrary libheaptrack_inject" \
             --eval-command="call (void) heaptrack_inject(\"$pipe\")" \
             --eval-command="detach"
         echo "injection finished"
     fi
 fi

 wait $debuggee

 # kate: hl Bash
diff --git a/src/track/libheaptrack.cpp b/src/track/libheaptrack.cpp
index 20e4383..b21b00b 100644
--- a/src/track/libheaptrack.cpp
+++ b/src/track/libheaptrack.cpp
@@ -1,637 +1,651 @@
 /*
  * Copyright 2014-2017 Milian Wolff
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU Library General Public License as
  * published by the Free Software Foundation; either version 2 of the
  * License, or (at your option) any later version.
  *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  * GNU General Public License for more details.
  *
  * You should have received a copy of the GNU General Public
  * License along with this program; if not, write to the
  * Free Software Foundation, Inc.,
  * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  */

 /**
  * @file libheaptrack.cpp
  *
  * @brief Collect raw heaptrack data by overloading heap allocation functions.
  */

 #include "libheaptrack.h"

 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include
 #include

 #include "tracetree.h"
 #include "util/config.h"
 #include "util/libunwind_config.h"

 /**
  * uncomment this to get extended debug code for known pointers
  * there are still some malloc functions I'm missing apparently,
  * related to TLS and such I guess
  */
 // #define DEBUG_MALLOC_PTRS

 using namespace std;

 namespace {

 enum DebugVerbosity
 {
     NoDebugOutput,
     MinimalOutput,
     VerboseOutput,
     VeryVerboseOutput,
 };

 // change this to add more debug output to stderr
 constexpr const DebugVerbosity s_debugVerbosity = NoDebugOutput;

 /**
  * Call this to optionally show debug information but give the compiler
  * a hand in removing it all if debug output is disabled.
  */
 template
 inline void debugLog(const char fmt[], Args... args)
 {
     if (debugLevel <= s_debugVerbosity) {
         flockfile(stderr);
         fprintf(stderr, "heaptrack debug [%d]: ", static_cast(debugLevel));
         fprintf(stderr, fmt, args...);
         fputc('\n', stderr);
         funlockfile(stderr);
     }
 }

 /**
  * Set to true in an atexit handler. In such conditions, the stop callback
  * will not be called.
  */
 atomic s_atexit{false};

+/**
+ * Set to true in heaptrack_stop, when s_atexit was not yet set. In such conditions,
+ * we always fully unload and cleanup behind ourselves
+ */
+atomic s_forceCleanup{false};
+
 /**
  * A per-thread handle guard to prevent infinite recursion, which should be
  * acquired before doing any special symbol handling.
*/ struct RecursionGuard { RecursionGuard() : wasLocked(isActive) { isActive = true; } ~RecursionGuard() { isActive = wasLocked; } const bool wasLocked; static thread_local bool isActive; }; thread_local bool RecursionGuard::isActive = false; void writeVersion(FILE* out) { fprintf(out, "v %x %x\n", HEAPTRACK_VERSION, HEAPTRACK_FILE_FORMAT_VERSION); } void writeExe(FILE* out) { const int BUF_SIZE = 1023; char buf[BUF_SIZE + 1]; ssize_t size = readlink("/proc/self/exe", buf, BUF_SIZE); if (size > 0 && size < BUF_SIZE) { buf[size] = 0; fprintf(out, "x %s\n", buf); } } void writeCommandLine(FILE* out) { fputc('X', out); const int BUF_SIZE = 4096; char buf[BUF_SIZE + 1]; auto fd = open("/proc/self/cmdline", O_RDONLY); int bytesRead = read(fd, buf, BUF_SIZE); char* end = buf + bytesRead; for (char* p = buf; p < end;) { fputc(' ', out); fputs(p, out); while (*p++) ; // skip until start of next 0-terminated section } close(fd); fputc('\n', out); } void writeSystemInfo(FILE* out) { fprintf(out, "I %lx %lx\n", sysconf(_SC_PAGESIZE), sysconf(_SC_PHYS_PAGES)); } FILE* createFile(const char* fileName) { string outputFileName; if (fileName) { outputFileName.assign(fileName); } if (outputFileName == "-" || outputFileName == "stdout") { debugLog("%s", "will write to stdout"); return stdout; } else if (outputFileName == "stderr") { debugLog("%s", "will write to stderr"); return stderr; } if (outputFileName.empty()) { // env var might not be set when linked directly into an executable outputFileName = "heaptrack.$$"; } boost::replace_all(outputFileName, "$$", to_string(getpid())); auto out = fopen(outputFileName.c_str(), "w"); debugLog("will write to %s/%p\n", outputFileName.c_str(), out); // we do our own locking, this speeds up the writing significantly __fsetlocking(out, FSETLOCKING_BYCALLER); return out; } /** * Thread-Safe heaptrack API * * The only critical section in libheaptrack is the output of the data, * dl_iterate_phdr * calls, as well as initialization and shutdown. 
* * This uses a spinlock, instead of a std::mutex, as the latter can lead to * deadlocks * on destruction. The spinlock is "simple", and OK to only guard the small * sections. */ class HeapTrack { public: HeapTrack(const RecursionGuard& /*recursionGuard*/) : HeapTrack([] { return true; }) { } ~HeapTrack() { debugLog("%s", "releasing lock"); s_locked.store(false, memory_order_release); } void initialize(const char* fileName, heaptrack_callback_t initBeforeCallback, heaptrack_callback_initialized_t initAfterCallback, heaptrack_callback_t stopCallback) { debugLog("initializing: %s", fileName); if (s_data) { debugLog("%s", "already initialized"); return; } if (initBeforeCallback) { debugLog("%s", "calling initBeforeCallback"); initBeforeCallback(); debugLog("%s", "done calling initBeforeCallback"); } // do some once-only initializations static once_flag once; call_once(once, [] { debugLog("%s", "doing once-only initialization"); // configure libunwind for better speed if (unw_set_caching_policy(unw_local_addr_space, UNW_CACHE_PER_THREAD)) { fprintf(stderr, "WARNING: Failed to enable per-thread libunwind caching.\n"); } #ifdef unw_set_cache_size if (unw_set_cache_size(unw_local_addr_space, 1024, 0)) { fprintf(stderr, "WARNING: Failed to set libunwind cache size.\n"); } #endif // do not trace forked child processes // TODO: make this configurable pthread_atfork(&prepare_fork, &parent_fork, &child_fork); atexit([]() { + if (s_forceCleanup) { + return; + } debugLog("%s", "atexit()"); s_atexit.store(true); heaptrack_stop(); }); }); FILE* out = createFile(fileName); if (!out) { fprintf(stderr, "ERROR: Failed to open heaptrack output file: %s\n", fileName); if (stopCallback) { stopCallback(); } return; } writeVersion(out); writeExe(out); writeCommandLine(out); writeSystemInfo(out); s_data = new LockedData(out, stopCallback); if (initAfterCallback) { debugLog("%s", "calling initAfterCallback"); initAfterCallback(out); debugLog("%s", "calling initAfterCallback done"); } 
debugLog("%s", "initialization done"); } void shutdown() { if (!s_data) { return; } debugLog("%s", "shutdown()"); writeTimestamp(); writeRSS(); // NOTE: we leak heaptrack data on exit, intentionally // This way, we can be sure to get all static deallocations. - if (!s_atexit) { + if (!s_atexit || s_forceCleanup) { delete s_data; s_data = nullptr; } debugLog("%s", "shutdown() done"); } void invalidateModuleCache() { if (!s_data) { return; } s_data->moduleCacheDirty = true; } void writeTimestamp() { if (!s_data || !s_data->out) { return; } auto elapsed = chrono::duration_cast(clock::now() - s_data->start); debugLog("writeTimestamp(%" PRIx64 ")", elapsed.count()); if (fprintf(s_data->out, "c %" PRIx64 "\n", elapsed.count()) < 0) { writeError(); return; } } void writeRSS() { if (!s_data || !s_data->out || !s_data->procStatm) { return; } // read RSS in pages from statm, then rewind for next read size_t rss = 0; fscanf(s_data->procStatm, "%*x %zx", &rss); rewind(s_data->procStatm); // TODO: compare to rusage.ru_maxrss (getrusage) to find "real" peak? 
// TODO: use custom allocators with known page sizes to prevent tainting // the RSS numbers with heaptrack-internal data if (fprintf(s_data->out, "R %zx\n", rss) < 0) { writeError(); return; } } void handleMalloc(void* ptr, size_t size, const Trace& trace) { if (!s_data || !s_data->out) { return; } updateModuleCache(); const auto index = s_data->traceTree.index(trace, s_data->out); #ifdef DEBUG_MALLOC_PTRS auto it = s_data->known.find(ptr); assert(it == s_data->known.end()); s_data->known.insert(ptr); #endif if (fprintf(s_data->out, "+ %zx %x %" PRIxPTR "\n", size, index, reinterpret_cast(ptr)) < 0) { writeError(); return; } } void handleFree(void* ptr) { if (!s_data || !s_data->out) { return; } #ifdef DEBUG_MALLOC_PTRS auto it = s_data->known.find(ptr); assert(it != s_data->known.end()); s_data->known.erase(it); #endif if (fprintf(s_data->out, "- %" PRIxPTR "\n", reinterpret_cast(ptr)) < 0) { writeError(); return; } } private: static int dl_iterate_phdr_callback(struct dl_phdr_info* info, size_t /*size*/, void* data) { auto heaptrack = reinterpret_cast(data); const char* fileName = info->dlpi_name; if (!fileName || !fileName[0]) { fileName = "x"; } debugLog("dlopen_notify_callback: %s %zx", fileName, info->dlpi_addr); if (fprintf(heaptrack->s_data->out, "m %s %zx", fileName, info->dlpi_addr) < 0) { heaptrack->writeError(); return 1; } for (int i = 0; i < info->dlpi_phnum; i++) { const auto& phdr = info->dlpi_phdr[i]; if (phdr.p_type == PT_LOAD) { if (fprintf(heaptrack->s_data->out, " %zx %zx", phdr.p_vaddr, phdr.p_memsz) < 0) { heaptrack->writeError(); return 1; } } } if (fputc('\n', heaptrack->s_data->out) == EOF) { heaptrack->writeError(); return 1; } return 0; } static void prepare_fork() { debugLog("%s", "prepare_fork()"); // don't do any custom malloc handling while inside fork RecursionGuard::isActive = true; } static void parent_fork() { debugLog("%s", "parent_fork()"); // the parent process can now continue its custom malloc tracking 
RecursionGuard::isActive = false; } static void child_fork() { debugLog("%s", "child_fork()"); // but the forked child process cleans up itself // this is important to prevent two processes writing to the same file s_data = nullptr; RecursionGuard::isActive = true; } void updateModuleCache() { if (!s_data || !s_data->out || !s_data->moduleCacheDirty) { return; } debugLog("%s", "updateModuleCache()"); if (fputs("m -\n", s_data->out) == EOF) { writeError(); return; } dl_iterate_phdr(&dl_iterate_phdr_callback, this); s_data->moduleCacheDirty = false; } void writeError() { debugLog("write error %d/%s", errno, strerror(errno)); s_data->out = nullptr; shutdown(); } template HeapTrack(AdditionalLockCheck lockCheck) { debugLog("%s", "acquiring lock"); while (s_locked.exchange(true, memory_order_acquire) && lockCheck()) { this_thread::sleep_for(chrono::microseconds(1)); } debugLog("%s", "lock acquired"); } using clock = chrono::steady_clock; struct LockedData { LockedData(FILE* out, heaptrack_callback_t stopCallback) : out(out) , stopCallback(stopCallback) { debugLog("%s", "constructing LockedData"); procStatm = fopen("/proc/self/statm", "r"); if (!procStatm) { fprintf(stderr, "WARNING: Failed to open /proc/self/statm for reading.\n"); } timerThread = thread([&]() { RecursionGuard::isActive = true; debugLog("%s", "timer thread started"); while (!stopTimerThread) { // TODO: make interval customizable this_thread::sleep_for(chrono::milliseconds(10)); HeapTrack heaptrack([&] { return !stopTimerThread.load(); }); if (!stopTimerThread) { heaptrack.writeTimestamp(); heaptrack.writeRSS(); } } }); } ~LockedData() { debugLog("%s", "destroying LockedData"); stopTimerThread = true; if (timerThread.joinable()) { try { timerThread.join(); } catch (std::system_error) { } } if (out) { fclose(out); } if (procStatm) { fclose(procStatm); } - if (stopCallback && !s_atexit) { + if (stopCallback && (!s_atexit || s_forceCleanup)) { stopCallback(); } debugLog("%s", "done destroying LockedData"); 
} /** * Note: We use the C stdio API here for performance reasons. * Esp. in multi-threaded environments this is much faster * to produce non-per-line-interleaved output. */ FILE* out = nullptr; /// /proc/self/statm file stream to read RSS value from FILE* procStatm = nullptr; /** * Calls to dlopen/dlclose mark the cache as dirty. * When this happened, all modules and their section addresses * must be found again via dl_iterate_phdr before we output the * next instruction pointer. Otherwise, heaptrack_interpret might * encounter IPs of an unknown/invalid module. */ bool moduleCacheDirty = true; TraceTree traceTree; const chrono::time_point start = clock::now(); atomic stopTimerThread{false}; thread timerThread; heaptrack_callback_t stopCallback = nullptr; #ifdef DEBUG_MALLOC_PTRS unordered_set known; #endif }; static atomic s_locked; static LockedData* s_data; }; atomic HeapTrack::s_locked{false}; HeapTrack::LockedData* HeapTrack::s_data{nullptr}; } extern "C" { void heaptrack_init(const char* outputFileName, heaptrack_callback_t initBeforeCallback, heaptrack_callback_initialized_t initAfterCallback, heaptrack_callback_t stopCallback) { RecursionGuard guard; debugLog("heaptrack_init(%s)", outputFileName); HeapTrack heaptrack(guard); heaptrack.initialize(outputFileName, initBeforeCallback, initAfterCallback, stopCallback); } void heaptrack_stop() { RecursionGuard guard; debugLog("%s", "heaptrack_stop()"); HeapTrack heaptrack(guard); + + if (!s_atexit) { + s_forceCleanup.store(true); + } + heaptrack.shutdown(); } void heaptrack_malloc(void* ptr, size_t size) { if (ptr && !RecursionGuard::isActive) { RecursionGuard guard; debugLog("heaptrack_malloc(%p, %zu)", ptr, size); Trace trace; trace.fill(2 + HEAPTRACK_DEBUG_BUILD); HeapTrack heaptrack(guard); heaptrack.handleMalloc(ptr, size, trace); } } void heaptrack_free(void* ptr) { if (ptr && !RecursionGuard::isActive) { RecursionGuard guard; debugLog("heaptrack_free(%p)", ptr); HeapTrack heaptrack(guard); 
heaptrack.handleFree(ptr); } } void heaptrack_realloc(void* ptr_in, size_t size, void* ptr_out) { if (ptr_out && !RecursionGuard::isActive) { RecursionGuard guard; debugLog("heaptrack_realloc(%p, %zu, %p)", ptr_in, size, ptr_out); Trace trace; trace.fill(2 + HEAPTRACK_DEBUG_BUILD); HeapTrack heaptrack(guard); if (ptr_in) { heaptrack.handleFree(ptr_in); } heaptrack.handleMalloc(ptr_out, size, trace); } } void heaptrack_invalidate_module_cache() { RecursionGuard guard; debugLog("%s", "heaptrack_invalidate_module_cache()"); HeapTrack heaptrack(guard); heaptrack.invalidateModuleCache(); } }