Add converter from Linux 'perf record'
Needs Review (Public)

Authored by lunakl on May 20 2019, 1:42 PM.

Details

Reviewers
weidendo
Summary

Originally from
https://github.com/milianw/linux/blob/milian/perf/tools/perf/scripts/python/callgrind.py
I have only made the addr2line usage faster.
I exchanged mail with Milian: for him the script was just a proof of
concept. He has since developed an app called Hotspot, aimed specifically
at perf, and he is not interested in submitting the script upstream. But I
find the script useful (and I really like KCachegrind's call graph), and it
does the job for me, so with his consent I'm submitting it.

Diff Detail

Repository
R49 KCacheGrind
Lint
Lint Skipped
Unit
Unit Tests Skipped
lunakl requested review of this revision. May 20 2019, 1:42 PM
lunakl created this revision.
mwolff added a subscriber: mwolff. May 20 2019, 3:22 PM

FTR: I did give my consent, so thanks @lunakl!

Hey,

I'm not a KDE developer, so feel free to ignore my comments, but there are two issues with Python 3:

  1. There is invalid whitespace in this file (python3 throws TabError for line 76ff).
  2. dict.iteritems() does not work in python 3, see https://python-future.org/compatible_idioms.html#iterating-through-dict-keys-values-items. I replaced iteritems() by items().
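
A minimal sketch of the second point, using a made-up dictionary rather than the script's real data (the names `costs` and `total_cost` are hypothetical): `dict.items()` exists in both Python 2 and Python 3, so switching to it keeps a single code path.

```python
# Hypothetical sample data; the real script's dictionaries are not shown here.
costs = {"main": 120, "parse": 45, "render": 300}


def total_cost(table):
    """Sum all values using items(), which works on Python 2 and Python 3."""
    total = 0
    # iteritems() only exists on Python 2; items() exists on both versions.
    for _name, cost in table.items():
        total += cost
    return total


print(total_cost(costs))
```

For the first point, the standard library's `tabnanny` module (`python -m tabnanny <file>`) is one way to locate ambiguous tab/space indentation before Python 3 rejects it.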

Cheers
Bastian

How do you test it with Python 3? The script must be run through perf, and here on openSUSE 15.0 perf uses Python 2.

I'm on Arch Linux and '/usr/bin/python' is version 3.7.3. I don't know how the python version is determined in scripts run through 'perf script', unfortunately. Could be that they invoke /usr/bin/python (through #!/usr/bin/python?).

Anyway, after fixing the two issues mentioned above your script runs fine with python 3 here...

> I'm on Arch Linux and '/usr/bin/python' is version 3.7.3. I don't know how the python version is determined in scripts run through 'perf script', unfortunately. Could be that they invoke /usr/bin/python (through #!/usr/bin/python?).

Pointing /usr/bin/python at python3 here doesn't change anything, so I guess I can't change that. Can you simply post a fixed version in some form (I don't know if Phabricator lets you edit my patch)?

Here's my modified file.

lunakl updated this revision to Diff 58582. May 24 2019, 8:43 AM

Thank you. I've checked that this also works with Python 2 without problems, so I've updated the patch to incorporate these fixes.

pino added a subscriber: pino. May 24 2019, 8:51 AM

A couple of notes:

  • can you please remove the .py extension? the other scripts do not have it
  • what about installing it, just like the other scripts?
  • since it is a new Python script, what about formatting it according to PEP 8? (so 4 spaces indentation, 79 chars limit per line, etc)
khuman added a subscriber: khuman. Nov 23 2022, 8:48 PM

I gave this script a try using perf 5.15 (general tests) and perf 4.18 (testing with "real applications").
My findings:

1 the perf data seems to produce quite good results, with reasonable numbers (compared with real callgrind) and a "per function" stack trace in callgrind

2 con: because addSample takes only the sym and, from it, the name, which is then post-resolved to an address via addr2line, all function calls are recorded as happening at the beginning of the function;
while this is enough for the general numbers and for the call stack and call graph, it removes the option to inspect the source; obviously it also rules out adding a "dump instructions" option.

3 con: at least with perf 4.18, executing via "perf script", the conversion took quite some time (4-8 minutes, not a big issue) and had a huge memory footprint:

 VIRT    RES     SHR  S  %CPU  %MEM    TIME+  COMMAND
20.2g   6.5g  114272  R  97.7  20.9  1:14.87  perf

4 con (already has a TODO note in the script): the complete command line is missing; when inspecting the perf.data via perf or Hotspot we do get it, so I _think_ it must be available somewhere in the script.
Can someone share insights on how to do that?
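
One possible workaround, sketched here outside the converter script itself: perf stores the recorded command line in the perf.data header, and `perf report --header-only` prints it. The sketch assumes the header contains a line of the form `# cmdline : ...`; the function names `parse_cmdline` and `read_cmdline` are made up for illustration. Whether the same value is reachable from inside a `perf script` Python handler is not answered by this.

```python
import subprocess


def parse_cmdline(header_text):
    """Return the recorded command line from perf header text, or None."""
    for line in header_text.splitlines():
        # Assumed header format: "# cmdline : /usr/bin/perf record -g ./app"
        if line.startswith("# cmdline"):
            _key, _sep, value = line.partition(":")
            return value.strip()
    return None


def read_cmdline(perf_data="perf.data"):
    """Dump the header of perf_data with perf and extract the command line."""
    out = subprocess.check_output(
        ["perf", "report", "--header-only", "-i", perf_data],
        universal_newlines=True,
    )
    return parse_cmdline(out)
```

The result could then be written into the output's `cmd:` header line, assuming the converter emits the usual callgrind header fields.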

To 2 - the biggest issue: Shouldn't it be possible to "somehow" calculate the instruction address from the "ip" field in the callchain? Doing so would then allow getting the correct source line (and adding an option for dumping instructions).
How can the "ip" be resolved to a real address, or how can we get that otherwise?
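
A rough sketch of what a per-address lookup could look like once a suitable address is available. It assumes the addresses passed in are already in the form addr2line expects for the given binary; how to derive such an address from perf's runtime "ip" is exactly the open question above and is not solved here. The helper names `parse_addr2line` and `resolve` are hypothetical.

```python
import subprocess


def parse_addr2line(output):
    """Parse 'addr2line -f' output: pairs of lines, function then file:line."""
    lines = output.splitlines()
    result = []
    for i in range(0, len(lines) - 1, 2):
        func = lines[i]
        # Split on the last colon so paths containing ':' stay intact.
        # The line part is kept as a string; addr2line may append extra
        # text such as "(discriminator 2)" after the number.
        path, _sep, lineno = lines[i + 1].rpartition(":")
        result.append((func, path, lineno))
    return result


def resolve(binary, addresses):
    """Resolve many addresses with a single addr2line process."""
    cmd = ["addr2line", "-f", "-C", "-e", binary]
    cmd += [hex(addr) for addr in addresses]
    out = subprocess.check_output(cmd, universal_newlines=True)
    return parse_addr2line(out)
```

Passing all addresses for one binary in a single call avoids starting one addr2line process per sample.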

To 3: not sure, but it may be possible to "flush" some data to disk and release objects earlier? Is there a Python developer here who could be asked for performance improvements?
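
One direction that might help, sketched under the unverified assumption that memory is dominated by objects kept per sample: fold each sample into running counters as it arrives, so that nothing per-sample has to stay alive. The class `CallAggregator` and its fields are hypothetical and do not describe how the submitted script currently stores its data.

```python
from collections import Counter


class CallAggregator(object):
    """Accumulate costs per function and per call edge, not per sample."""

    def __init__(self):
        self.self_cost = Counter()  # function name -> cost spent in it
        self.call_cost = Counter()  # (caller, callee) -> cost below the call

    def add_sample(self, stack, cost=1):
        """Fold one sample in; stack lists function names, innermost first."""
        if not stack:
            return
        self.self_cost[stack[0]] += cost
        # Each adjacent pair in the stack is one caller -> callee edge.
        # Note: recursive stacks count the same edge more than once.
        for callee, caller in zip(stack, stack[1:]):
            self.call_cost[(caller, callee)] += cost


agg = CallAggregator()
agg.add_sample(["leaf", "mid", "main"])
agg.add_sample(["leaf", "main"], cost=2)
print(agg.self_cost["leaf"], agg.call_cost[("main", "leaf")])
```

Memory then grows with the number of distinct functions and call edges rather than with the number of samples.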