Add converter from Linux 'perf record'

Authored by lunakl on May 20 2019, 1:42 PM.

Summary

Originally from
https://github.com/milianw/linux/blob/milian/perf/tools/perf/scripts/python/callgrind.py
I have only made the addr2line usage faster.
I exchanged mail with Milian: for him the script was just a proof of concept. He has since
developed an app called Hotspot, aimed specifically at perf, and he's not
interested in submitting the script upstream. But I find the script useful
(and I really like KCachegrind's call graph), and it does the job for me,
so with his consent I'm submitting it.
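
The summary does not say how the addr2line usage was sped up. One common way, shown here as a minimal sketch and not as what the patch actually does, is to resolve many addresses with a single addr2line invocation instead of spawning one process per address. The helper names (build_addr2line_cmd, parse_addr2line_output) and the sample output are made up for illustration; -e and -f are standard GNU addr2line options.

```python
def build_addr2line_cmd(binary, addresses):
    """Return one addr2line command line covering all addresses."""
    # -e selects the binary, -f prints the function name before file:line.
    return ["addr2line", "-f", "-e", binary] + ["0x%x" % a for a in addresses]


def parse_addr2line_output(addresses, output):
    """Pair each address with the (function, file, line) reported for it.

    With -f, addr2line prints two lines per address:
    the function name, then "file:line".
    """
    lines = output.splitlines()
    result = {}
    for i, addr in enumerate(addresses):
        func = lines[2 * i]
        location = lines[2 * i + 1]
        # The line number is kept as a string; unresolved entries look like "??:0".
        filename, _, lineno = location.rpartition(":")
        result[addr] = (func, filename, lineno)
    return result


# Illustrative output only, not captured from a real run.
sample = "main\n/src/app.c:12\nparse\n/src/parse.c:40\n"
resolved = parse_addr2line_output([0x401000, 0x401234], sample)
```

In a real run the command would be executed once per binary, for example with subprocess.check_output(cmd, universal_newlines=True), and its output passed to the parser.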

Diff Detail

Repository: R49 KCacheGrind
Lint: Skipped
Unit Tests: Skipped

lunakl requested review of this revision on May 20 2019, 1:42 PM.

lunakl created this revision.

FTR: I did give my consent, so thanks @lunakl!

Hey,

I'm not a KDE developer, so feel free to ignore my comments, but there are two issues with Python 3:

There is invalid whitespace in this file (Python 3 raises a TabError from line 76 onward).
dict.iteritems() does not exist in Python 3, see https://python-future.org/compatible_idioms.html#iterating-through-dict-keys-values-items. I replaced iteritems() with items().
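
Both points fit in a few lines. This is a minimal sketch with made-up data, since the script's own dictionaries are not quoted in this thread: items() exists on Python 2 and Python 3 alike, and the block is indented with spaces only, which avoids the TabError.

```python
# Hypothetical data; the names and numbers are for illustration only.
call_costs = {"main": 120, "parse": 45, "render": 75}


def total_cost(costs):
    """Sum all costs using an iteration idiom valid on Python 2 and 3."""
    total = 0
    # iteritems() is Python 2 only; items() works on both versions.
    for name, cost in costs.items():
        total += cost
    return total


print(total_cost(call_costs))
```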

Cheers
Bastian

How do you test it with Python 3? The script must be run through perf, which uses Python 2 here on openSUSE 15.0.

I'm on Arch Linux and '/usr/bin/python' is version 3.7.3. I don't know how the python version is determined in scripts run through 'perf script', unfortunately. Could be that they invoke /usr/bin/python (through #!/usr/bin/python?).

Anyway, after fixing the two issues mentioned above your script runs fine with python 3 here...

In D21306#467715, @beischer wrote:

I'm on Arch Linux and '/usr/bin/python' is version 3.7.3. I don't know how the python version is determined in scripts run through 'perf script', unfortunately. Could be that they invoke /usr/bin/python (through #!/usr/bin/python?).

Pointing /usr/bin/python at python3 makes no difference here, so I guess I can't change that. Can you simply post a fixed version in some form (I don't know if Phabricator lets you edit my patch)?
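
One way to settle which interpreter actually runs the script is to have the script report it. This is a minimal sketch; interpreter_summary is a hypothetical helper, and the idea that perf loads the script into an interpreter chosen when perf was built (so that /usr/bin/python is not consulted) is an assumption that fits the observation above but is not confirmed in this thread.

```python
import sys


def interpreter_summary():
    """Describe the interpreter that is executing this code."""
    major, minor, micro = sys.version_info[:3]
    return "Python %d.%d.%d" % (major, minor, micro)


# Write to stderr so the message does not end up in the converted output.
sys.stderr.write(interpreter_summary() + "\n")
```

Placed near the top of the converter, this prints the version once per "perf script" run.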

converters_perf2calltree_python3.py (4 KB)

Here's my modified file.

Thank you. I've checked that this also works with Python 2 without problems, so I've updated the patch to incorporate these fixes.

A couple of notes:

Can you please remove the .py extension? The other scripts do not have it.
What about installing it, just like the other scripts?
Since it is a new Python script, what about formatting it according to PEP 8 (4-space indentation, 79-character line limit, etc.)?

I gave this script a try using perf 5.15 (general tests) and perf 4.18 (testing with "real applications").
My findings:

1. The perf data seems to produce quite good results, with reasonable numbers (compared with real callgrind) and a "per function" stack trace in callgrind.

2. Con: because addSample takes only the sym and from it the name, which then gets post-resolved to an address via addr2line, all function calls are recorded as happening at the beginning of the function.
While this is enough for the general numbers and the call stack and call graph, it removes the option to inspect the source; obviously it also rules out adding a "dump instructions" option.

3. Con: at least with perf 4.18 and executing via "perf script", the conversion took quite some time (4-8 minutes, not a big issue) and had a huge memory footprint:

   VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  20.2g   6.5g 114272 R  97.7  20.9   1:14.87 perf

4. Con (already has a TODO note in the script): the complete command line is missing. When inspecting perf.data via perf or hotspot we do get it, so I _think_ it must be available somewhere to the script.
Can someone share insights on how to do that?
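
A possible starting point, offered as an assumption to verify against the perf version in use: "perf report --header-only" prints the header stored in perf.data, and that header normally contains a "# cmdline :" line. The sketch below only parses such text; extract_cmdline and the sample header are hypothetical, and whether the same information is reachable from inside the Python script itself is not answered here.

```python
def extract_cmdline(header_text):
    """Return the value of the 'cmdline' header line, or None if absent."""
    for line in header_text.splitlines():
        # Header lines are assumed to look like "# key : value".
        stripped = line.lstrip("# ").strip()
        key, sep, value = stripped.partition(":")
        if sep and key.strip() == "cmdline":
            return value.strip()
    return None


# Illustrative header text only, not captured from a real perf.data file.
sample_header = (
    "# hostname : example\n"
    "# cmdline : /usr/bin/perf record -g ./app --fast\n"
)
print(extract_cmdline(sample_header))
```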

To 2, the biggest issue: shouldn't it be possible to "somehow" calculate the instruction from the "ip" field in the callchain? Doing so would then allow getting the correct source line (and adding an option for dumping instructions).
How can the "ip" be resolved to a real address, or how can we get that otherwise?

To 3: not sure, but it may be possible to "flush" some data to disk and get rid of objects earlier? Is there a Python coder here who could be asked for performance improvements?
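
Whether this applies depends on how the script stores its data, which this thread does not show. As a sketch of the "get rid of objects earlier" idea: fold each sample into per-edge counters as soon as it arrives, so memory grows with the number of distinct caller/callee pairs and not with the number of samples. add_sample and the callchain layout (a list of function names, innermost frame first) are hypothetical.

```python
from collections import defaultdict

edge_cost = defaultdict(int)  # (caller, callee) -> inclusive sample count
self_cost = defaultdict(int)  # function -> samples taken in the function itself


def add_sample(callchain, period=1):
    """Fold one sample into the counters; the callchain itself is not kept.

    Recursive chains are counted once per frame pair, which a real
    converter may want to handle differently.
    """
    if not callchain:
        return
    self_cost[callchain[0]] += period
    # callchain[i] was called by callchain[i + 1].
    for callee, caller in zip(callchain, callchain[1:]):
        edge_cost[(caller, callee)] += period


add_sample(["parse", "load", "main"])
add_sample(["parse", "load", "main"])
add_sample(["render", "main"])
```

After all samples are processed, the two dictionaries hold everything needed to write per-function costs and call edges, without any per-sample objects left alive.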

Revision Contents

Changed files:
M    converters/README (1 line)
A M  converters/perf2calltree.py (170 lines)

Diff    ID     Created
Diff 1  58365  May 20 2019, 1:41 PM
Diff 2  58582  May 24 2019, 8:42 AM
