optimization of KTextEditor::DocumentPrivate::views()
Closed, Public

Authored by jtamate on Apr 25 2018, 8:45 AM.

Details

Summary

Calculating the list of keys of a hash table is quite expensive.
It is cheaper to keep that list around and update it when a view is created or removed.
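
As a rough illustration of the pattern (not the actual KTextEditor code; the class, member, and type names below are made up for this sketch):

```cpp
#include <QHash>
#include <QList>

class View; // stand-in for KTextEditor::View

class Doc   // stand-in for KTextEditor::DocumentPrivate
{
public:
    // Before: rebuild the key list on every call.
    // QList<View *> views() const { return m_views.keys(); }

    // After: hand out a list that is kept up to date when views come and go.
    QList<View *> views() const { return m_viewsCache; } // implicitly shared, cheap to copy

private:
    QHash<View *, int> m_views;  // 'int' stands in for the real per-view data
    QList<View *> m_viewsCache;  // updated whenever a view is created or removed
};
```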

Test Plan

Run kwrite, paste a relatively long text several times, and then undo until the text is empty, with only one view.
Before:

it used 65% of the CPU instructions in Kate::TextBuffer::notifyAboutRangeChange
After:
it uses 19% of the CPU instructions in Kate::TextBuffer::notifyAboutRangeChange

Diff Detail

Repository
R39 KTextEditor
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
jtamate created this revision.Apr 25 2018, 8:45 AM
Restricted Application added projects: Kate, Frameworks. Apr 25 2018, 8:45 AM
Restricted Application added a subscriber: Frameworks.
jtamate requested review of this revision.Apr 25 2018, 8:45 AM

If you have benchmark results for that, I have no issues with the idea.
Though I think you only need to update the cache in add/removeView, not in KTextEditor::DocumentPrivate::createView.

cullmann requested changes to this revision.May 3 2018, 7:19 PM
This revision now requires changes to proceed.May 3 2018, 7:19 PM
jtamate updated this revision to Diff 33574.May 3 2018, 8:02 PM

You are right, there is no change to m_views there.

mwolff requested changes to this revision.May 4 2018, 8:38 AM
mwolff added a subscriber: mwolff.

lgtm in general, but code-wise it can be improved

src/document/katedocument.cpp
2835

just add the view, cf. the contains check above. no need to rebuild the full list here all the time

2852

ditto, just remove
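
For illustration, reusing the made-up Doc/View names from the sketch in the summary (and assuming Doc also declares addView/removeView), the incremental update requested here could look like this, instead of refilling the whole cache on every add/remove:

```cpp
void Doc::addView(View *view)
{
    if (m_views.contains(view)) // cf. the contains check mentioned above
        return;
    m_views.insert(view, 0);
    m_viewsCache.append(view);        // just add the new view, no full rebuild
}

void Doc::removeView(View *view)
{
    if (m_views.remove(view))
        m_viewsCache.removeAll(view); // just drop it, no full rebuild
}
```

The cached QList stays in lockstep with the hash, so views() never has to walk the hash again.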

This revision now requires changes to proceed.May 4 2018, 8:38 AM
mwolff added a comment.May 4 2018, 8:41 AM

Oh and again: please start using perf/hotspot instead of callgrind. Really, the performance numbers you get from callgrind are just *instructions*! It doesn't mean "65% of CPU". It means 65% of the instructions.

You are doing such good work with profiling and optimizing software, it would be so much better if you'd use the correct tools too!

jtamate updated this revision to Diff 33613.May 4 2018, 10:16 AM
jtamate edited the test plan for this revision. (Show Details)

Add and remove from the list.

About the tools I use to profile:

  • I'm used to using slow machines, so I have no problem running under valgrind.
  • When I can run a program and repeat the workload that produces the high CPU usage or slowness, I like to use callgrind/kcachegrind, because I get inclusive costs.
  • When a program gets slow after running for some time or under unexpected circumstances, I do use perf, because I do not need to restart the program and I can take live measurements.
  • But I do love the kcachegrind view of the % of instructions for every method within the source code.

In any case, the hotspot results for pasting 10 times and undoing:
Before: (hotspot screenshot attached)
After: (hotspot screenshot attached)
mwolff added a comment.May 4 2018, 2:09 PM

But the hotspot screenshot clearly shows that you are spending time on optimizing things that are barely noticeable. You have optimized a function that consumes 0.3% of the CPU cycles. It now consumes only ~0.15%, at the cost of slightly higher memory consumption.

This is the real cost. The instruction numbers reported by callgrind paint a very different picture (22.6% vs. 5.5%). But that's not what actually matters; the cycle cost is way more important.

Also note that hotspot shows inclusive costs. You should get used to it; most notably, have a look at the flamegraph and get acquainted with that way of visualizing profiling costs.

mwolff added a comment.May 4 2018, 2:20 PM

Actually, no. Ignore what I said. The pictures you are showing are pretty meaningless. Did you run perf with --call-graph dwarf? Better look at the flamegraph and search for the function you are interested in (Kate::TextBuffer::notifyAboutRangeChange) or use the Caller/Callee view to get an aggregated view of your change.

jtamate added a comment.

> Actually, no. Ignore what I said. The pictures you are showing are pretty meaningless. Did you run perf with --call-graph dwarf? Better look at the flamegraph and search for the function you are interested in (Kate::TextBuffer::notifyAboutRangeChange) or use the Caller/Callee view to get an aggregated view of your change.

That's not the impression I got running kwrite live and on a slow machine (under valgrind): before the patch I had to wait for the paste to finish, and after the patch I don't. But kwrite is still slow undoing the paste.

I did run perf record -g on the same test with the patch applied, pasting the text 25 times and undoing, with this result: (screenshot attached).
The same test (pasting 25 times and undoing) with --call-graph dwarf gave these results: (screenshots attached).

mwolff added a comment.May 4 2018, 4:17 PM

perf record -g produces unusable data files, since it relies on the frame pointer which is usually not available. Use perf record --call-graph dwarf instead. https://phabricator.kde.org/file/data/w4qogv4brtxlc5p5bnwr/PHID-FILE-q62giymcptudpl5m6bt3/kwrite_perf_after_25_dwarf_caller.png shows ~1.5% in notifyAboutRangeChange (inclusively). Is that before or after your patch here?

jtamate added a comment.

> https://phabricator.kde.org/file/data/w4qogv4brtxlc5p5bnwr/PHID-FILE-q62giymcptudpl5m6bt3/kwrite_perf_after_25_dwarf_caller.png shows ~1.5% in notifyAboutRangeChange (inclusively). Is that before or after your patch here?

After applying the patch.
Before applying the patch: (screenshot attached)

mwolff accepted this revision.May 4 2018, 4:49 PM

OK, cool! That clearly shows that this patch _is_ valuable: before, we had ~6% CPU cycle cost; now it's down to 1.5% (inclusively). This is a significant reduction, so I'm all for it.

But note once again the stark difference between these numbers and what valgrind reports. Please try to use perf record --call-graph dwarf + hotspot flamegraphs in preference to callgrind/kcachegrind. Note that you can even use hotspot's record page for some more features, most notably off-CPU profiling. This can give you more food for thought for potential optimizations (finding sleep time, I/O, lock contention, ...).

Happy profiling!

This revision was not accepted when it landed; it landed in state Needs Review.May 4 2018, 5:40 PM
This revision was automatically updated to reflect the committed changes.