[Experiment] [Hack] Remove heap allocation in KisPaintInformation
AbandonedPublic

Authored by alvinhochun on Jul 6 2017, 6:29 AM.

Details

Reviewers
None
Group Reviewers
Krita
Summary

Removes heap allocation in KisPaintInformation.

This is a hacky proof-of-concept experiment so don't judge it too hard :P

TODO: Compare the performance with original, do some profiling. e.g. normal painting / painting with stabilizer / other stuff

I did a quick profiling with the original code drawing with stabilizer and I found malloc and free related to this class taking a lot of time.

Problems: with this approach

  1. Brittle
  2. space wasted when Private is smaller than the reserved size
  3. Still need to change header file whenever new fields are added which cause the size to exceed the reserved size
  4. Why do I need const_cast???
  5. Alignment? The original class has EIGEN_MAKE_ALIGNED_OPERATOR_NEW but I don't see any Eigen classes used as member field.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped
alvinhochun created this revision.Jul 6 2017, 6:29 AM
Restricted Application added a subscriber: woltherav. · View Herald TranscriptJul 6 2017, 6:29 AM

I just realized I forgot to call the destructor...

alvinhochun updated this revision to Diff 16226.Jul 6 2017, 6:44 AM

Call the destructor

Hi, @alvinhochun!

Could you please attach the results for profiling before and after this change? When I tested the thing I could never see this problem in profiling logs. Probably, we should fix it in some other place? Like, check if we pass paint information by const reference all the time...

Hardcoded memory allocation looks a bit frightening, even though it seems to be guarded by static asserts...

Why do I need const_cast???

I would check why const is added to setLevelOfDetail(). Looks like a bug.

Alignment? The original class has EIGEN_MAKE_ALIGNED_OPERATOR_NEW but I don't see any Eigen classes used as member field.

I must be some "optimization", though I have never seen any proof for it.

rempt added a subscriber: rempt.Jul 6 2017, 11:25 AM
> Alignment? The original class has EIGEN_MAKE_ALIGNED_OPERATOR_NEW but I don't see any Eigen classes used as member field.

I must be some "optimization", though I have never seen any proof for it.

No, it has nothing to do with optimizations. In 2008, KisPaintInformation contained a

KisVector2D movement;

member; this was a vectorized eigen object, and that needed alignment. We no longer
have that member, so we don't need this macro anymore.

Hi, @alvinhochun!

Could you please attach the results for profiling before and after this change? When I tested the thing I could never see this problem in profiling logs. Probably, we should fix it in some other place? Like, check if we pass paint information by const reference all the time...

Well I don't know, I'll need to find a way to reliably reproduce the same brush strokes, perhaps by simulating mouse movements with autohotkey (windows).

For my existing rough captures of 10 seconds of drawing with stabilizer:

  • a *blocking event loop wait call* and a *graphics driver vsync call* take around 20%~30% of time in total;
  • the malloc calls in KisPaintInformation takes about 8% and free related to it takes 5% or more time in total.

But these are just results of me randomly flickering the pen when profiling. And using (I think) a pretty lightweight brush.

It says most of these allocations are done within KisPaintInformation::getStabilizedPaintInfo so optimizing that function might make a bit more sense.

I'm using Very Sleepy (the latest stable release from 2014) to profile it.

Hardcoded memory allocation looks a bit frightening, even though it seems to be guarded by static asserts...

A bit frightening? I'd call that a complete mess! :D
I wouldn't want to commit something like this to any codebases... unless that codebase is a "bad code collection"!
Nevertheless I wanted to see if this would cause even a slightly noticeable performance increase.

The const keyword and EIGEN_MAKE_ALIGNED_OPERATOR_NEW should probably be changed. I'll check those later.

I'm using Very Sleepy (the latest stable release from 2014) to profile it.

There used to be some community version of Intel VTune for non-commercial work. You could try to check for it :)

Hardcoded memory allocation looks a bit frightening, even though it seems to be guarded by static asserts...

A bit frightening? I'd call that a complete mess! :D

Well, I tried to be polite :P

Regarding "making the exact same brushstrokes", @scottpetrovic had found a solution to use when he was optimising brushes for windows.

for making brushes the exact same way, I was just using a 3rd party program that records your mouse movements and can replay them. The one I was last using was. http://www.macrocreator.com/

I didn't play around much with the macro recorder in Krita, so that might be able to work too.

Profiling of random drawing for 20 seconds with stabilizer enabled, on master (b0999c9) with D6534 applied, this patch *not* applied:

Just to get an idea.

If the goal is to prevent heap allocation, wouldn't it be easier to simply get rid of the PIMPL idiom for this class entirely and move the contents of KisPaintInformation::Private out into KisPaintInformation?

Between these two options:

  1. finicky and scary manual memory allocation, or
  2. a slightly larger header file plus a warning in the comments saying "we would rather hide these variables in a d_ptr but can't for the sake of speed, so please don't touch,"

it seems to me option 2 is preferable.

If the goal is to prevent heap allocation, wouldn't it be easier to simply get rid of the PIMPL idiom for this class entirely and move the contents of KisPaintInformation::Private out into KisPaintInformation?

I believe the original intention was to keep binary compatibility whenever the internal data fields are changed. Though it also increases the recompilation speed by a lot. But tbh how often are these fields changed?
I'm not sure if binary compatibility is really needed. I don't think there are any third-party plugins for Krita written in C++, are there? And we are moving to Python scripting anyway.

Between these two options:

  1. finicky and scary manual memory allocation, or
  2. a slightly larger header file plus a warning in the comments saying "we would rather hide these variables in a d_ptr but can't for the sake of speed, so please don't touch,"

    it seems to me option 2 is preferable.

Or we can also make use of a fixed-size allocator for it (I have no experience with this though). And if it's fast enough then perhaps we get to keep the PIMPL idiom and its benefits which perhaps are not really useful most of the time...

alvinhochun planned changes to this revision.Jul 8 2017, 6:33 PM

Doesn't apply cleanly on master

abrahams added a comment.EditedJul 8 2017, 8:24 PM

If the goal is to prevent heap allocation, wouldn't it be easier to simply get rid of the PIMPL idiom for this class entirely and move the contents of KisPaintInformation::Private out into KisPaintInformation?

I believe the original intention was to keep binary compatibility whenever the internal data fields are changed. Though it also increases the recompilation speed by a lot. But tbh how often are these fields changed?
I'm not sure if binary compatibility is really needed. I don't think there are any third-party plugins for Krita written in C++, are there? And we are moving to Python scripting anyway.

Yes I think the main benefits for Krita now are compile times and code cleanliness. Opaque pointers are more useful in Qt and KDE frameworks which aim for some amount of ABI stability. Krita follows this as part of the Qt/KDE coding style, but ever since the Calligra split - thus no more shared libraries - the benefits have become pretty minor. (Not to say we should ignore compile times of course. But if it's just one header out of a few hundred?)

alvinhochun abandoned this revision.Jul 12 2017, 3:00 PM

Abandoning because:

  1. Too hacky
  2. Even though this eliminates the heap allocations, the data is still being copied everywhere. Implementing copy-on-write (e.g. QSharedData) should be a better way.
  3. The improvements made in D6578 makes this irrelevant.