Optimize Rectangular Gauss Mask Generator

Authored by vanyossi on Jun 21 2018, 6:18 AM.



This code implements the vectorized version of Rectangular Gauss Mask

Results of implementation

QDEBUG : FreehandStrokeBenchmark::testRectGaussianTip() Cores: 1 Time: 113230 (ms)
QDEBUG : FreehandStrokeBenchmark::testRectGaussianTip() Cores: 2 Time: 55631 (ms)
QDEBUG : FreehandStrokeBenchmark::testRectGaussianTip() Cores: 3 Time: 50240 (ms)
QDEBUG : FreehandStrokeBenchmark::testRectGaussianTip() Cores: 4 Time: 47741 (ms)

QDEBUG : FreehandStrokeBenchmark::testRectGaussianTip() Cores: 1 Time: 12921 (ms)
QDEBUG : FreehandStrokeBenchmark::testRectGaussianTip() Cores: 2 Time: 7101 (ms)
QDEBUG : FreehandStrokeBenchmark::testRectGaussianTip() Cores: 3 Time: 8370 (ms)
QDEBUG : FreehandStrokeBenchmark::testRectGaussianTip() Cores: 4 Time: 6757 (ms)

RESULT : KisMaskGeneratorBenchmark::testRectangularGaussScalarMask():

143.36 msecs per iteration (total: 14,336, iterations: 100)

RESULT : KisMaskGeneratorBenchmark::testRectangularGaussVectorMask():

13.67 msecs per iteration (total: 1,368, iterations: 100)
Test Plan

Tested using similarity test on all configurations.

Profiled code shows no evident bottleneck on any operation. the code is arrangeed such as operations are done at most once. and genrated values are reused as much as possible. But it might be possible to squeeze a little more optimization.

Diff Detail

R37 Krita
Lint Skipped
Unit Tests Skipped
vanyossi created this revision.Jun 21 2018, 6:18 AM
Restricted Application added a reviewer: Krita. · View Herald TranscriptJun 21 2018, 6:18 AM
vanyossi requested review of this revision.Jun 21 2018, 6:18 AM
rempt added a comment.Jun 21 2018, 7:56 AM

Wow! That's some difference!

dkazakov requested changes to this revision.Jun 22 2018, 9:10 AM

Hi, @vanyossi!

You seem to have forgotten to commit a file: kis_gauss_rect_mask_generator_p.h :)
Neither the patch nor the branch compile :)

Codewise the patch looks fine. The only thing that worries me is that you have exported quite a lot of "private" parameters from KisAntialiasingFadeMaker, breaking its encapsulation (it has already started in the previous patches). Now its internal code duplicated in all the generators :(

Ideally, the fade maker should somehow be refactored, so the vectorized fading code would be reused in all the mask generators. I'm not sure if it is easy to do though. Let's not block this very patch by that, but please thing about reusing the fading code in all the generators after it.

To get this patch into master, please just upload the missing file. Refactoring can be done later in a separate patch.

This revision now requires changes to proceed.Jun 22 2018, 9:10 AM
vanyossi updated this revision to Diff 36540.Jun 23 2018, 12:49 AM

Fixed Diff

Code refactor is scheduled for this week.

however I did notice that the original implementation does not need the antialiasing fader as the soft edge value almost never trigger the fader. When it does, the values converted are in the QRect edge.

I run the code removing the fader antialias code entirely and the test is passed succesfully.

dkazakov requested changes to this revision.Jun 25 2018, 7:22 AM

Hi, @vanyossi!

I have tested your patch. It seem to work fine, but I still have a few doubts about the antialiasing fade maker:

  1. [just a thought] It seems like 100% fade in a rectangular Gaussian brush doesn't make it fully opaque. That seems like an inconsistent behavior, but I doubt we can do anything about it, because it was here for years :) That is actually the reason why you don't see any difference in your tests. The aliasing seem to be forces by the brush mask itself.
  1. [regression] Here is a testcase how to see the difference between your implementation and the baseline implementation:
  1. Select 'Basic-5 Size' brush
  2. Select Gaussian tip, Rectangular
  3. Set precision to 5
  4. Set Size to 2 px (!)
  5. Paint two strokes with antialiasing and not

You will see the difference between the two implementations:


your avx implementation

Your implementation seem to be a bit more smooth than the base one. That is good, but the two implementations should give exactly the same result. Otherwise we will get crazy answering bugreports about "not-smooth brushes" :)

So, please fix either the old implementation or the new one :)

This revision now requires changes to proceed.Jun 25 2018, 7:22 AM

That is strange because on sizes below 20px the brush mask generator uses the old rendering scalar engine, so it should be equal to the old version.

I tried to reproduce and found that if "spacing" is left to auto I can sometimes get your results, but only in 16-bit modes, In 8bit it behaves more predictable.

Turning off auto spacing gives consistent behaviour across versions, engines and bitdepths

The images show on the left 8-bit canvases, and on the right 16 bit canvases, both versions shows very similar strokes. Nevertheless I'll look into the conditions to see if everything is triggering as expected, just to be extra sure I don't introduce a regression :]

As an extra note, I did noticed that for 16-bit canvases the old model caused strange renderings for Gaussian (the easiest to test), and the vectorized rendering gives the correct result. Which possibly indicates there is a bug in the 16-bit rendering process. however this should not affect the stroke behavior as the rendering used for small brushes is still the Scalar one.

After some more testing I was able to reproduce it.

Hi Dmitry!

This issue seems to be generalized to all Gauss brushes using the applicator. I tested it on Gauss circular and I get the same differences. Im not sure what is happening as the vector process is not used in that size range. There must be something different a step behind or after, maybe on the processing of the data, but it is something that seems to affect all "KisBrushMaskApplicatorBase" users as the Default brush also behaves in equal ways.

Im going to look for the differences between both paths for Scalar creation routes. It is strange that it changes even if the function valueAt called is the same for calculated values.

vanyossi added a comment.EditedJun 28 2018, 1:09 AM


Now I know exactly what shouldSuperSample() does. I removed all supersample() calls from other brushes implementation as it is not needed.
I wonder why it is needed in the Default brush as It basically makes antialias checkbox unable to complely turn off antialias on brushes below 10px size.

The fix in action

vanyossi updated this revision to Diff 36921.Jun 29 2018, 11:09 PM


This Diff includes:

  • Implementation o RectGauss
  • Refactor of fadeMaker 1D, 2D
  • Bug fixes for for RectGAuss
  • Regression fix for all brushes (Gauss, RectGauss, Soft)
  • Refactor of Vc::Erf -> mover to a new file

Profiling shows refactoring does not affect performance.

Hi, @vanyossi!

Did you also change Circular gaussian mask? It look like it is broken in your branch now :(

Here are the test strokes for 2 and 4 px brushed for Gaussian Circle with 1.0 Fade and Auto Spacing set to 1.0:


your branch

At the same time, rectangular tip works perfectly fine in both branches and generates the same results.

Just a guess: I remember we had two types of fade makers: one for 1d and one for 2d case. And one of them was used for rectangular masks only. Perhaps it might be connected to the bug somehow?

dkazakov accepted this revision.Jul 2 2018, 2:41 PM

Hi, @vanyossi!

You were right, the patch works exactly the same way as Krita 4.0 works. I have also checked the code, it looks perfectly correct! Please merge it into master ;)

This revision is now accepted and ready to land.Jul 2 2018, 2:41 PM
vanyossi closed this revision.Oct 11 2018, 11:14 AM