Optimized COMPOSITE_COPY blending mode with AVX (make ColorSmudge brush 15% faster)
Closed, Resolved (Public)

Description

The goal of this project is to speed up compositing in the "Copy" blending mode, which the colorsmudge brush uses heavily. AVX optimizations should make the blending mode itself 4-8 times faster, which in turn should make the colorsmudge brush engine about 15% faster.
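
As a rough sanity check on those two figures, Amdahl's law relates the speedup of one operation to the speedup of the whole run. The sketch below is only an estimate under stated assumptions: it reads "15% faster" as 15% less run time, and `requiredFraction` is a hypothetical helper, not something in Krita. It computes what share of the benchmark time the Copy blending would have to take for the numbers to be consistent.

```cpp
#include <cassert>

// Hypothetical helper (not part of Krita).
// Amdahl's law: if a fraction p of the run time becomes `opSpeedup` times
// faster, the total time saved is p * (1 - 1/opSpeedup).
// Solving for p gives the fraction needed to reach a given `timeSaved`.
double requiredFraction(double timeSaved, double opSpeedup)
{
    assert(opSpeedup > 1.0);                      // the operation must actually get faster
    assert(timeSaved >= 0.0 && timeSaved < 1.0);  // share of total time, e.g. 0.15
    return timeSaved / (1.0 - 1.0 / opSpeedup);
}
```

Under these assumptions, `requiredFraction(0.15, 4.0)` gives 0.20 and `requiredFraction(0.15, 8.0)` gives about 0.17, so the estimate holds if Copy blending accounts for roughly a fifth of the brush's run time.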

  1. Build Krita either in Docker on Linux [0] or with the build script in the Krita source directory on Windows [1].
  1. Make sure the unit tests are built (they are enabled by default in the Docker environment):
     cmake -DBUILD_TESTING=on .
     make -j8 install
  1. Run the colorsmudge benchmark. It should currently take around 5700 ms. After the project is done, it should take about 4700 ms, that is, about 15% faster.
     .\bin\FreehandStrokeBenchmark.exe testColorsmudgeDefaultTip
  1. The reference implementation for the optimized blending mode is KoOptimizedCompositeOpOver32. This class optimizes KoCompositeOpOver.
  1. The goal of the project is to implement KoOptimizedCompositeOpCopy32 for the KoCompositeOpCopy2 class in exactly the same way.
  1. The trick is that KoOptimizedCompositeOpCopy32 must contain two implementations of exactly the same blending algorithm: one in compositeVector() and one in compositeOnePixelScalar(). They must produce exactly the same values, because they are used interchangeably: which one handles a given pixel depends on how the pixel data is aligned.
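
The requirement in the last step can be illustrated with a minimal, portable sketch. Everything here is an assumption made for illustration: the function signatures are simplified and are not Krita's, `compositeRow` and the constants are hypothetical, the blend is a plain per-channel linear interpolation standing in for the real Copy formula, and the "vector" path emulates 8 lanes with ordinary loops instead of AVX intrinsics so that it compiles anywhere. What it shows is the structure: a scalar path for the unaligned head and the tail of a row, a block path for the aligned middle, and the same arithmetic in the same order in both.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constants, not Krita's.
constexpr std::size_t kChannels = 4; // 8-bit RGBA, 32 bits per pixel
constexpr std::size_t kLanes = 8;    // pixels per block, like 8 float lanes in AVX

// Scalar path: blends a single pixel.
void compositeOnePixelScalar(const std::uint8_t *src, std::uint8_t *dst, float opacity)
{
    for (std::size_t c = 0; c < kChannels; ++c) {
        const float s = static_cast<float>(src[c]);
        const float d = static_cast<float>(dst[c]);
        const float r = d + (s - d) * opacity; // same expression as the block path
        dst[c] = static_cast<std::uint8_t>(r + 0.5f);
    }
}

// "Vector" path: blends kLanes pixels per call. The lanes are emulated with
// plain loops; real code would keep s and d in SIMD registers.
void compositeVector(const std::uint8_t *src, std::uint8_t *dst, float opacity)
{
    for (std::size_t c = 0; c < kChannels; ++c) {
        float s[kLanes];
        float d[kLanes];
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            s[lane] = static_cast<float>(src[lane * kChannels + c]);
            d[lane] = static_cast<float>(dst[lane * kChannels + c]);
        }
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            d[lane] = d[lane] + (s[lane] - d[lane]) * opacity;
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            dst[lane * kChannels + c] = static_cast<std::uint8_t>(d[lane] + 0.5f);
    }
}

// Hypothetical row driver: scalar until dst reaches a 32-byte boundary,
// then whole blocks, then scalar again for the leftover pixels.
void compositeRow(const std::uint8_t *src, std::uint8_t *dst,
                  std::size_t nPixels, float opacity)
{
    const std::size_t blockBytes = kLanes * kChannels;
    std::size_t i = 0;
    while (i < nPixels &&
           reinterpret_cast<std::uintptr_t>(dst + i * kChannels) % blockBytes != 0) {
        compositeOnePixelScalar(src + i * kChannels, dst + i * kChannels, opacity);
        ++i;
    }
    for (; i + kLanes <= nPixels; i += kLanes)
        compositeVector(src + i * kChannels, dst + i * kChannels, opacity);
    for (; i < nPixels; ++i)
        compositeOnePixelScalar(src + i * kChannels, dst + i * kChannels, opacity);
}
```

Because the split between the two paths depends on where a row happens to start in memory, any difference in rounding between them would show up as visible seams. A useful check is therefore to run the same row through `compositeRow` and through the scalar path alone, and compare the results byte for byte.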

[0] - https://cgit.kde.org/scratch/dkazakov/krita-docker-env.git/tree/README.md
[1] - krita\build-tools\windows\build.cmd