Krita AVX optimization for brush mask generation: Gauss, Soft, Stamp, Rectangular Gauss, Rectangular Soft
Open, Needs Triage; Public

Description

GSoC project

Use the AVX instruction set in brush mask generation to accelerate painting with the Circular Gauss, Circular Soft, Rectangular Gauss, Rectangular Soft, and Stamp brush masks.
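
As a rough illustration of why mask generation suits SIMD, the sketch below contrasts a per-pixel function with a row-at-a-time loop. It is not Krita's code: `maskValueAt`, `fillMaskRow`, and the linear falloff are hypothetical names and a simplified formula chosen only for this example. The point is the shape of the row loop: values that are constant across a row are hoisted, and the body is branch-free, which is the form that a SIMD library such as Vc, or an auto-vectorizing compiler, can map onto AVX lanes.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical falloff (an assumption, not Krita's formula):
// 1 at the centre, fading linearly to 0 at `radius`.
float maskValueAt(float x, float y, float radius)
{
    const float dist = std::sqrt(x * x + y * y);
    return std::max(0.0f, 1.0f - dist / radius);
}

// Row-at-a-time variant. y is constant across the row, so y * y and
// 1 / radius are computed once; the loop body has no branches.
void fillMaskRow(float *row, std::size_t width,
                 float startX, float y, float radius)
{
    const float ySquared = y * y;
    const float invRadius = 1.0f / radius;
    for (std::size_t i = 0; i < width; ++i) {
        const float x = startX + static_cast<float>(i);
        const float dist = std::sqrt(x * x + ySquared);
        row[i] = std::max(0.0f, 1.0f - dist * invRadius);
    }
}
```

Calling `fillMaskRow` once per row should give the same values as calling `maskValueAt` per pixel, up to floating-point rounding, which is why a scalar-versus-vector comparison needs a tolerance.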

Brush             Ported to processors   Implemented Vc version   Merged into master
Default Circle    x                      x                        x
Default Rect      x                      x                        x
Soft Circle       x                      x                        x
Soft Rect         x                      x                        x
Gaussian Circle   x                      x                        x
Gaussian Rect     x                      x                        x
Stamp

Details

Differential Revisions
D13504: Krita SoftBrush AVX Mask generation Optim.
D14314: Default and Soft Rectangular mask generator Optimization
Restricted Differential Revision
Commits
R37:0b94a5f0ba25: Small refactor of mask similarity test to avoid repetition
R37:d4aaf03c18dd: Remove unused code from similarity test
R37:ff83ebb73821: Reduce difference gap of Default Rect Mask Vector impl
R37:da249312e649: Implement Vc Optimization for Default Rect Mask
R37:728cec98fba8: Adjust Spacing of auto_soft_rect.kpp test preset to be 0.1
R37:0a7d9b113791: Add Default Rectangular to FreehandStrokeBenchmark
R37:21afc59cea14: Use Vc Indexes instead of custom SimdArray for integer casting
R37:b8f2080917a3: Reduce casting on Vector Indexes creation
R37:2565a74bdc9c: Vectorize Soft Rect Mask Generator
D13646 / R37:65adda38f081: Add missing license header
D13646 / R37:66fad84c879f: Reduce repetition in exhaustive mask similarity tester
D13646 / R37:067e7f5df57f: Check for nan results on alphafactor
D13646 / R37:f3efbf17c4c5: Fix minor code-style
D13646 / R37:f77e393ca079: Fix regression on masks Soft and Gauss Rect for smaller sizes
D13646 / R37:64dbc99448e4: Fix regression over softness on Vector Gauss version
D13646 / R37:a5a30036cfbe: Refactor vectorized 2D FadeMaker
D13646 / R37:04e9a46aab82: Refactor vectorized antialising Fade Maker
R37:ef2d20710194: Vectorized Circular Soft Mask Generator
D13646 / R37:2e69cb1c4e28: Refactor Vc erf operator
D13646 / R37:d912b47ef9f2: Fixed: Optimize Rectangular Gaussian Mask
R37:7d08e5e4d2e6: Revert "Optimize Rectangular Gauss Mask"
R37:76e1022edab7: Use vector vOne floats instead of constant 1.0f
R37:e36538da1a45: Optimize Rectangular Gauss Mask
R37:55067015ca49: Include Gaussian Rectangular in FreeStrokeBenchMark
R37:a9b6c3a4eb36: ref T8734
R37:3d77a710a01d: Vectorized Circular Soft Mask Generator
R37:db1ebe824c92: KisBrushMaskSimilarityTest:
R37:d67ecdd905ad: Adhere code to coding style more strictly.
R37:461af3f9a957: Minor code clean up Set a bigger size for generated mask rect
R37:9963768b392b: Correctly compare images alpha channel by setting fuzzy alpha and tolerance…
R37:08efed86d2bb: Added CircleGauss to SimilarityTest
R37:37effe636a30: ADD: Vectorized CircularGaussMask, UnitTestPAssing
R37:b9a538805637: NEW: KisMaskGeneratorBenchmark
R37:dfae36961a09: NEW: Implement Vectorized Soft Brush Mask Generator.
R37:e8de81d0db26: - Soft Circular vectorized brush mask Add missing antialias modification for
R37:884dcc104e3a: FIX: Missing Antialias on Vectorized Circular Gauss
R37:df4cb29add28: FIX: Missing Antialias on Vectorized Circular Gauss
R37:daac6985670c: FIX: Missing Antialias on Vectorized Circular Gauss
R37:8dc950e705ee: FIX: Gauss Circular Mask Antialiasing
R37:b55ed74ac98b: FIX: Gauss Circular Mask Antialiasing
R37:45cf521214b5: FIX: Float precision bug masking issues for vectorized GaussMask generator
R37:b395b05ef54d: FIX: Float precision bug masking issues for vectorized GaussMask generator
R37:8fa826838aa9: Adjust similarity Tolerance.
R37:ae2f0e5cdaa1: Adjust format and on CircSoft Mask FastRow
R37:5f60267ccd80: Modify similiarity test to try more mask variations
R37:f6182887b9b5: Modify maksBenchmark to create identical Soft Masks

Related Objects

vanyossi created this task. Apr 25 2018, 7:01 AM
woltherav added a subscriber: woltherav.

Adding a project to this :)

vanyossi added a revision: Restricted Differential Revision. May 23 2018, 2:22 AM

I made a benchmark for the generation of the mask. I let the scalar and vectorized versions coexist so I could check how much they differ from each other (although this was an afterthought; at first only the vector masks were in).
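
Checking "how much they differ" could be sketched as a per-pixel comparison with a tolerance. This is a minimal sketch, assuming both masks are available as flat 8-bit buffers; `maxMaskDifference` and `masksSimilar` are hypothetical helpers written for this example, not functions from Krita or its similarity test.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: largest per-pixel difference between two 8-bit masks.
// Returns -1 when the masks do not have the same number of pixels.
int maxMaskDifference(const std::vector<std::uint8_t> &scalarMask,
                      const std::vector<std::uint8_t> &vectorMask)
{
    if (scalarMask.size() != vectorMask.size()) {
        return -1;
    }
    int maxDiff = 0;
    for (std::size_t i = 0; i < scalarMask.size(); ++i) {
        const int diff = std::abs(static_cast<int>(scalarMask[i]) -
                                  static_cast<int>(vectorMask[i]));
        maxDiff = std::max(maxDiff, diff);
    }
    return maxDiff;
}

// Hypothetical helper: true when every pixel differs by at most `tolerance`.
bool masksSimilar(const std::vector<std::uint8_t> &scalarMask,
                  const std::vector<std::uint8_t> &vectorMask,
                  int tolerance)
{
    const int diff = maxMaskDifference(scalarMask, vectorMask);
    return diff >= 0 && diff <= tolerance;
}
```

A small non-zero tolerance is a plausible choice here, because a vectorized float computation can round differently from the scalar one.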

Preversion of KisMaskGeneratorBenchmark

Some preliminary results in my computer:

./KisMaskGeneratorBenchmark -iterations 300

********* Start testing of KisMaskGeneratorBenchmark *********

Config: Using QtTest library 5.10.0, Qt 5.10.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by Clang 9.0.0 (clang-900.0.39.2) (Apple))
PASS : KisMaskGeneratorBenchmark::initTestCase()
QDEBUG : KisMaskGeneratorBenchmark::testDefaultScalarMask() Invalid profile : "/Library/ColorSync/Profiles//WebSafeColors.icc" "Web Safe Colors"
PASS : KisMaskGeneratorBenchmark::testDefaultScalarMask()
RESULT : KisMaskGeneratorBenchmark::testDefaultScalarMask():

24.27 msecs per iteration (total: 7,282, iterations: 300)

PASS : KisMaskGeneratorBenchmark::testDefaultVectorMask()
RESULT : KisMaskGeneratorBenchmark::testDefaultVectorMask():

1.52 msecs per iteration (total: 458, iterations: 300)

PASS : KisMaskGeneratorBenchmark::testCircularGaussScalarMask()
RESULT : KisMaskGeneratorBenchmark::testCircularGaussScalarMask():

68.450 msecs per iteration (total: 20,535, iterations: 300)

PASS : KisMaskGeneratorBenchmark::testCircularGaussVectorMask()
RESULT : KisMaskGeneratorBenchmark::testCircularGaussVectorMask():

6.253 msecs per iteration (total: 1,876, iterations: 300)

PASS : KisMaskGeneratorBenchmark::cleanupTestCase()
Totals: 6 passed, 0 failed, 0 skipped, 0 blacklisted, 62320ms

********* Finished testing of KisMaskGeneratorBenchmark *********

Hi, @vanyossi!

The benchmark results look good. It is a bit suspicious that the speed benefit is 16x for the Default tip but only 10x for the Gaussian tip. It might be a natural difference, but I would probably spend a small bit of time (not more than a couple of hours) checking why this difference happens. Again, it might be perfectly okay that this difference exists, but I would check it, just out of curiosity :)
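
The ratios quoted above can be reproduced from the per-iteration times in the benchmark output. The helper below is a hypothetical one-liner added only to make the arithmetic explicit.

```cpp
// Hypothetical helper: how many times faster the vectorized version is,
// given the per-iteration times (in msecs) reported by the benchmark.
double speedup(double scalarMsecs, double vectorMsecs)
{
    return scalarMsecs / vectorMsecs;
}
```

With the numbers reported earlier, `speedup(24.27, 1.52)` is about 15.97 for the Default mask and `speedup(68.450, 6.253)` is about 10.95 for the Circular Gauss mask, so the gap is roughly 16x versus 11x.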

dkazakov updated the task description. Jul 23 2018, 12:14 PM
vanyossi updated the task description. Jul 23 2018, 12:17 PM
vanyossi updated the task description. Jul 30 2018, 1:23 PM
vanyossi updated the task description. Aug 6 2018, 12:47 PM
vanyossi added commits: R37:a9b6c3a4eb36: ref T8734, R37:55067015ca49: Include Gaussian Rectangular in FreeStrokeBenchMark, R37:e36538da1a45: Optimize Rectangular Gauss Mask, R37:76e1022edab7: Use vector vOne floats instead of constant 1.0f, R37:7d08e5e4d2e6: Revert "Optimize Rectangular Gauss Mask", R37:d912b47ef9f2: Fixed: Optimize Rectangular Gaussian Mask, R37:2e69cb1c4e28: Refactor Vc erf operator, R37:ef2d20710194: Vectorized Circular Soft Mask Generator, R37:04e9a46aab82: Refactor vectorized antialising Fade Maker, R37:a5a30036cfbe: Refactor vectorized 2D FadeMaker, R37:64dbc99448e4: Fix regression over softness on Vector Gauss version, R37:f77e393ca079: Fix regression on masks Soft and Gauss Rect for smaller sizes, R37:f3efbf17c4c5: Fix minor code-style, R37:067e7f5df57f: Check for nan results on alphafactor, R37:66fad84c879f: Reduce repetition in exhaustive mask similarity tester, R37:65adda38f081: Add missing license header, R37:2565a74bdc9c: Vectorize Soft Rect Mask Generator, R37:b8f2080917a3: Reduce casting on Vector Indexes creation, R37:21afc59cea14: Use Vc Indexes instead of custom SimdArray for integer casting, R37:0a7d9b113791: Add Default Rectangular to FreehandStrokeBenchmark, R37:728cec98fba8: Adjust Spacing of auto_soft_rect.kpp test preset to be 0.1, R37:da249312e649: Implement Vc Optimization for Default Rect Mask, R37:ff83ebb73821: Reduce difference gap of Default Rect Mask Vector impl, R37:d4aaf03c18dd: Remove unused code from similarity test, R37:0b94a5f0ba25: Small refactor of mask similarity test to avoid repetition. Aug 10 2018, 6:19 PM