Added an optimized version of the alpha darken composite op for the RGBF32 colorspace.
Added tests that compare the performance and results of the new implementation against the legacy one.
The small difference allowed in the test comparison is due to the fact that the compiler calculates 1.0/255.0 and multiplies by the result instead of dividing by 255.0 for the mask.
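To illustrate why the two forms of mask normalization do not always agree bit for bit, here is a minimal standalone sketch. It is not the Krita code: the function names (`normalizeByDivision`, `normalizeByReciprocal`, `countMismatches`, `maxAbsDifference`) are hypothetical, and it assumes IEEE 754 single-precision arithmetic without fast-math options.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: normalize an 8-bit mask value by dividing by 255.
inline float normalizeByDivision(std::uint8_t mask) {
    return static_cast<float>(mask) / 255.0f;
}

// Hypothetical helper: normalize by multiplying with the precomputed
// reciprocal. 1.0f / 255.0f is itself rounded, so the product can land
// on a neighbouring float compared to the true division.
inline float normalizeByReciprocal(std::uint8_t mask) {
    const float inv255 = 1.0f / 255.0f;
    return static_cast<float>(mask) * inv255;
}

// Count the mask values (0..255) where the two results are not identical.
inline int countMismatches() {
    int mismatches = 0;
    for (int m = 0; m <= 255; ++m) {
        const float a = normalizeByDivision(static_cast<std::uint8_t>(m));
        const float b = normalizeByReciprocal(static_cast<std::uint8_t>(m));
        if (a != b) {
            ++mismatches;
        }
    }
    return mismatches;
}

// Largest absolute difference between the two results over all mask values.
inline float maxAbsDifference() {
    float maxDiff = 0.0f;
    for (int m = 0; m <= 255; ++m) {
        const float a = normalizeByDivision(static_cast<std::uint8_t>(m));
        const float b = normalizeByReciprocal(static_cast<std::uint8_t>(m));
        const float diff = std::fabs(a - b);
        if (diff > maxDiff) {
            maxDiff = diff;
        }
    }
    return maxDiff;
}
```

Calling `countMismatches()` and `maxAbsDifference()` shows that some mask values give different results, but only in the last bits of the float, which is why the comparison uses a small tolerance rather than exact equality.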
Here are the results of the benchmark on my Intel i5-2520M CPU:
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: Testing Composite Op: "alphadarken" ( "Legacy" )
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: "Aligned Mask SrcRand DstRand" RESULT: 67 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: "DstUnalig Mask SrcRand DstRand" RESULT: 68 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: "SrcUnalig Mask SrcRand DstRand" RESULT: 69 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: "Unaligned Mask SrcRand DstRand" RESULT: 66 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: "Aligned NoMask SrcRand DstRand" RESULT: 33 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: "Aligned NoMask SrcZero DstRand" RESULT: 32 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: "Aligned NoMask SrcUnit DstRand" RESULT: 32 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: "Aligned NoMask SrcRand DstZero" RESULT: 31 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: "Aligned NoMask SrcZero DstZero" RESULT: 28 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: "Aligned NoMask SrcUnit DstZero" RESULT: 31 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: "Aligned NoMask SrcRand DstUnit" RESULT: 32 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: "Aligned NoMask SrcZero DstUnit" RESULT: 32 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenLegacy() krita.general: "Aligned NoMask SrcUnit DstUnit" RESULT: 33 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenOptimized() krita.general: "Aligned Mask SrcRand DstRand" RESULT: 12 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenOptimized() krita.general: "DstUnalig Mask SrcRand DstRand" RESULT: 12 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenOptimized() krita.general: "SrcUnalig Mask SrcRand DstRand" RESULT: 16 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenOptimized() krita.general: "Unaligned Mask SrcRand DstRand" RESULT: 16 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenOptimized() krita.general: "Aligned NoMask SrcRand DstRand" RESULT: 9 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenOptimized() krita.general: "Aligned NoMask SrcZero DstRand" RESULT: 9 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenOptimized() krita.general: "Aligned NoMask SrcUnit DstRand" RESULT: 14 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenOptimized() krita.general: "Aligned NoMask SrcRand DstZero" RESULT: 9 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenOptimized() krita.general: "Aligned NoMask SrcZero DstZero" RESULT: 9 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenOptimized() krita.general: "Aligned NoMask SrcUnit DstZero" RESULT: 9 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenOptimized() krita.general: "Aligned NoMask SrcRand DstUnit" RESULT: 9 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenOptimized() krita.general: "Aligned NoMask SrcZero DstUnit" RESULT: 10 msec
QDEBUG : KisCompositionBenchmark::testRgbF32CompositeAlphaDarkenOptimized() krita.general: "Aligned NoMask SrcUnit DstUnit" RESULT: 9 msec
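This is a speedup of roughly a factor of 3 to 6 in most cases (the "Aligned NoMask SrcUnit DstRand" case is closer to a factor of 2.3).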