Optimized Auto Brush mask filling code
This patch uses the CPU's internal parallelism and makes the mask filling
code run much faster in the 'KisStrokeBenchmark pixelbrush300pxRL'
benchmark.
Actual results in the benchmark:
Sandy Bridge (Core i7-2600): +25%
Merom (Core 2 Duo T7250): +10%
According to VTune, painting should have become up to 10% faster (on
Sandy Bridge), because this part of the code now consumes almost no time.
This optimization has the most effect at the highest precision levels,
that is, when a dab cannot be cached.
CCMAIL:kimageshop@kde.org