Add CMake flag to force -O3 for RelWithDebInfo
Changes Planned · Public

Authored by alvinhochun on Feb 11 2018, 6:23 PM.

Details

Reviewers
None
Group Reviewers
Krita
Summary

The RelWithDebInfo build type defaults to -O2. The Windows nightly and release builds use this configuration.

Can anyone tell whether compiling Krita itself with -O3 is worth it in terms of runtime performance? (I can see the stack traces getting even less useful when the debug symbols are missing...)

Diff Detail

Repository
R37 Krita
Lint
Lint Skipped
Unit
Unit Tests Skipped
alvinhochun created this revision. Feb 11 2018, 6:23 PM
Restricted Application added a subscriber: woltherav. Feb 11 2018, 6:23 PM
alvinhochun requested review of this revision. Feb 11 2018, 6:23 PM

Hi, @alvinhochun!

You can try running krita-ui-FreehandStrokeBenchmark with and without the option enabled and compare the results. This benchmark exercises the busiest part of Krita's code: painting with a brush.

OK, so I ran the krita-ui-FreehandStrokeBenchmark benchmark once for each configuration. It's just a very crude test; I didn't check whether the CPU was thermally throttling (it's a laptop):

-O3:

********* Start testing of FreehandStrokeBenchmark *********
Config: Using QtTest library 5.9.3, Qt 5.9.3 (x86_64-little_endian-llp64 shared (dynamic) release build; by GCC 7.1.0)
PASS   : FreehandStrokeBenchmark::initTestCase()
QWARN  : FreehandStrokeBenchmark::testDefaultTip() QObject: Cannot create children for a parent that is in a different thread.
(Parent is QApplication(0x22fe10), parent's thread is QThread(0x25051f0), current thread is QThread(0x190263c0)
QDEBUG : FreehandStrokeBenchmark::testDefaultTip() Cores: 1 Time: 5841 (ms)
QDEBUG : FreehandStrokeBenchmark::testDefaultTip() Cores: 2 Time: 3365 (ms)
QDEBUG : FreehandStrokeBenchmark::testDefaultTip() Cores: 3 Time: 3159 (ms)
QDEBUG : FreehandStrokeBenchmark::testDefaultTip() Cores: 4 Time: 2961 (ms)
PASS   : FreehandStrokeBenchmark::testDefaultTip()
QDEBUG : FreehandStrokeBenchmark::testSoftTip() Cores: 1 Time: 25810 (ms)
QDEBUG : FreehandStrokeBenchmark::testSoftTip() Cores: 2 Time: 15298 (ms)
QDEBUG : FreehandStrokeBenchmark::testSoftTip() Cores: 3 Time: 13102 (ms)
QDEBUG : FreehandStrokeBenchmark::testSoftTip() Cores: 4 Time: 12385 (ms)
PASS   : FreehandStrokeBenchmark::testSoftTip()
QDEBUG : FreehandStrokeBenchmark::testGaussianTip() Cores: 1 Time: 66223 (ms)
QDEBUG : FreehandStrokeBenchmark::testGaussianTip() Cores: 2 Time: 38071 (ms)
QDEBUG : FreehandStrokeBenchmark::testGaussianTip() Cores: 3 Time: 31729 (ms)
QDEBUG : FreehandStrokeBenchmark::testGaussianTip() Cores: 4 Time: 29206 (ms)
PASS   : FreehandStrokeBenchmark::testGaussianTip()
QDEBUG : FreehandStrokeBenchmark::testStampTip() Cores: 1 Time: 23681 (ms)
QDEBUG : FreehandStrokeBenchmark::testStampTip() Cores: 2 Time: 13911 (ms)
QDEBUG : FreehandStrokeBenchmark::testStampTip() Cores: 3 Time: 11728 (ms)
QDEBUG : FreehandStrokeBenchmark::testStampTip() Cores: 4 Time: 10666 (ms)
PASS   : FreehandStrokeBenchmark::testStampTip()
QDEBUG : FreehandStrokeBenchmark::testColorsmudgeDefaultTip() Cores: 1 Time: 9893 (ms)
QDEBUG : FreehandStrokeBenchmark::testColorsmudgeDefaultTip() Cores: 2 Time: 9245 (ms)
QDEBUG : FreehandStrokeBenchmark::testColorsmudgeDefaultTip() Cores: 3 Time: 9342 (ms)
QDEBUG : FreehandStrokeBenchmark::testColorsmudgeDefaultTip() Cores: 4 Time: 9340 (ms)
PASS   : FreehandStrokeBenchmark::testColorsmudgeDefaultTip()
PASS   : FreehandStrokeBenchmark::cleanupTestCase()
Totals: 7 passed, 0 failed, 0 skipped, 0 blacklisted, 355226ms
********* Finished testing of FreehandStrokeBenchmark *********

-O2:

********* Start testing of FreehandStrokeBenchmark *********
Config: Using QtTest library 5.9.3, Qt 5.9.3 (x86_64-little_endian-llp64 shared (dynamic) release build; by GCC 7.1.0)
PASS   : FreehandStrokeBenchmark::initTestCase()
QWARN  : FreehandStrokeBenchmark::testDefaultTip() QObject: Cannot create children for a parent that is in a different thread.
(Parent is QApplication(0x22fe10), parent's thread is QThread(0x24a51f0), current thread is QThread(0x18ee67e0)
QDEBUG : FreehandStrokeBenchmark::testDefaultTip() Cores: 1 Time: 6859 (ms)
QDEBUG : FreehandStrokeBenchmark::testDefaultTip() Cores: 2 Time: 3683 (ms)
QDEBUG : FreehandStrokeBenchmark::testDefaultTip() Cores: 3 Time: 3370 (ms)
QDEBUG : FreehandStrokeBenchmark::testDefaultTip() Cores: 4 Time: 3238 (ms)
PASS   : FreehandStrokeBenchmark::testDefaultTip()
QDEBUG : FreehandStrokeBenchmark::testSoftTip() Cores: 1 Time: 23287 (ms)
QDEBUG : FreehandStrokeBenchmark::testSoftTip() Cores: 2 Time: 13764 (ms)
QDEBUG : FreehandStrokeBenchmark::testSoftTip() Cores: 3 Time: 11776 (ms)
QDEBUG : FreehandStrokeBenchmark::testSoftTip() Cores: 4 Time: 11211 (ms)
PASS   : FreehandStrokeBenchmark::testSoftTip()
QDEBUG : FreehandStrokeBenchmark::testGaussianTip() Cores: 1 Time: 64064 (ms)
QDEBUG : FreehandStrokeBenchmark::testGaussianTip() Cores: 2 Time: 36705 (ms)
QDEBUG : FreehandStrokeBenchmark::testGaussianTip() Cores: 3 Time: 30153 (ms)
QDEBUG : FreehandStrokeBenchmark::testGaussianTip() Cores: 4 Time: 28411 (ms)
PASS   : FreehandStrokeBenchmark::testGaussianTip()
QDEBUG : FreehandStrokeBenchmark::testStampTip() Cores: 1 Time: 23739 (ms)
QDEBUG : FreehandStrokeBenchmark::testStampTip() Cores: 2 Time: 13529 (ms)
QDEBUG : FreehandStrokeBenchmark::testStampTip() Cores: 3 Time: 11851 (ms)
QDEBUG : FreehandStrokeBenchmark::testStampTip() Cores: 4 Time: 11121 (ms)
PASS   : FreehandStrokeBenchmark::testStampTip()
QDEBUG : FreehandStrokeBenchmark::testColorsmudgeDefaultTip() Cores: 1 Time: 10039 (ms)
QDEBUG : FreehandStrokeBenchmark::testColorsmudgeDefaultTip() Cores: 2 Time: 9514 (ms)
QDEBUG : FreehandStrokeBenchmark::testColorsmudgeDefaultTip() Cores: 3 Time: 9519 (ms)
QDEBUG : FreehandStrokeBenchmark::testColorsmudgeDefaultTip() Cores: 4 Time: 9599 (ms)
PASS   : FreehandStrokeBenchmark::testColorsmudgeDefaultTip()
PASS   : FreehandStrokeBenchmark::cleanupTestCase()
Totals: 7 passed, 0 failed, 0 skipped, 0 blacklisted, 354695ms
********* Finished testing of FreehandStrokeBenchmark *********

Doesn't seem very exciting.
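To see the per-test picture behind the two logs, the share of the -O2 time saved by -O3 can be computed directly from the logged timings. This is a minimal Python sketch; the dictionary and function names are illustrative only, and the numbers are the single runs copied from the logs above, so they carry the same caveats (one run per configuration, possible thermal throttling).

```python
# Timings in ms copied from the two FreehandStrokeBenchmark runs above.
# Each list holds the times for 1, 2, 3 and 4 cores, in that order.
O3 = {
    "testDefaultTip":            [5841, 3365, 3159, 2961],
    "testSoftTip":               [25810, 15298, 13102, 12385],
    "testGaussianTip":           [66223, 38071, 31729, 29206],
    "testStampTip":              [23681, 13911, 11728, 10666],
    "testColorsmudgeDefaultTip": [9893, 9245, 9342, 9340],
}
O2 = {
    "testDefaultTip":            [6859, 3683, 3370, 3238],
    "testSoftTip":               [23287, 13764, 11776, 11211],
    "testGaussianTip":           [64064, 36705, 30153, 28411],
    "testStampTip":              [23739, 13529, 11851, 11121],
    "testColorsmudgeDefaultTip": [10039, 9514, 9519, 9599],
}


def time_saved_percent(o2_ms: int, o3_ms: int) -> float:
    """Percentage of the -O2 time saved by -O3 (negative means -O3 was slower)."""
    return 100.0 * (o2_ms - o3_ms) / o2_ms


def compare(o2: dict, o3: dict) -> dict:
    """Map each test name to its per-core-count savings, rounded to 0.1%."""
    return {
        test: [round(time_saved_percent(a, b), 1) for a, b in zip(o2[test], o3[test])]
        for test in o2
    }


if __name__ == "__main__":
    for test, deltas in compare(O2, O3).items():
        print(f"{test:28s}", "  ".join(f"{d:+6.1f}%" for d in deltas))
```

Computed this way, -O3 saves about 15% on testDefaultTip with 1 core (about 9% with 4 cores), whereas testSoftTip and testGaussianTip are slower with -O3 at every core count, and the overall totals (355226 ms vs 354695 ms) are within 0.2% of each other.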

Hi, @alvinhochun!

A 15% difference seems nice, though. It might be a good idea to keep this option disabled by default and enable it for release builds. What do you think?

There is always the risk of (more) compiler bugs, but that risk seems pretty low nowadays, and -O3 is used in some production software releases (even Qt is compiled with -O3).
Another risk is that the additional optimizations could break existing code that (unknowingly) depends on undefined behaviour. Such breakages would technically be bugs in the Krita codebase that need fixing anyway, but it means the release builds could contain more undiscovered bugs.

However, after further research, I found claims that blindly enabling -O3 for the whole project could be a bad idea because it may generate code that runs slower in practice, for example larger code that performs worse due to more cache misses. It seems better to selectively enable -O3 and/or specific optimizations only for the individual targets or source files that benefit from them. If we want to go in this direction, perhaps we should open a task.

alvinhochun planned changes to this revision. Feb 28 2018, 1:31 PM