Improve BitSet operations performance

Authored by trufanov on Jun 20 2020, 12:38 PM.



I'm suggesting a set of improvements for BitSets implementation .
Current one is using a bit access (get/set) in for loops for bulk operations (like invert()).
New implementation perform bulk operation on bytes with proper mask. Also get/set performance is improved with lookup tables.

Calculations of i / 8 and i % 8 are replaced with equal i >> 3 and i & 7.

BitCount lookup table declaration has been replaced with its macro compacted variant taken from Bit Twiddling Hacks. It's moved to header as now it's used in BitSet::set too.

Note: as BitSet::get/BitSet::set are inlined - you need to recompile KTorrent too to benefit from their implementations

There are no specific tests for BitSet, but it's used in utppolltest at least

Test Plan

measure performance with valgrind profiler

Diff Detail

R472 KTorrent Library
Lint Skipped
Unit Tests Skipped
trufanov requested review of this revision.Jun 20 2020, 12:38 PM
trufanov created this revision.
stikonas accepted this revision.Jun 20 2020, 1:54 PM
stikonas added inline comments.

Is this actually faster? I would have thought these days compilers are smart enough to optimize this. But if valgrind ssays it's faster than ok...

This revision is now accepted and ready to land.Jun 20 2020, 1:54 PM