Optimize UrlParseLock to remove excessive sleep times.
Running duchainify on the heaptrack sources on a machine with nproc=8
now gives me the following results:
Performance counter stats for 'duchainify -t 8 .' (5 runs):
      23230.558005      task-clock (msec)         #    3.369 CPUs utilized            ( +-  1.88% )
            60,838      context-switches          #    0.003 M/sec                    ( +-  7.23% )
             1,380      cpu-migrations            #    0.059 K/sec                    ( +- 19.25% )
           206,106      page-faults               #    0.009 M/sec                    ( +-  3.04% )
    86,087,979,829      cycles                    #    3.706 GHz                      ( +-  1.82% )
    70,508,334,605      instructions              #    0.82  insn per cycle           ( +-  1.13% )
    15,187,539,592      branches                  #  653.774 M/sec                    ( +-  1.07% )
       283,447,232      branch-misses             #    1.87% of all branches          ( +-  1.31% )

       6.896230441 seconds time elapsed                                          ( +-  6.05% )
Before, the result was:
      23720.891477      task-clock (msec)         #    2.979 CPUs utilized            ( +-  0.46% )
            32,629      context-switches          #    0.001 M/sec                    ( +-  7.98% )
               997      cpu-migrations            #    0.042 K/sec                    ( +- 11.20% )
           198,436      page-faults               #    0.008 M/sec                    ( +-  2.10% )
    87,645,125,683      cycles                    #    3.695 GHz                      ( +-  0.45% )
    67,272,691,473      instructions              #    0.77  insn per cycle           ( +-  0.98% )
    14,515,423,390      branches                  #  611.926 M/sec                    ( +-  0.97% )
       256,262,860      branch-misses             #    1.77% of all branches          ( +-  0.46% )

       7.962761391 seconds time elapsed                                          ( +-  2.39% )
Note that the previous implementation was slow mostly because of
excessive 1s sleeps. The new implementation relies on per-URL
mutexes to sleep only as long as needed.
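To illustrate the idea of per-URL mutexes, here is a minimal sketch. It is
not the actual UrlParseLock code: it uses the C++ standard library rather
than Qt, and the names PerUrlLock, Entry, and parseConcurrently are
hypothetical. A small registry maps each URL to its own mutex, so a thread
blocks only while another thread holds the lock for the same URL, instead
of sleeping for a fixed interval.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical registry entry: one mutex per URL plus a user count
// so the entry can be dropped once nobody needs it anymore.
struct Entry {
    std::mutex mutex;
    int users = 0;
};

// Hypothetical RAII lock; illustrative only, not the KDevelop class.
class PerUrlLock {
public:
    explicit PerUrlLock(const std::string& url) : m_url(url) {
        {
            // The registry mutex is held only for the short lookup.
            std::lock_guard<std::mutex> guard(registryMutex());
            auto& entry = registry()[m_url];
            if (!entry)
                entry = std::make_shared<Entry>();
            ++entry->users;
            m_entry = entry;
        }
        // Blocks only while another thread holds the same URL.
        m_entry->mutex.lock();
    }

    ~PerUrlLock() {
        m_entry->mutex.unlock();
        std::lock_guard<std::mutex> guard(registryMutex());
        if (--m_entry->users == 0)
            registry().erase(m_url);
    }

    PerUrlLock(const PerUrlLock&) = delete;
    PerUrlLock& operator=(const PerUrlLock&) = delete;

    // Number of URLs that currently have a live entry.
    static std::size_t registrySize() {
        std::lock_guard<std::mutex> guard(registryMutex());
        return registry().size();
    }

private:
    static std::mutex& registryMutex() {
        static std::mutex m;
        return m;
    }
    static std::map<std::string, std::shared_ptr<Entry>>& registry() {
        static std::map<std::string, std::shared_ptr<Entry>> r;
        return r;
    }

    std::string m_url;
    std::shared_ptr<Entry> m_entry;
};

// Hypothetical driver: several threads "parse" the same URL. The plain
// int counter is safe only because every increment happens under the lock.
int parseConcurrently(const std::string& url, int threads, int iterations) {
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, &url, iterations] {
            for (int i = 0; i < iterations; ++i) {
                PerUrlLock lock(url);
                ++counter;
            }
        });
    }
    for (auto& th : pool)
        th.join();
    return counter;
}
```

In this sketch, threads working on different URLs never wait on each
other beyond the brief registry lookup, while threads hitting the same
URL (for example a central header) are woken as soon as the holder is
done.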
We could really benefit from work stealing in our parse job queue...
In C/C++ projects, though, the issue comes from central headers that
are included in nearly all files and thus easily trigger sleeps
here.
The code was reviewed by David Faure, thanks!