Optimize UrlParseLock to remove excessive sleep times.
d9313010b1be

Authored by mwolff on Aug 17 2016, 6:26 PM.

Description

Running duchainify on the heaptrack sources on an nproc=8 machine
now gives me the following results:

Performance counter stats for 'duchainify -t 8 .' (5 runs):

  23230.558005      task-clock (msec)         #    3.369 CPUs utilized            ( +-  1.88% )
        60,838      context-switches          #    0.003 M/sec                    ( +-  7.23% )
         1,380      cpu-migrations            #    0.059 K/sec                    ( +- 19.25% )
       206,106      page-faults               #    0.009 M/sec                    ( +-  3.04% )
86,087,979,829      cycles                    #    3.706 GHz                      ( +-  1.82% )
70,508,334,605      instructions              #    0.82  insn per cycle           ( +-  1.13% )
15,187,539,592      branches                  #  653.774 M/sec                    ( +-  1.07% )
   283,447,232      branch-misses             #    1.87% of all branches          ( +-  1.31% )

   6.896230441 seconds time elapsed                                          ( +-  6.05% )

Before this change, the result was:

  23720.891477      task-clock (msec)         #    2.979 CPUs utilized            ( +-  0.46% )
        32,629      context-switches          #    0.001 M/sec                    ( +-  7.98% )
           997      cpu-migrations            #    0.042 K/sec                    ( +- 11.20% )
       198,436      page-faults               #    0.008 M/sec                    ( +-  2.10% )
87,645,125,683      cycles                    #    3.695 GHz                      ( +-  0.45% )
67,272,691,473      instructions              #    0.77  insn per cycle           ( +-  0.98% )
14,515,423,390      branches                  #  611.926 M/sec                    ( +-  0.97% )
   256,262,860      branch-misses             #    1.77% of all branches          ( +-  0.46% )

   7.962761391 seconds time elapsed                                          ( +-  2.39% )

Note that the previous implementation was slow mostly because of
excessive sleeping in fixed 1s intervals. The new implementation relies
on per-URL mutexes, so a thread sleeps only as long as needed.
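
The per-URL idea can be sketched roughly as follows. This is not the code from this commit: it is a minimal illustration written against the C++ standard library rather than Qt, and every name in it (`sketch::UrlLock`, `PerUrlData`, `g_registry`, `hammerSameUrl`, the header path) is hypothetical. It assumes a global registry that maps each URL to its own mutex plus a user count, so a waiting thread blocks on exactly the URL it needs and wakes as soon as that URL is released, instead of polling with a fixed sleep.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

namespace sketch {

// One entry per URL that is currently locked or waited on (hypothetical type).
struct PerUrlData {
    std::mutex mutex;  // held while the URL is being parsed
    int users = 0;     // number of UrlLock objects referring to this entry
};

std::mutex g_registryMutex;  // guards g_registry and the users counters
std::map<std::string, std::unique_ptr<PerUrlData>> g_registry;

// RAII lock: blocks only while another thread holds the same URL.
class UrlLock {
public:
    explicit UrlLock(const std::string& url) : m_url(url) {
        {
            std::lock_guard<std::mutex> registryLock(g_registryMutex);
            std::unique_ptr<PerUrlData>& slot = g_registry[m_url];
            if (!slot) slot.reset(new PerUrlData);
            ++slot->users;
            m_data = slot.get();
        }
        // Wait outside the registry lock so other URLs are not held up.
        m_data->mutex.lock();
    }
    ~UrlLock() {
        m_data->mutex.unlock();
        std::lock_guard<std::mutex> registryLock(g_registryMutex);
        // Drop the entry once nobody holds or waits for this URL.
        if (--m_data->users == 0) g_registry.erase(m_url);
    }
    UrlLock(const UrlLock&) = delete;
    UrlLock& operator=(const UrlLock&) = delete;

private:
    std::string m_url;
    PerUrlData* m_data = nullptr;
};

std::size_t registrySize() {
    std::lock_guard<std::mutex> registryLock(g_registryMutex);
    return g_registry.size();
}

// Demo: several threads increment a plain int while holding the lock
// for the same (made-up) central header URL.
int hammerSameUrl(int threadCount, int iterations) {
    int counter = 0;  // deliberately not atomic; protected only by UrlLock
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                UrlLock lock("file:///usr/include/central.h");
                ++counter;
            }
        });
    }
    for (std::thread& thread : threads) thread.join();
    return counter;
}

}  // namespace sketch
```

Calling `sketch::hammerSameUrl(4, 1000)` serializes all increments on the one URL, while threads locking different URLs would not block each other. The user count lets the registry shrink again once a URL is no longer contended.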

We could really benefit from work stealing in our parse job queue.
In C/C++ projects, however, the issue comes from central headers that
are included in nearly all files and thus easily trigger sleeps
here.

The code was reviewed by David Faure, thanks!