Always use std::mutex for locking, never a custom spin lock
The previous change, which introduced a custom buffered write, now often
leads to abysmal performance of heaptrack under extreme thread pressure.
E.g. running heaptrack tests/manual/threaded suddenly takes up to one
minute instead of around one second.
By always using a mutex, performance is back in the original ballpark.
We can work around the deadlock that was originally observed by using
std::mutex::try_lock in the background timer thread. This ensures the
timer thread never blocks indefinitely on the lock while we try to join
it during destruction.
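A minimal C++ sketch of the try_lock workaround described above. It assumes a hypothetical BufferedTracker class, not heaptrack's actual code, whose background timer thread periodically flushes buffered data. stop() holds the mutex while it joins the timer thread. The timer therefore uses try_lock and re-checks the stop flag instead of blocking in lock(), which would deadlock against the join.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a tracker that buffers data and has a
// background timer thread flush it periodically. Illustration only.
class BufferedTracker
{
public:
    BufferedTracker() : m_timer([this] { timerLoop(); }) {}
    ~BufferedTracker() { stop(); }

    // Called from arbitrary application threads; always a plain std::mutex.
    void record(long bytes)
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_pending += bytes;
    }

    // Stops the timer thread while holding the lock, as destruction might.
    void stop()
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        if (!m_timer.joinable())
            return; // already stopped
        m_done = true;
        m_timer.join(); // safe only because the timer never blocks in lock()
        flushLocked();  // final flush, still under the lock
    }

    long flushedTotal() const
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        return m_flushed;
    }

private:
    void timerLoop()
    {
        while (!m_done) {
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
            // try_lock instead of lock: if another thread (e.g. stop()) holds
            // the mutex, skip this tick and re-check m_done instead of waiting.
            if (m_mutex.try_lock()) {
                flushLocked();
                m_mutex.unlock();
            }
        }
    }

    // Caller must hold m_mutex.
    void flushLocked()
    {
        m_flushed += m_pending;
        m_pending = 0;
    }

    mutable std::mutex m_mutex;
    std::atomic<bool> m_done{false};
    long m_pending = 0;
    long m_flushed = 0;
    std::thread m_timer; // declared last so it starts after the other members
};
```

If timerLoop called m_mutex.lock() instead, it could wait forever for the mutex held by stop(), while stop() waits forever in join(). With try_lock, a failed attempt simply loops back, sees m_done, and exits.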