Current libc implementations don't appear to implement the optimization where signaling a condition variable simply moves the signaled thread to the head of the lock's wait queue (sometimes called wait morphing). Instead, the signal actually wakes the thread, which immediately tries to take the lock and, if the signaler still holds it, blocks again, causing extra context switches.
A libc-alpha message from March 2005 (https://sourceware.org/ml/libc-alpha/2005-03/msg00228.html) noted this behavior, but as far as I can tell nothing was done about it. This warrants some benchmarking and investigation. Also worth noting: Impala signals outside the lock, i.e. after releasing it.
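
As a starting point for that benchmarking, here is a minimal sketch of the two signaling patterns using std::mutex and std::condition_variable. The helper name run_handoff and its parameters rounds and notify_under_lock are made up for illustration; this is not taken from libc, Impala, or any existing benchmark.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper (not from any real codebase): a producer hands
// `rounds` tokens to a consumer through a mutex-protected counter.
// notify_under_lock selects which signaling pattern the producer uses.
// Returns the number of tokens the consumer received.
int run_handoff(int rounds, bool notify_under_lock) {
  assert(rounds >= 0);
  std::mutex mu;
  std::condition_variable cv;
  int pending = 0;   // tokens produced but not yet consumed (guarded by mu)
  int consumed = 0;  // written only by the consumer, read after join()

  std::thread consumer([&] {
    for (int i = 0; i < rounds; ++i) {
      std::unique_lock<std::mutex> lock(mu);
      // The predicate form guards against spurious and missed wakeups.
      cv.wait(lock, [&] { return pending > 0; });
      --pending;
      ++consumed;
    }
  });

  for (int i = 0; i < rounds; ++i) {
    if (notify_under_lock) {
      // Pattern A: signal while holding the mutex. Without requeueing,
      // the woken thread may run, fail to take mu, and block again.
      std::lock_guard<std::mutex> lock(mu);
      ++pending;
      cv.notify_one();
    } else {
      // Pattern B: release the mutex first, then signal.
      {
        std::lock_guard<std::mutex> lock(mu);
        ++pending;
      }
      cv.notify_one();
    }
  }

  consumer.join();
  return consumed;
}
```

Both variants should deliver every token; the interesting difference, if any, is in cost. One way to look for it would be to wrap each call in a small main and compare wall-clock time and context-switch counts (for example with `perf stat -e context-switches` on Linux) at a large value of rounds.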