Description
I've seen a couple of instances where ClientTest.TestServerTooBusyRetry fails by hitting the TSAN thread limit after seemingly being stuck for 10 minutes or so. The end of the log looks like:
W0428 12:20:07.406752 10297 debug-util.cc:397] Leaking SignalData structure 0x7b08000c2ba0 after lost signal to thread 8435
W0428 12:20:07.412693 10297 debug-util.cc:397] Leaking SignalData structure 0x7b080019f2a0 after lost signal to thread 10185
W0428 12:20:07.418191 10297 debug-util.cc:397] Leaking SignalData structure 0x7b080018f060 after lost signal to thread 10361
W0428 12:20:23.873589 10139 debug-util.cc:397] Leaking SignalData structure 0x7b08000fc360 after lost signal to thread 8435
W0428 12:20:23.878401 10139 debug-util.cc:397] Leaking SignalData structure 0x7b08000ccf20 after lost signal to thread 10185
W0428 12:20:23.884522 10139 debug-util.cc:397] Leaking SignalData structure 0x7b0800051ae0 after lost signal to thread 10361
W0428 12:22:03.715726 10297 debug-util.cc:397] Leaking SignalData structure 0x7b08000f9280 after lost signal to thread 8435
W0428 12:22:03.721261 10297 debug-util.cc:397] Leaking SignalData structure 0x7b08001b0e40 after lost signal to thread 10185
W0428 12:22:03.727725 10297 debug-util.cc:397] Leaking SignalData structure 0x7b08000b7460 after lost signal to thread 10361
W0428 12:22:11.928373 10139 debug-util.cc:397] Leaking SignalData structure 0x7b0800044be0 after lost signal to thread 8435
W0428 12:22:11.933187 10139 debug-util.cc:397] Leaking SignalData structure 0x7b080018f3c0 after lost signal to thread 10185
W0428 12:22:11.939275 10139 debug-util.cc:397] Leaking SignalData structure 0x7b08001b3480 after lost signal to thread 10361
==8432==ThreadSanitizer: Thread limit (8128 threads) exceeded. Dying.
The same three threads (8435, 10185, and 10361) are unresponsive each time, even to the signals sent by the stack trace collector thread. Unfortunately, there's nothing else in the logs about those threads.
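
For triaging other failed runs, a small script can tally which thread IDs show up in the "lost signal" warnings, which makes it easier to see whether the same threads are stuck every time. The following is only a sketch: `count_lost_signals` is a hypothetical helper (not part of Kudu), and the regular expression assumes the warning text stays exactly as it appears in the excerpt above.

```python
import re
from collections import Counter

# Assumed message format, copied from the debug-util.cc:397 warnings above.
LOST_SIGNAL_RE = re.compile(
    r"Leaking SignalData structure 0x[0-9a-f]+ after lost signal to thread (\d+)"
)


def count_lost_signals(log_text):
    """Return a Counter mapping thread ID -> number of lost-signal warnings."""
    return Counter(int(tid) for tid in LOST_SIGNAL_RE.findall(log_text))


# A few lines taken from the log excerpt above.
sample = (
    "W0428 12:20:07.406752 10297 debug-util.cc:397] Leaking SignalData "
    "structure 0x7b08000c2ba0 after lost signal to thread 8435\n"
    "W0428 12:20:07.412693 10297 debug-util.cc:397] Leaking SignalData "
    "structure 0x7b080019f2a0 after lost signal to thread 10185\n"
    "W0428 12:20:07.418191 10297 debug-util.cc:397] Leaking SignalData "
    "structure 0x7b080018f060 after lost signal to thread 10361\n"
    "W0428 12:20:23.873589 10139 debug-util.cc:397] Leaking SignalData "
    "structure 0x7b08000fc360 after lost signal to thread 8435\n"
)

counts = count_lost_signals(sample)
for tid, n in sorted(counts.items()):
    print(f"thread {tid}: {n} lost signal(s)")
```

Because the pattern is matched with `findall` over the whole text, it works whether the log entries are on separate lines or run together.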