Kudu / KUDU-2805

ClientTest.TestServerTooBusyRetry fails due to TSAN thread limit


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.9.0
    • Fix Version/s: NA
    • Component/s: test
    • Labels:
      None

      Description

      I've seen a couple of instances where ClientTest.TestServerTooBusyRetry fails after hitting the TSAN thread limit, having apparently been stuck for about 10 minutes beforehand. The end of the log looks like this:

      W0428 12:20:07.406752 10297 debug-util.cc:397] Leaking SignalData structure 0x7b08000c2ba0 after lost signal to thread 8435
      W0428 12:20:07.412693 10297 debug-util.cc:397] Leaking SignalData structure 0x7b080019f2a0 after lost signal to thread 10185
      W0428 12:20:07.418191 10297 debug-util.cc:397] Leaking SignalData structure 0x7b080018f060 after lost signal to thread 10361
      W0428 12:20:23.873589 10139 debug-util.cc:397] Leaking SignalData structure 0x7b08000fc360 after lost signal to thread 8435
      W0428 12:20:23.878401 10139 debug-util.cc:397] Leaking SignalData structure 0x7b08000ccf20 after lost signal to thread 10185
      W0428 12:20:23.884522 10139 debug-util.cc:397] Leaking SignalData structure 0x7b0800051ae0 after lost signal to thread 10361
      W0428 12:22:03.715726 10297 debug-util.cc:397] Leaking SignalData structure 0x7b08000f9280 after lost signal to thread 8435
      W0428 12:22:03.721261 10297 debug-util.cc:397] Leaking SignalData structure 0x7b08001b0e40 after lost signal to thread 10185
      W0428 12:22:03.727725 10297 debug-util.cc:397] Leaking SignalData structure 0x7b08000b7460 after lost signal to thread 10361
      W0428 12:22:11.928373 10139 debug-util.cc:397] Leaking SignalData structure 0x7b0800044be0 after lost signal to thread 8435
      W0428 12:22:11.933187 10139 debug-util.cc:397] Leaking SignalData structure 0x7b080018f3c0 after lost signal to thread 10185
      W0428 12:22:11.939275 10139 debug-util.cc:397] Leaking SignalData structure 0x7b08001b3480 after lost signal to thread 10361
      ==8432==ThreadSanitizer: Thread limit (8128 threads) exceeded. Dying.
      

      Some threads (8435, 10185, and 10361 in the excerpt above) do not respond even to the signals sent by the stack-trace collector thread. Unfortunately, the logs contain nothing else about those threads.
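      To confirm that it is always the same threads ignoring the signals, the "lost signal" warnings can be tallied by thread ID. The following is a minimal triage sketch in Python, assuming the default glog line prefix seen above (severity plus date, time, logging thread ID, file:line). The regex and the function `tally_lost_signals` are hypothetical helpers for reading the log, not part of Kudu.

```python
import re
from collections import Counter

# Matches glog warnings from debug-util.cc reporting that a signal sent to a
# thread was never handled. Assumes the default glog prefix:
#   W<MMDD> <HH:MM:SS.micros> <logging thread id> <file>:<line>] <message>
LOST_SIGNAL_RE = re.compile(
    r"^W\d{4} \S+ (?P<sender>\d+) debug-util\.cc:\d+\] "
    r"Leaking SignalData structure 0x[0-9a-f]+ after lost signal to thread (?P<target>\d+)"
)

# The excerpt quoted in this issue, plus the final TSAN line (which is ignored).
SAMPLE_LOG = """
W0428 12:20:07.406752 10297 debug-util.cc:397] Leaking SignalData structure 0x7b08000c2ba0 after lost signal to thread 8435
W0428 12:20:07.412693 10297 debug-util.cc:397] Leaking SignalData structure 0x7b080019f2a0 after lost signal to thread 10185
W0428 12:20:07.418191 10297 debug-util.cc:397] Leaking SignalData structure 0x7b080018f060 after lost signal to thread 10361
W0428 12:20:23.873589 10139 debug-util.cc:397] Leaking SignalData structure 0x7b08000fc360 after lost signal to thread 8435
W0428 12:20:23.878401 10139 debug-util.cc:397] Leaking SignalData structure 0x7b08000ccf20 after lost signal to thread 10185
W0428 12:20:23.884522 10139 debug-util.cc:397] Leaking SignalData structure 0x7b0800051ae0 after lost signal to thread 10361
W0428 12:22:03.715726 10297 debug-util.cc:397] Leaking SignalData structure 0x7b08000f9280 after lost signal to thread 8435
W0428 12:22:03.721261 10297 debug-util.cc:397] Leaking SignalData structure 0x7b08001b0e40 after lost signal to thread 10185
W0428 12:22:03.727725 10297 debug-util.cc:397] Leaking SignalData structure 0x7b08000b7460 after lost signal to thread 10361
W0428 12:22:11.928373 10139 debug-util.cc:397] Leaking SignalData structure 0x7b0800044be0 after lost signal to thread 8435
W0428 12:22:11.933187 10139 debug-util.cc:397] Leaking SignalData structure 0x7b080018f3c0 after lost signal to thread 10185
W0428 12:22:11.939275 10139 debug-util.cc:397] Leaking SignalData structure 0x7b08001b3480 after lost signal to thread 10361
==8432==ThreadSanitizer: Thread limit (8128 threads) exceeded. Dying.
"""


def tally_lost_signals(lines):
    """Count lost-signal warnings per target thread and per logging thread."""
    targets, senders = Counter(), Counter()
    for line in lines:
        m = LOST_SIGNAL_RE.match(line.strip())
        if m:
            targets[int(m.group("target"))] += 1
            senders[int(m.group("sender"))] += 1
    return targets, senders


if __name__ == "__main__":
    targets, senders = tally_lost_signals(SAMPLE_LOG.splitlines())
    print("threads that lost signals:", dict(targets))
    print("threads that logged the warnings:", dict(senders))
```

      On this excerpt, the same three threads (8435, 10185, 10361) each lose four signals, while the warnings are logged evenly by threads 10297 and 10139. Running the same tally over the full attached log would show whether any other threads were also unresponsive.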

        Attachments

        1. client-test.tsanlimit.txt
          53 kB
          William Berkeley


            People

            • Assignee:
              Unassigned
              Reporter:
              wdberkeley William Berkeley
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved: