KUDU-3585

ClientTest.ClearCacheAndConcurrentWorkload fails from time to time in TSAN builds


Details

    • Type: Sub-task
    • Parent: KUDU-3578 De-flaking effort
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.14.0, 1.15.0, 1.16.0, 1.17.0
    • Fix Version/s: 1.18.0
    • Component/s: client, test
    • Labels: None

    Description

      The scenario sometimes fails in TSAN builds with output like the one cited below.

      It seems the root cause was RPC queue overflows at kudu-master and kudu-tserver: both spend much more time on regular requests when built with TSAN instrumentation, and resetting the client's meta-cache too often induces a flood of GetTableLocations requests; serving those eats a lot of CPU and keeps many threads busy. Since the scenario uses an internal mini-cluster (i.e. all masters and tablet servers are part of a single process), that affects kudu-tserver RPC worker threads as well, so many requests accumulate in the RPC queues.
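
      For context, here is a minimal sketch of the scenario's shape (not the actual test body, which lives in src/kudu/client/client-test.cc): one thread keeps resetting the client's meta-cache while another runs a regular write workload. The ClearMetaCache() hook is a hypothetical stand-in for the test's access to the client's internals; the rest uses the public Kudu C++ client API.

      #include <atomic>
      #include <thread>

      #include "kudu/client/client.h"
      #include "kudu/client/write_op.h"
      #include "kudu/common/partial_row.h"

      using kudu::client::KuduClient;
      using kudu::client::KuduInsert;
      using kudu::client::KuduSession;
      using kudu::client::KuduTable;
      using kudu::client::sp::shared_ptr;

      // Hypothetical stand-in: the real test resets the meta-cache through
      // the client's private internals, not a public API.
      void ClearMetaCache(const shared_ptr<KuduClient>& client);

      void RunClearCacheAndWorkload(const shared_ptr<KuduClient>& client,
                                    const shared_ptr<KuduTable>& table) {
        std::atomic<bool> done(false);

        // Each meta-cache reset forces the client to re-resolve tablet
        // locations, producing a burst of GetTableLocations requests
        // against the master.
        std::thread clearer([&]() {
          while (!done) {
            ClearMetaCache(client);
          }
        });

        // Concurrent write workload: with TSAN instrumentation slowing the
        // in-process master and tablet server down, the location-lookup
        // bursts can overflow their RPC service queues.
        shared_ptr<KuduSession> session = client->NewSession();
        for (int i = 0; i < 1000; ++i) {
          KuduInsert* insert = table->NewInsert();
          KUDU_CHECK_OK(insert->mutable_row()->SetInt32("key", i));
          KUDU_CHECK_OK(session->Apply(insert));
        }
        KUDU_CHECK_OK(session->Flush());

        done = true;
        clearer.join();
      }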

      src/kudu/client/client-test.cc:408: Failure
      Expected equality of these values: 0
        server->server()->rpc_server()->service_pool("kudu.tserver.TabletServerService")->RpcsQueueOverflowMetric()->value()
          Which is: 1
      src/kudu/client/client-test.cc:584: Failure
      Expected: CheckNoRpcOverflow() doesn't generate new fatal failures in the current thread.
        Actual: it does.
      src/kudu/client/client-test.cc:2466: Failure
      Expected: DeleteTestRows(client_table_.get(), kLowIdx, kHighIdx) doesn't generate new fatal failures in the current thread.
        Actual: it does.
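
      The first assertion above corresponds to a check like the one sketched here. The accessor chain is taken verbatim from the failure output; the helper's name matches the CheckNoRpcOverflow() mentioned at client-test.cc:584, while its exact signature (taking the mini-cluster's MiniTabletServer) is an assumption.

      #include <gtest/gtest.h>

      #include "kudu/rpc/service_pool.h"
      #include "kudu/server/rpc_server.h"
      #include "kudu/tserver/mini_tablet_server.h"
      #include "kudu/tserver/tablet_server.h"

      // Sketch of the check at src/kudu/client/client-test.cc:408: the test
      // expects that the tablet server's RPC service queue never overflowed
      // while the workload was running. Under TSAN the queue can overflow
      // at least once, so the counter reads 1 and the assertion fails.
      void CheckNoRpcOverflow(kudu::tserver::MiniTabletServer* server) {
        ASSERT_EQ(0, server->server()->rpc_server()
                         ->service_pool("kudu.tserver.TabletServerService")
                         ->RpcsQueueOverflowMetric()
                         ->value());
      }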
      

      Attachments

        client-test.5.txt.xz (5 kB, Alexey Serbin)


          People

            Assignee: Alexey Serbin (aserbin)
            Reporter: Alexey Serbin (aserbin)
            Votes: 0
            Watchers: 2
