  1. Cassandra
  2. CASSANDRA-19215

"Query start time" in native transport request threads should be the task enqueue time

Details

    • Correctness
    • Normal
    • Normal
    • User Report
    • All
    • None
    • Includes tests

    Description

      Recently, our Cassandra 4.0.6 cluster suffered an outage caused by a surge of expensive traffic from the application side. The surge consisted of a large volume of costly read queries that took a long time to process on the server. The clients had timeouts configured, and a timed-out request could trigger new requests. Because the server nodes were overloaded, many of them had hundreds of thousands of tasks sitting in the Native-Transport-Request pending queue.

      I expected the nodes to return to normal soon after the application stopped sending requests: most requests in the queue were more than half an hour old, so they should have timed out immediately and the queue should have cleared quickly. Instead, it took an hour to drain the native transport pending queue, even with native transport disabled.

      Looking at the code, I noticed that for read/write requests, queryStartNanoTime, which is used to decide whether a request has timed out, is only set when the task starts processing. As a result, the time a request spends pending in the queue does not count toward its timeout, no matter how long that is. I believe this is incorrect. The timer should start when the Cassandra server receives the request or enqueues the task, not when the task begins processing. That way, an overloaded node with many pending tasks can quickly discard timed-out requests and recover from an outage once new requests stop arriving.
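
      To illustrate the proposed behavior, here is a minimal Java sketch of a task queue that measures the timeout from the enqueue time. It is not Cassandra's code: DeadlineSketch, QueuedRequest, isExpired and drain are hypothetical names invented for this example, and the 5-second timeout is an arbitrary assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these types are Cassandra classes.
final class DeadlineSketch {

    /** A queued request that remembers when it was enqueued. */
    static final class QueuedRequest {
        final String name;
        final long enqueuedAtNanos; // captured at enqueue time, not at execution start

        QueuedRequest(String name, long enqueuedAtNanos) {
            this.name = name;
            this.enqueuedAtNanos = enqueuedAtNanos;
        }

        /** True if the time spent waiting in the queue already exceeds the timeout. */
        boolean isExpired(long nowNanos, long timeoutNanos) {
            // Subtract first so the comparison stays valid if nanoTime values wrap.
            return nowNanos - enqueuedAtNanos > timeoutNanos;
        }
    }

    /** Drains the queue and returns the names of requests still worth executing. */
    static List<String> drain(Queue<QueuedRequest> queue, long nowNanos, long timeoutNanos) {
        List<String> runnable = new ArrayList<>();
        QueuedRequest request;
        while ((request = queue.poll()) != null) {
            if (request.isExpired(nowNanos, timeoutNanos)) {
                continue; // dropped cheaply: no read or write is attempted
            }
            runnable.add(request.name);
        }
        return runnable;
    }

    public static void main(String[] args) {
        long timeout = TimeUnit.SECONDS.toNanos(5); // assumed timeout for the demo
        long now = System.nanoTime();
        Queue<QueuedRequest> queue = new ArrayDeque<>();
        queue.add(new QueuedRequest("stale-read", now - TimeUnit.MINUTES.toNanos(30)));
        queue.add(new QueuedRequest("fresh-read", now - TimeUnit.SECONDS.toNanos(1)));
        System.out.println(drain(queue, now, timeout));
    }
}
```

      In this sketch, the request that has been pending for 30 minutes is discarded as soon as it is dequeued, and only the fresh one is handed on for execution. If the timestamp were taken when processing starts instead, both requests would look new and both would be executed.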

      Attachments

        1. ci_summary.html
          7 kB
          Alex Petrov
        2. result_details.tar.gz
          39.91 MB
          Alex Petrov

            People

              ifesdjeen Alex Petrov
              curlylrt Runtian Liu
              Alex Petrov