Cassandra / CASSANDRA-19082

Histogram overflow causes client timeouts and message drops

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Duplicate

    Description

      Hi,

      We have recently noticed that the following exception sometimes occurs on our Cassandra cluster:

      ERROR [ScheduledTasks:1] 2023-11-24 06:24:12,680 CassandraDaemon.java:244 - Exception in thread Thread[ScheduledTasks:1,5,main]
      java.lang.IllegalStateException: Unable to compute when histogram overflowed
              at org.apache.cassandra.metrics.DecayingEstimatedHistogramReservoir$EstimatedHistogramReservoirSnapshot.getMean(DecayingEstimatedHistogramReservoir.java:472)
              at org.apache.cassandra.net.MessagingService.getDroppedMessagesLogs(MessagingService.java:1272)
              at org.apache.cassandra.net.MessagingService.logDroppedMessages(MessagingService.java:1244)
              at org.apache.cassandra.net.MessagingService.access$200(MessagingService.java:84)
              at org.apache.cassandra.net.MessagingService$4.run(MessagingService.java:512)
              at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
              at java.lang.Thread.run(Thread.java:750)
       

      It happens on all 6 nodes at the same time. We also see increased client timeouts and dropped READ and READ_RESPONSE messages. We run Cassandra 3.11.16 in a 2-DC setup with 6 nodes in each DC and RF 3. I searched the existing issues but could not find one describing exactly this problem causing messages to be dropped. Any suggestions would be appreciated.
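
      For readers unfamiliar with the error message: the trace shows getMean() being called from the dropped-message logging task and refusing to return a value once the histogram has overflowed. Below is a minimal sketch of that kind of guard, assuming a histogram with fixed, sorted bucket offsets plus one final overflow bucket. All class and method names here (HistogramOverflowSketch, BucketedHistogram, meanIfAvailable) are hypothetical and simplified for illustration; this is not the Cassandra source and it ignores decay entirely.

```java
import java.util.Arrays;
import java.util.OptionalDouble;

public class HistogramOverflowSketch {

    /** Simplified fixed-bucket histogram; the last slot counts values above the largest offset. */
    static final class BucketedHistogram {
        private final long[] offsets; // sorted upper bounds of the regular buckets
        private final long[] counts;  // one slot per offset, plus a final overflow slot

        BucketedHistogram(long... sortedOffsets) {
            this.offsets = sortedOffsets.clone();
            this.counts = new long[sortedOffsets.length + 1];
        }

        void update(long value) {
            int idx = Arrays.binarySearch(offsets, value);
            if (idx < 0) {
                idx = -idx - 1; // insertion point: first offset greater than value
            }
            counts[idx]++;      // idx == offsets.length means the overflow slot
        }

        boolean isOverflowed() {
            return counts[counts.length - 1] > 0;
        }

        /** Mean estimated from bucket upper bounds; refuses to answer after an overflow. */
        double getMean() {
            if (isOverflowed()) {
                throw new IllegalStateException("Unable to compute when histogram overflowed");
            }
            long elements = 0;
            long sum = 0;
            for (int i = 0; i < offsets.length; i++) {
                elements += counts[i];
                sum += counts[i] * offsets[i];
            }
            return elements == 0 ? 0.0 : (double) sum / elements;
        }
    }

    /** Hypothetical caller-side guard: check for overflow instead of letting the exception escape. */
    static OptionalDouble meanIfAvailable(BucketedHistogram histogram) {
        return histogram.isOverflowed()
                ? OptionalDouble.empty()
                : OptionalDouble.of(histogram.getMean());
    }

    public static void main(String[] args) {
        BucketedHistogram latencies = new BucketedHistogram(10, 100, 1000);
        latencies.update(5);
        latencies.update(50);
        System.out.println("mean before overflow: " + meanIfAvailable(latencies));

        latencies.update(5000); // larger than the biggest offset, lands in the overflow slot
        System.out.println("mean after overflow: " + meanIfAvailable(latencies));
    }
}
```

      The sketch only illustrates why a single out-of-range sample is enough to make the mean unavailable. It does not show whether the exception is the cause of the dropped messages or just a side effect of very large latencies being recorded.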

            People

              Assignee: Unassigned
              Reporter: Zbyszek Z (kolargol)
              Votes: 0
              Watchers: 2