CASSANDRA-487

Message Serializer slows down/stops responding


Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 0.4
    • None
    • None
    • Normal

    Description

      We ran into an issue where the MESSAGE-SERIALIZER-POOL piles up with pending tasks:
      $ /usr/sbin/nodeprobe -host localhost tpstats
      FILEUTILS-DELETE-POOL, pending tasks=0
      MESSAGING-SERVICE-POOL, pending tasks=0
      MESSAGE-SERIALIZER-POOL, pending tasks=10785714
      RESPONSE-STAGE, pending tasks=0
      BOOT-STRAPPER, pending tasks=0
      ROW-READ-STAGE, pending tasks=0
      MESSAGE-DESERIALIZER-POOL, pending tasks=0
      GMFD, pending tasks=0
      LB-TARGET, pending tasks=0
      CONSISTENCY-MANAGER, pending tasks=0
      ROW-MUTATION-STAGE, pending tasks=0
      MESSAGE-STREAMING-POOL, pending tasks=0
      LOAD-BALANCER-STAGE, pending tasks=0
      MEMTABLE-FLUSHER-POOL, pending tasks=0
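
For anyone who wants to watch for this condition automatically, here is a minimal sketch that scans tpstats output in the format shown above and reports pools whose pending count exceeds a threshold. The class name `TpstatsCheck`, the method `backedUp`, and the threshold are hypothetical helpers for illustration only; they are not part of nodeprobe or Cassandra.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cassandra or nodeprobe.
final class TpstatsCheck {
    // Matches lines in the format shown above, e.g. "ROW-READ-STAGE, pending tasks=0".
    private static final Pattern LINE =
            Pattern.compile("^\\s*([A-Z-]+), pending tasks=(\\d+)\\s*$");

    // Returns pool name -> pending count for every pool above the given threshold,
    // in the order the pools appear in the output. Non-matching lines are ignored.
    static Map<String, Long> backedUp(String tpstatsOutput, long threshold) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : tpstatsOutput.split("\\r?\\n")) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                long pending = Long.parseLong(m.group(2));
                if (pending > threshold) {
                    result.put(m.group(1), pending);
                }
            }
        }
        return result;
    }
}
```

Feeding it the output above with any threshold below 10785714 would report only MESSAGE-SERIALIZER-POOL.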

      According to the log, this seems to have started when we stopped 2 of the other nodes in our cluster. Since then, this node times out on any Thrift request. Looking through the logs, we found the following two exceptions:
      java.util.ConcurrentModificationException
      at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
      at java.util.AbstractList$Itr.next(AbstractList.java:349)
      at java.util.Collections.sort(Collections.java:120)
      at org.apache.cassandra.net.TcpConnectionManager.getLeastLoaded(TcpConnectionManager.java:108)
      at org.apache.cassandra.net.TcpConnectionManager.getConnection(TcpConnectionManager.java:71)
      at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:306)
      at org.apache.cassandra.net.MessageSerializationTask.run(MessageSerializationTask.java:66)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:619)

      java.util.NoSuchElementException
      at java.util.AbstractList$Itr.next(AbstractList.java:350)
      at java.util.Collections.sort(Collections.java:120)
      at org.apache.cassandra.net.TcpConnectionManager.getLeastLoaded(TcpConnectionManager.java:108)
      at org.apache.cassandra.net.TcpConnectionManager.getConnection(TcpConnectionManager.java:71)
      at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:306)
      at org.apache.cassandra.net.MessageSerializationTask.run(MessageSerializationTask.java:66)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:619)

      This appears to have happened on all 4 MESSAGE-SERIALIZER-POOL threads.
      I will attach the complete log.
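
Both traces fail inside Collections.sort called from TcpConnectionManager.getLeastLoaded. A ConcurrentModificationException or NoSuchElementException from a list iterator during a sort is the usual symptom of another thread adding or removing elements while the list is being iterated. The sketch below is not the Cassandra code and not necessarily the fix that was applied; it only illustrates one way to avoid the race, assuming a hypothetical `Conn` class with a `pending` load counter and a hypothetical `ConnPool`. All list access goes through one lock, and the least loaded entry is found with a scan instead of sorting the shared list in place.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a pooled connection; not Cassandra's TcpConnection.
final class Conn {
    final String name;
    volatile int pending; // assumed load metric: number of queued writes

    Conn(String name, int pending) {
        this.name = name;
        this.pending = pending;
    }
}

// Hypothetical pool. Every method that touches the list holds the same lock,
// so no thread can observe the list while another thread is modifying it.
final class ConnPool {
    private final List<Conn> conns = new ArrayList<>();

    synchronized void add(Conn c) {
        conns.add(c);
    }

    synchronized void remove(Conn c) {
        conns.remove(c);
    }

    // Returns the connection with the fewest pending writes, or null if the
    // pool is empty. Collections.min only reads the list, so the shared list
    // is never reordered.
    synchronized Conn leastLoaded() {
        if (conns.isEmpty()) {
            return null;
        }
        return Collections.min(conns, Comparator.comparingInt((Conn c) -> c.pending));
    }
}
```

The trade-off in this sketch is that callers serialize on the pool lock for the duration of the scan, which should be cheap for a handful of connections per endpoint.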

          People

            Sammy Yu
