CASSANDRA-2072

Race condition during decommission

Details

    • Bug
    • Status: Resolved
    • Low
    • Resolution: Fixed
    • 0.7.1
    • None
    • None
    • Low

    Description

      Occasionally, decommissioning a node triggers a race condition in which another node never removes the departing node's token and instead propagates it again with a state of down. With CASSANDRA-1900 we can solve this, but it shouldn't occur in the first place.

      Given nodes A, B, and C, if you decommission B it will stream to A and C. When streaming completes, B finishes decommissioning and hits this stack trace:

      ERROR 00:02:40,282 Fatal exception in thread Thread[Thread-5,5,main]
      java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
      at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:62)
      at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
      at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
      at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:387)
      at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91
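
The exception in the trace above is the standard java.util.concurrent rejection: a task handed to an executor that has already been shut down is refused. The following is a minimal sketch of that behavior only, not Cassandra code. The class name RejectedAfterShutdown and the variable stage are hypothetical, and a plain fixed thread pool stands in for DebuggableThreadPoolExecutor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class RejectedAfterShutdown {

    /** Returns true if a task submitted after shutdown() is rejected. */
    static boolean submitAfterShutdown() {
        // Hypothetical stand-in for a stage executor on the decommissioned node.
        ExecutorService stage = Executors.newFixedThreadPool(1);

        // The node shuts its executor down as part of leaving the ring.
        stage.shutdown();

        try {
            // A message that arrives late is still handed to the executor.
            stage.execute(() -> { });
            return false;
        } catch (RejectedExecutionException e) {
            // The default rejection policy (AbortPolicy) throws here.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after shutdown: " + submitAfterShutdown());
    }
}
```

Under these assumptions the late task is rejected, which is consistent with B still receiving messages after its executor has stopped.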

      At this point A shows that it is removing B's token, but C does not. Instead, C's failure detector reports that B is dead, and nodetool ring on C shows B in a leaving/down state. In a later gossip round, C propagates this state back to A.
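
The re-propagation described above can be illustrated with a toy model. This is a sketch under loud assumptions: it is not Cassandra's Gossiper, the class StaleGossipSketch, the Node type, and the status strings are all hypothetical, and the merge rule (a receiver adopts any endpoint it does not already know) is a deliberate simplification chosen only to show how a stale entry on C can reappear on A.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleGossipSketch {

    /** Toy node: maps an endpoint name to the status this node believes it has. */
    static final class Node {
        final String name;
        final Map<String, String> view = new HashMap<>();

        Node(String name) {
            this.name = name;
        }

        /** Naive merge (assumption): adopt any endpoint the sender knows that we do not. */
        void receiveGossip(Node sender) {
            for (Map.Entry<String, String> e : sender.view.entrySet()) {
                view.putIfAbsent(e.getKey(), e.getValue());
            }
        }
    }

    /** Runs the scenario and returns A's status for B after C gossips to A. */
    static String statusOfBOnAAfterGossip() {
        Node a = new Node("A");
        Node c = new Node("C");

        // Both nodes initially see B as leaving.
        a.view.put("B", "LEAVING");
        c.view.put("B", "LEAVING");

        // A processes B's departure and drops its entry for B.
        a.view.remove("B");

        // C never does; its failure detector only marks B as down.
        c.view.put("B", "LEAVING/DOWN");

        // In a later round C gossips to A, and the stale entry comes back.
        a.receiveGossip(c);
        return a.view.get("B");
    }

    public static void main(String[] args) {
        System.out.println("A's view of B: " + statusOfBOnAAfterGossip());
    }
}
```

In this simplified model A ends up holding C's stale leaving/down entry for B again, mirroring the symptom reported here.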

          People

            brandon.williams Brandon Williams
            Gary Dusbabek