CASSANDRA-7307: New nodes mark dead nodes as up for 10 minutes

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 1.2.17, 2.0.9, 2.1 rc2
    • Component/s: None
    • Labels:
      None
    • Severity:
      Normal
    • Since Version:

      Description

      When doing a node replacement while other nodes are down, we see the down nodes marked as up for about 10 minutes. Requests are therefore routed to the dead nodes, causing timeouts. It also makes replacing a node extremely difficult when multiple nodes from a replica set are down: the replacement node usually tries to stream from a dead node and the replacement fails.

      This isn't limited to host replacement. I did a simple test:

      1. Create a 2 node cluster
      2. Kill node 2
      3. Start a 3rd node with a unique token (I used auto_bootstrap=false but I don't think this is significant)

      The 3rd node lists node 2 (127.0.0.2) as up for almost 10 minutes:

      INFO [main] 2014-05-27 14:28:24,753 CassandraDaemon.java (line 119) Logging initialized
      INFO [GossipStage:1] 2014-05-27 14:28:31,492 Gossiper.java (line 843) Node /127.0.0.2 is now part of the cluster
      INFO [GossipStage:1] 2014-05-27 14:28:31,495 Gossiper.java (line 809) InetAddress /127.0.0.2 is now UP
      INFO [GossipTasks:1] 2014-05-27 14:37:44,526 Gossiper.java (line 823) InetAddress /127.0.0.2 is now DOWN
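
      For reference, the "almost 10 minutes" figure can be checked from the two Gossiper log lines above. The following is a minimal Python sketch, not part of Cassandra: the helper name `up_down_gap` and the regular expression are assumptions made for illustration, and the pattern only covers the log format shown in this report.

```python
import re
from datetime import datetime

# The UP/DOWN lines copied from the log excerpt in this report.
LOG_LINES = [
    "INFO [GossipStage:1] 2014-05-27 14:28:31,495 Gossiper.java (line 809) InetAddress /127.0.0.2 is now UP",
    "INFO [GossipTasks:1] 2014-05-27 14:37:44,526 Gossiper.java (line 823) InetAddress /127.0.0.2 is now DOWN",
]

# Hypothetical pattern: timestamp, then the address, then the UP/DOWN state.
PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*?InetAddress (\S+) is now (UP|DOWN)"
)


def up_down_gap(lines, address):
    """Return the time between the first UP mark and the next DOWN mark
    for the given address, or None if that pair is not found."""
    marked_up = None
    for line in lines:
        match = PATTERN.search(line)
        if match is None or match.group(2) != address:
            continue
        timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        state = match.group(3)
        if state == "UP" and marked_up is None:
            marked_up = timestamp
        elif state == "DOWN" and marked_up is not None:
            return timestamp - marked_up
    return None


gap = up_down_gap(LOG_LINES, "/127.0.0.2")
print(gap.total_seconds())  # seconds the dead node was reported as up
```

      For the excerpt above this works out to 9 minutes 13 seconds (553.031 seconds) during which the dead node was reported as up.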
      

      I reproduced on 1.2.15 and 1.2.16.

            People

            • Assignee: Brandon Williams (brandon.williams)
            • Reporter: Richard Low (rlow)
            • Authors: Brandon Williams
            • Reviewers: Jonathan Ellis
