Cassandra / CASSANDRA-15138

A cluster (RF=3) not recovering after two nodes are stopped



    • Type: Bug
    • Status: Triage Needed
    • Priority: Normal
    • Resolution: Unresolved
    • Component: Cluster/Membership
    • Discovered By: User Report
    • Platform: All


      I faced a weird issue when recovering a cluster after two nodes are stopped.
      It is easily reproducible and looks like a bug that should be fixed.
      The following are the steps to reproduce it.

      === STEPS TO REPRODUCE ===

      • Create a 3-node cluster with RF=3
           - node1 (seed), node2, node3
      • Start requests to the cluster with cassandra-stress (it continues
        until the end)
           - what we did: cassandra-stress mixed cl=QUORUM duration=10m
        -errors ignore -node node1,node2,node3 -rate threads>=16
           - (it doesn't have to be this many threads; 1 is enough)
      • Stop node3 normally (with systemctl stop or kill (without -9))
           - the system is still available, as expected, because a quorum of
        nodes is still available
      • Stop node2 normally (with systemctl stop or kill (without -9))
           - the system is NOT available after node2 is stopped, as expected
           - the client gets `UnavailableException: Not enough replicas
        available for query at consistency QUORUM`
           - the client gets the errors right away (within a few ms)
           - so far everything is as expected
      • Wait for 1 minute
      • Bring node2 back up
           - the issue happens here
           - the client gets `ReadTimeoutException` or `WriteTimeoutException`
        (depending on whether the request is a read or a write) even after
        node2 is back up
           - the client gets the errors after about 5000 ms or 2000 ms, which
        are the write and read request timeouts respectively
           - what node1 reports with `nodetool status` and what node2 reports
        are not consistent (node2 thinks node1 is down)
           - it takes a very long time to recover from this state
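      The steps above can be sketched as a shell session. This is only a
      sketch of the reproduction, assuming the cluster is already running
      with RF=3; the `ssh` access, the `cassandra` systemd unit name, and
      running the stress command from a separate host are assumptions not
      stated in the report:

      ```shell
      # Assumed: node1 (seed), node2, node3 are up, and the stress keyspace
      # uses replication_factor = 3.

      # 1. Drive mixed read/write load at QUORUM against all three nodes
      #    for 10 minutes, in the background.
      cassandra-stress mixed cl=QUORUM duration=10m \
          -errors ignore \
          -node node1,node2,node3 \
          -rate "threads>=16" &

      # 2. Stop node3 cleanly (graceful shutdown, not kill -9).
      ssh node3 'sudo systemctl stop cassandra'

      # 3. Stop node2 cleanly; QUORUM is lost and the client sees
      #    UnavailableException almost immediately, as expected.
      ssh node2 'sudo systemctl stop cassandra'

      # 4. Wait a minute, then bring node2 back.
      sleep 60
      ssh node2 'sudo systemctl start cassandra'

      # 5. Symptom: the client now gets Read/WriteTimeoutException instead
      #    of recovering, and the two nodes disagree about ring state.
      nodetool -h node1 status
      nodetool -h node2 status
      ```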

      === ADDITIONAL NOTES ===

      Some additional important information to note:

      • If we don't start cassandra-stress, the issue doesn't occur.
      • Restarting node1 recovers the cluster state right after the restart.
      • Setting a lower value for dynamic_snitch_reset_interval_in_ms (to
        60000 or something) fixes the issue.
      • If we `kill -9` the nodes instead, the issue doesn't occur.
      • Hints seem unrelated: I tested with hints disabled and it made no
        difference.
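      For reference, the workaround above is a one-line change in
      cassandra.yaml on each node (restart required). The 600000 ms default
      is from the stock cassandra.yaml; 60000 is the value tried above:

      ```yaml
      # cassandra.yaml
      # Default is 600000 (10 minutes). Lowering it makes the dynamic snitch
      # reset its latency scores sooner after a node comes back up.
      dynamic_snitch_reset_interval_in_ms: 60000
      ```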





            Assignee: Unassigned
            Reporter: feeblefakie (Hiroyuki Yamada)