Apache Cassandra / CASSANDRA-19221

CMS: Nodes can restart with new ipaddress already defined in the cluster


Details

    Description

      I am simulating running a cluster in Kubernetes and testing what happens when several pods go down and their IP addresses are swapped between nodes. In 4.0 this is blocked, and the node cannot be restarted.

      To simulate this, I create a 3-node cluster on a local machine using three loopback addresses:

      127.0.0.1
      127.0.0.2
      127.0.0.3
      

      The nodes are created correctly, and the first node is assigned as a CMS node:

      bin/nodetool -p 7199 describecms
      

      Cluster Metadata Service:

      Members: /127.0.0.1:7000
      Is Member: true
      Service State: LOCAL
      

      At this point, I bring down nodes 127.0.0.2 and 127.0.0.3 and swap their IP addresses in the rpc_address and listen_address settings.

      The nodes come back up as normal, but the node IDs have now been swapped relative to the IP addresses:

      Before:

      Datacenter: datacenter1
      =======================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
      UN  127.0.0.3  75.2 KiB   16      76.0%             6d194555-f6eb-41d0-c000-000000000003  rack1
      UN  127.0.0.2  86.77 KiB  16      59.3%             6d194555-f6eb-41d0-c000-000000000002  rack1
      UN  127.0.0.1  80.88 KiB  16      64.7%             6d194555-f6eb-41d0-c000-000000000001  rack1
      

      After:

      Datacenter: datacenter1
      =======================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
      UN  127.0.0.3  149.62 KiB  16      76.0%             6d194555-f6eb-41d0-c000-000000000003  rack1
      UN  127.0.0.2  155.48 KiB  16      59.3%             6d194555-f6eb-41d0-c000-000000000002  rack1
      UN  127.0.0.1  75.74 KiB   16      64.7%             6d194555-f6eb-41d0-c000-000000000001  rack1
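      To compare the two nodetool status excerpts above row by row, here is a short Python sketch. It is a hypothetical helper, not part of Cassandra or nodetool, and it assumes the column layout shown in this report (Address, Load, Tokens, Owns, Host ID, Rack). For each address, it reports whether the Host ID changed and what the load was before and after the restart.

```python
import re

# Node rows copied from the "Before" and "After" nodetool status output in this report.
STATUS_BEFORE = """\
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.3  75.2 KiB   16      76.0%             6d194555-f6eb-41d0-c000-000000000003  rack1
UN  127.0.0.2  86.77 KiB  16      59.3%             6d194555-f6eb-41d0-c000-000000000002  rack1
UN  127.0.0.1  80.88 KiB  16      64.7%             6d194555-f6eb-41d0-c000-000000000001  rack1
"""

STATUS_AFTER = """\
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.3  149.62 KiB  16      76.0%             6d194555-f6eb-41d0-c000-000000000003  rack1
UN  127.0.0.2  155.48 KiB  16      59.3%             6d194555-f6eb-41d0-c000-000000000002  rack1
UN  127.0.0.1  75.74 KiB   16      64.7%             6d194555-f6eb-41d0-c000-000000000001  rack1
"""

# One node row: status/state, address, load (value and unit), tokens, ownership, host ID, rack.
# Header lines such as "--  Address ..." do not match and are skipped.
ROW = re.compile(
    r"^(?P<state>[UD][NLJM])\s+(?P<addr>\S+)\s+(?P<load>[\d.]+ \S+)\s+(?P<tokens>\d+)\s+"
    r"(?P<owns>[\d.]+%)\s+(?P<host_id>[0-9a-f-]{36})\s+(?P<rack>\S+)$"
)


def parse_status(text):
    """Return {address: (host_id, load)} for every node row in the given output."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows[m["addr"]] = (m["host_id"], m["load"])
    return rows


def compare(before, after):
    """For each address, report whether its Host ID changed and its load before/after."""
    b, a = parse_status(before), parse_status(after)
    report = {}
    for addr in sorted(b):
        report[addr] = {
            "host_id_changed": b[addr][0] != a[addr][0],
            "load": (b[addr][1], a[addr][1]),
        }
    return report


if __name__ == "__main__":
    for addr, info in compare(STATUS_BEFORE, STATUS_AFTER).items():
        print(addr, info)
```

      Run against the two tables above, it reports that every address kept the same Host ID across the restart while the reported load changed.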
      

      In previous runs of this test, I created a table with a replication factor of 1 and inserted some data before the swap. After the swap, the data on nodes 2 and 3 was missing.

      One theory I have is that I use a different port number for each node and swap only the IP addresses, not the port numbers, so each ip:port pair still looks unique.

      For example, 127.0.0.2:9043 becomes 127.0.0.2:9044,
      and 127.0.0.3:9044 becomes 127.0.0.3:9043.
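      The port theory can be illustrated with a toy model in Python. It is purely hypothetical: `swap_ips`, `conflicts`, and the two key functions are illustrative names, and the model does not reflect how Cassandra's CMS actually identifies nodes. It swaps only the IP addresses of node2 and node3 while each keeps its own port, then checks the restarted nodes against the pre-swap registrations under two identity keys: the full ip:port endpoint and the IP address alone.

```python
# Hypothetical model of the ip:port theory; NOT Cassandra's actual CMS logic.

# Pre-swap endpoints (ip, port) per node, using the ports from the example above.
BEFORE = {"node2": ("127.0.0.2", 9043), "node3": ("127.0.0.3", 9044)}


def swap_ips(nodes, a, b):
    """Swap the IP addresses of nodes a and b while each keeps its own port."""
    swapped = dict(nodes)
    (ip_a, port_a), (ip_b, port_b) = nodes[a], nodes[b]
    swapped[a] = (ip_b, port_a)
    swapped[b] = (ip_a, port_b)
    return swapped


def by_endpoint(endpoint):
    """Identity key: the full (ip, port) pair."""
    return endpoint


def by_ip(endpoint):
    """Identity key: the IP address only."""
    return endpoint[0]


def conflicts(registered, restarted, key):
    """Return restarted nodes whose key already belongs to a *different* registered node."""
    owners = {key(ep): name for name, ep in registered.items()}
    return sorted(
        name
        for name, ep in restarted.items()
        if key(ep) in owners and owners[key(ep)] != name
    )


AFTER = swap_ips(BEFORE, "node2", "node3")

if __name__ == "__main__":
    print("after swap:", AFTER)
    print("conflicts keyed by ip:port:", conflicts(BEFORE, AFTER, by_endpoint))
    print("conflicts keyed by ip only:", conflicts(BEFORE, AFTER, by_ip))
```

      With ip:port as the key, neither restarted node collides with an existing registration, whereas keying by IP alone flags both nodes. If a uniqueness check compared only full endpoints, that would be consistent with the swap going undetected.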

       

      Attachments

        1. ci_summary.html (56 kB, Alex Petrov)
        2. ci_summary-1.html (33 kB, Alex Petrov)


      People: Alex Petrov (ifesdjeen), Paul Chandler (paulchandler), Sam Tunnicliffe