Uploaded image for project: 'Apache Cassandra'
  1. Apache Cassandra
  2. CASSANDRA-1467

replication factor exceeds number of endpoints, when attempting to join a new node (but cluster has enough running nodes to fulfill RF)

    XMLWordPrintableJSON

Details

    • Normal

    Description

      What happens here the following:

      • given a healthy running cluster of 2 nodes (it used to be 3 but I manually killed one down)
      • with a Keyspace having a ReplicationFactor of 2
      • (this cluster is operational, loaded with data and working well)

      as soon as I want to bring up a new 3rd node:

      • the node is detected by the current cluster
      • but as soon as it tries to initiate bootstrap sequence it dies with: replication factor (2) exceeds number of endpoints (1)

      I believe this is similar to #1343, but not quite the same, since the keyspace and everything is already created, and I'm not attempting any modification. This is purely bringing a new node up.

      One extra tidbit of information, in case it's important...maybe it is not:
      I have only set 1 seed node configured (this is a test setup). And when the new node starts coming up (before the crash) its nodetool ring only reports the seed one.

      From the new node coming up (before it crashes):
      root@node1-3:~# nodetool -h localhost -p 8080 ring
      Address Status State Load Token
      10.250.106.111 Up Normal 116.3 GB 0 < = this the only configured seed node

      From one of the other running nodes (this is the real status of the cluster):
      root@node1-2:~# nodetool -h localhost -p 8080 ring
      Address Status State Load Token
      113416112894748789872342756657008344878
      10.250.106.111 Up Normal 116.3 GB 0
      10.215.195.81 Up Normal 116.31 GB 56713727820156410577229101238628035242
      10.246.65.221 Up Joining 7.63 KB 113416112894748789872342756657008344878

      Here's the full boot trace of the node is trying to join:
      Java HotSpot(TM) Client VM warning: Can't detect initial thread stack location - find_vma failed
      Create RMI registry on port 8081
      Get the platform's MBean server
      Initialize the environment map
      Create an RMI connector server
      Start the RMI connector server on port 8081
      INFO 22:32:50,448 Loading settings from /etc/cassandra/cassandra.yaml
      DEBUG 22:32:50,537 Syncing log with a period of 10000
      INFO 22:32:50,538 DiskAccessMode 'auto' determined to be standard, indexAccessMode is standard
      DEBUG 22:32:50,548 setting auto_bootstrap to true
      DEBUG 22:32:50,702 Starting CFS Statistics
      DEBUG 22:32:50,710 key cache capacity for Statistics is 1
      DEBUG 22:32:50,711 Starting CFS Schema
      DEBUG 22:32:50,712 key cache capacity for Schema is 1
      DEBUG 22:32:50,712 Starting CFS Migrations
      DEBUG 22:32:50,713 key cache capacity for Migrations is 1
      DEBUG 22:32:50,713 Starting CFS LocationInfo
      DEBUG 22:32:50,713 key cache capacity for LocationInfo is 1
      DEBUG 22:32:50,714 Starting CFS HintsColumnFamily
      DEBUG 22:32:50,714 key cache capacity for HintsColumnFamily is 1
      INFO 22:32:50,736 Couldn't detect any schema definitions in local storage.
      INFO 22:32:50,737 Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
      DEBUG 22:32:50,738 opening keyspace system
      INFO 22:32:50,757 Cassandra version:
      INFO 22:32:50,757 Thrift API version: 10.0.0
      INFO 22:32:50,758 Saved Token not found. Using 113416112894748789872342756657008344878
      INFO 22:32:50,758 Saved ClusterName not found. Using SOMETHING
      INFO 22:32:50,763 Creating new commitlog segment /mnt/cassandra/commitlog/CommitLog-1283553170763.log
      DEBUG 22:32:50,767 Estimating compactions for LocationInfo
      DEBUG 22:32:50,768 Estimating compactions for HintsColumnFamily
      DEBUG 22:32:50,768 Estimating compactions for Migrations
      DEBUG 22:32:50,768 Estimating compactions for Schema
      DEBUG 22:32:50,768 Estimating compactions for Statistics
      DEBUG 22:32:50,769 Checking to see if compaction of LocationInfo would be useful
      DEBUG 22:32:50,769 Checking to see if compaction of HintsColumnFamily would be useful
      DEBUG 22:32:50,769 Checking to see if compaction of Migrations would be useful
      DEBUG 22:32:50,769 Checking to see if compaction of Schema would be useful
      DEBUG 22:32:50,769 Checking to see if compaction of Statistics would be useful
      INFO 22:32:50,779 switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=276)
      INFO 22:32:50,782 Enqueuing flush of Memtable-LocationInfo@18721294(192 bytes, 4 operations)
      INFO 22:32:50,783 Writing Memtable-LocationInfo@18721294(192 bytes, 4 operations)
      INFO 22:32:50,897 Completed flushing /mnt/ebs/data/system/LocationInfo-e-1-Data.db
      DEBUG 22:32:50,898 Checking to see if compaction of LocationInfo would be useful
      DEBUG 22:32:50,898 Discarding 0
      DEBUG 22:32:50,899 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=276), column family 0.
      DEBUG 22:32:50,899 Marking replay position 276 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
      INFO 22:32:50,908 Starting up server gossip
      INFO 22:32:50,928 Joining: getting load information
      INFO 22:32:50,928 Sleeping 90000 ms to wait for load information...
      DEBUG 22:32:50,940 attempting to connect to node1-1.domain.com/10.250.106.111
      DEBUG 22:32:51,044 attempting to connect to node1-1.domain.com/10.250.106.111
      DEBUG 22:32:51,045 attempting to connect to /10.215.195.81
      INFO 22:32:51,051 Node /10.215.195.81 is now part of the cluster
      DEBUG 22:32:51,051 Resetting pool for /10.215.195.81
      DEBUG 22:32:51,052 Token 113416112894748789872342756657008344877 removed manually (endpoint was unknown)
      INFO 22:32:51,052 Node /10.250.106.111 is now part of the cluster
      DEBUG 22:32:51,053 Resetting pool for /10.250.106.111
      DEBUG 22:32:51,053 Node /10.250.106.111 state normal, token 0
      DEBUG 22:32:51,053 clearing cached endpoints
      DEBUG 22:32:51,171 Applying AddKeyspace from /10.250.106.111
      DEBUG 22:32:51,188 Applying migration 77e97d6c-b625-11df-8596-318df8b646e8
      INFO 22:32:51,189 switching in a fresh Memtable for Migrations at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417)
      INFO 22:32:51,189 Enqueuing flush of Memtable-Migrations@18093512(4938 bytes, 1 operations)
      INFO 22:32:51,189 Writing Memtable-Migrations@18093512(4938 bytes, 1 operations)
      INFO 22:32:51,194 switching in a fresh Memtable for Schema at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417)
      INFO 22:32:51,194 Enqueuing flush of Memtable-Schema@27402470(1784 bytes, 3 operations)
      INFO 22:32:51,274 Completed flushing /mnt/ebs/data/system/Migrations-e-1-Data.db
      DEBUG 22:32:51,274 Checking to see if compaction of Migrations would be useful
      DEBUG 22:32:51,275 Discarding 2
      DEBUG 22:32:51,275 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417), column family 2
      .
      DEBUG 22:32:51,275 Marking replay position 7417 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
      INFO 22:32:51,275 Writing Memtable-Schema@27402470(1784 bytes, 3 operations)
      INFO 22:32:51,388 Completed flushing /mnt/ebs/data/system/Schema-e-1-Data.db
      DEBUG 22:32:51,389 Checking to see if compaction of Schema would be useful
      DEBUG 22:32:51,389 Discarding 3
      DEBUG 22:32:51,389 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417), column family 3
      .
      DEBUG 22:32:51,389 Marking replay position 7417 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
      DEBUG 22:32:51,406 Starting CFS MyColumnFamily
      DEBUG 22:32:51,406 key cache capacity for MyColumnFamily is 200000
      INFO 22:32:51,407 Creating new commitlog segment /mnt/cassandra/commitlog/CommitLog-1283553171407.log
      DEBUG 22:32:51,604 attempting to connect to node1-1.domain.com/10.250.106.111
      INFO 22:32:51,622 InetAddress /10.215.195.81 is now UP
      INFO 22:32:51,623 InetAddress /10.250.106.111 is now UP
      INFO 22:32:51,623 Started hinted handoff for endpoint /10.215.195.81
      INFO 22:32:51,631 Finished hinted handoff of 0 rows to endpoint /10.215.195.81
      INFO 22:32:51,632 Started hinted handoff for endpoint /10.250.106.111
      INFO 22:32:51,632 Finished hinted handoff of 0 rows to endpoint /10.250.106.111
      DEBUG 22:32:51,918 GC for ParNew: 14 ms, 20561696 reclaimed leaving 149257720 used; max is 902627328
      DEBUG 22:32:51,919 GC for ConcurrentMarkSweep: 73 ms, 4178480 reclaimed leaving 8520856 used; max is 902627328
      DEBUG 22:32:52,924 Disseminating load info ...
      DEBUG 22:32:53,929 attempting to connect to /10.215.195.81
      DEBUG 22:32:54,925 GC for ConcurrentMarkSweep: 73 ms, 212034048 reclaimed leaving 15214608 used; max is 902627328
      DEBUG 22:33:52,931 Disseminating load info ...
      DEBUG 22:34:20,929 ... got load info
      INFO 22:34:20,929 Joining: getting bootstrap token
      DEBUG 22:34:20,931 token manually specified as 113416112894748789872342756657008344878
      INFO 22:34:20,932 Joining: sleeping 30000 ms for pending range setup
      INFO 22:34:50,935 Bootstrapping
      DEBUG 22:34:50,935 Beginning bootstrap process
      java.lang.reflect.InvocationTargetException
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160)
      Caused by: java.lang.IllegalStateException: replication factor (2) exceeds number of endpoints (1)
      at org.apache.cassandra.locator.RackUnawareStrategy.calculateNaturalEndpoints(RackUnawareStrategy.java:57)
      at org.apache.cassandra.locator.AbstractReplicationStrategy.getRangeAddresses(AbstractReplicationStrategy.java:195)
      at org.apache.cassandra.dht.BootStrapper.getRangesWithSources(BootStrapper.java:155)
      at org.apache.cassandra.dht.BootStrapper.startBootstrap(BootStrapper.java:73)
      at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:467)
      at org.apache.cassandra.service.StorageService.initServer(StorageService.java:408)
      at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:134)
      at org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:57)
      ... 5 more
      Cannot load daemon
      Service exit with a return value of 3

      Attachments

        Activity

          People

            gdusbabek Gary Dusbabek
            blanquer Josep M. Blanquer
            Gary Dusbabek
            Jonathan Ellis
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: