Details
-
Bug
-
Status: Resolved
-
Normal
-
Resolution: Fixed
-
None
-
None
-
I'm using 0.7 beta1. Built from: http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.7.0/apache-cassandra-0.7.0-beta1-src.tar.gz using dpkg-buildpackage from it.
-
Normal
Description
What happens here the following:
- given a healthy running cluster of 2 nodes (it used to be 3 but I manually killed one down)
- with a Keyspace having a ReplicationFactor of 2
- (this cluster is operational, loaded with data and working well)
as soon as I want to bring up a new 3rd node:
- the node is detected by the current cluster
- but as soon as it tries to initiate bootstrap sequence it dies with: replication factor (2) exceeds number of endpoints (1)
I believe this is similar to #1343, but not quite the same, since the keyspace and everything is already created, and I'm not attempting any modification. This is purely bringing a new node up.
One extra tidbit of information, in case it's important...maybe it is not:
I have only set 1 seed node configured (this is a test setup). And when the new node starts coming up (before the crash) its nodetool ring only reports the seed one.
From the new node coming up (before it crashes):
root@node1-3:~# nodetool -h localhost -p 8080 ring
Address Status State Load Token
10.250.106.111 Up Normal 116.3 GB 0 < = this the only configured seed node
From one of the other running nodes (this is the real status of the cluster):
root@node1-2:~# nodetool -h localhost -p 8080 ring
Address Status State Load Token
113416112894748789872342756657008344878
10.250.106.111 Up Normal 116.3 GB 0
10.215.195.81 Up Normal 116.31 GB 56713727820156410577229101238628035242
10.246.65.221 Up Joining 7.63 KB 113416112894748789872342756657008344878
Here's the full boot trace of the node is trying to join:
Java HotSpot(TM) Client VM warning: Can't detect initial thread stack location - find_vma failed
Create RMI registry on port 8081
Get the platform's MBean server
Initialize the environment map
Create an RMI connector server
Start the RMI connector server on port 8081
INFO 22:32:50,448 Loading settings from /etc/cassandra/cassandra.yaml
DEBUG 22:32:50,537 Syncing log with a period of 10000
INFO 22:32:50,538 DiskAccessMode 'auto' determined to be standard, indexAccessMode is standard
DEBUG 22:32:50,548 setting auto_bootstrap to true
DEBUG 22:32:50,702 Starting CFS Statistics
DEBUG 22:32:50,710 key cache capacity for Statistics is 1
DEBUG 22:32:50,711 Starting CFS Schema
DEBUG 22:32:50,712 key cache capacity for Schema is 1
DEBUG 22:32:50,712 Starting CFS Migrations
DEBUG 22:32:50,713 key cache capacity for Migrations is 1
DEBUG 22:32:50,713 Starting CFS LocationInfo
DEBUG 22:32:50,713 key cache capacity for LocationInfo is 1
DEBUG 22:32:50,714 Starting CFS HintsColumnFamily
DEBUG 22:32:50,714 key cache capacity for HintsColumnFamily is 1
INFO 22:32:50,736 Couldn't detect any schema definitions in local storage.
INFO 22:32:50,737 Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
DEBUG 22:32:50,738 opening keyspace system
INFO 22:32:50,757 Cassandra version:
INFO 22:32:50,757 Thrift API version: 10.0.0
INFO 22:32:50,758 Saved Token not found. Using 113416112894748789872342756657008344878
INFO 22:32:50,758 Saved ClusterName not found. Using SOMETHING
INFO 22:32:50,763 Creating new commitlog segment /mnt/cassandra/commitlog/CommitLog-1283553170763.log
DEBUG 22:32:50,767 Estimating compactions for LocationInfo
DEBUG 22:32:50,768 Estimating compactions for HintsColumnFamily
DEBUG 22:32:50,768 Estimating compactions for Migrations
DEBUG 22:32:50,768 Estimating compactions for Schema
DEBUG 22:32:50,768 Estimating compactions for Statistics
DEBUG 22:32:50,769 Checking to see if compaction of LocationInfo would be useful
DEBUG 22:32:50,769 Checking to see if compaction of HintsColumnFamily would be useful
DEBUG 22:32:50,769 Checking to see if compaction of Migrations would be useful
DEBUG 22:32:50,769 Checking to see if compaction of Schema would be useful
DEBUG 22:32:50,769 Checking to see if compaction of Statistics would be useful
INFO 22:32:50,779 switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=276)
INFO 22:32:50,782 Enqueuing flush of Memtable-LocationInfo@18721294(192 bytes, 4 operations)
INFO 22:32:50,783 Writing Memtable-LocationInfo@18721294(192 bytes, 4 operations)
INFO 22:32:50,897 Completed flushing /mnt/ebs/data/system/LocationInfo-e-1-Data.db
DEBUG 22:32:50,898 Checking to see if compaction of LocationInfo would be useful
DEBUG 22:32:50,898 Discarding 0
DEBUG 22:32:50,899 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=276), column family 0.
DEBUG 22:32:50,899 Marking replay position 276 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
INFO 22:32:50,908 Starting up server gossip
INFO 22:32:50,928 Joining: getting load information
INFO 22:32:50,928 Sleeping 90000 ms to wait for load information...
DEBUG 22:32:50,940 attempting to connect to node1-1.domain.com/10.250.106.111
DEBUG 22:32:51,044 attempting to connect to node1-1.domain.com/10.250.106.111
DEBUG 22:32:51,045 attempting to connect to /10.215.195.81
INFO 22:32:51,051 Node /10.215.195.81 is now part of the cluster
DEBUG 22:32:51,051 Resetting pool for /10.215.195.81
DEBUG 22:32:51,052 Token 113416112894748789872342756657008344877 removed manually (endpoint was unknown)
INFO 22:32:51,052 Node /10.250.106.111 is now part of the cluster
DEBUG 22:32:51,053 Resetting pool for /10.250.106.111
DEBUG 22:32:51,053 Node /10.250.106.111 state normal, token 0
DEBUG 22:32:51,053 clearing cached endpoints
DEBUG 22:32:51,171 Applying AddKeyspace from /10.250.106.111
DEBUG 22:32:51,188 Applying migration 77e97d6c-b625-11df-8596-318df8b646e8
INFO 22:32:51,189 switching in a fresh Memtable for Migrations at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417)
INFO 22:32:51,189 Enqueuing flush of Memtable-Migrations@18093512(4938 bytes, 1 operations)
INFO 22:32:51,189 Writing Memtable-Migrations@18093512(4938 bytes, 1 operations)
INFO 22:32:51,194 switching in a fresh Memtable for Schema at CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417)
INFO 22:32:51,194 Enqueuing flush of Memtable-Schema@27402470(1784 bytes, 3 operations)
INFO 22:32:51,274 Completed flushing /mnt/ebs/data/system/Migrations-e-1-Data.db
DEBUG 22:32:51,274 Checking to see if compaction of Migrations would be useful
DEBUG 22:32:51,275 Discarding 2
DEBUG 22:32:51,275 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417), column family 2
.
DEBUG 22:32:51,275 Marking replay position 7417 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
INFO 22:32:51,275 Writing Memtable-Schema@27402470(1784 bytes, 3 operations)
INFO 22:32:51,388 Completed flushing /mnt/ebs/data/system/Schema-e-1-Data.db
DEBUG 22:32:51,389 Checking to see if compaction of Schema would be useful
DEBUG 22:32:51,389 Discarding 3
DEBUG 22:32:51,389 discard completed log segments for CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', position=7417), column family 3
.
DEBUG 22:32:51,389 Marking replay position 7417 on commit log CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
DEBUG 22:32:51,406 Starting CFS MyColumnFamily
DEBUG 22:32:51,406 key cache capacity for MyColumnFamily is 200000
INFO 22:32:51,407 Creating new commitlog segment /mnt/cassandra/commitlog/CommitLog-1283553171407.log
DEBUG 22:32:51,604 attempting to connect to node1-1.domain.com/10.250.106.111
INFO 22:32:51,622 InetAddress /10.215.195.81 is now UP
INFO 22:32:51,623 InetAddress /10.250.106.111 is now UP
INFO 22:32:51,623 Started hinted handoff for endpoint /10.215.195.81
INFO 22:32:51,631 Finished hinted handoff of 0 rows to endpoint /10.215.195.81
INFO 22:32:51,632 Started hinted handoff for endpoint /10.250.106.111
INFO 22:32:51,632 Finished hinted handoff of 0 rows to endpoint /10.250.106.111
DEBUG 22:32:51,918 GC for ParNew: 14 ms, 20561696 reclaimed leaving 149257720 used; max is 902627328
DEBUG 22:32:51,919 GC for ConcurrentMarkSweep: 73 ms, 4178480 reclaimed leaving 8520856 used; max is 902627328
DEBUG 22:32:52,924 Disseminating load info ...
DEBUG 22:32:53,929 attempting to connect to /10.215.195.81
DEBUG 22:32:54,925 GC for ConcurrentMarkSweep: 73 ms, 212034048 reclaimed leaving 15214608 used; max is 902627328
DEBUG 22:33:52,931 Disseminating load info ...
DEBUG 22:34:20,929 ... got load info
INFO 22:34:20,929 Joining: getting bootstrap token
DEBUG 22:34:20,931 token manually specified as 113416112894748789872342756657008344878
INFO 22:34:20,932 Joining: sleeping 30000 ms for pending range setup
INFO 22:34:50,935 Bootstrapping
DEBUG 22:34:50,935 Beginning bootstrap process
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160)
Caused by: java.lang.IllegalStateException: replication factor (2) exceeds number of endpoints (1)
at org.apache.cassandra.locator.RackUnawareStrategy.calculateNaturalEndpoints(RackUnawareStrategy.java:57)
at org.apache.cassandra.locator.AbstractReplicationStrategy.getRangeAddresses(AbstractReplicationStrategy.java:195)
at org.apache.cassandra.dht.BootStrapper.getRangesWithSources(BootStrapper.java:155)
at org.apache.cassandra.dht.BootStrapper.startBootstrap(BootStrapper.java:73)
at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:467)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:408)
at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:134)
at org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:57)
... 5 more
Cannot load daemon
Service exit with a return value of 3