Details
-
Improvement
-
Status: Resolved
-
Normal
-
Resolution: Fixed
-
None
-
None
Description
This looks like a timeout kind of flap. It's flapped once. Example failure:
Failed on CassCI build cassandra-2.2_offheap_dtest #344 - 2.2.6-tentative
Error Message 15 Apr 2016 16:23:41 [node3] Missing: ['127.0.0.4.* now UP']: INFO [main] 2016-04-15 16:21:32,345 Config.java:4..... See system.log for remainder -------------------- >> begin captured logging << -------------------- dtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-4i5qkE dtest: DEBUG: Custom init_config not found. Setting defaults. dtest: DEBUG: Done setting configuration options: { 'initial_token': None, 'memtable_allocation_type': 'offheap_objects', 'num_tokens': '32', 'phi_convict_threshold': 5, 'start_rpc': 'true'} dtest: DEBUG: Starting cluster with 3 nodes. dtest: DEBUG: 32 dtest: DEBUG: Inserting Data... dtest: DEBUG: Stopping node 3. dtest: DEBUG: Testing node stoppage (query should fail). dtest: DEBUG: Retrying read after timeout. Attempt #0 dtest: DEBUG: Retrying read after timeout. Attempt #1 dtest: DEBUG: Retrying request after UE. Attempt #2 dtest: DEBUG: Retrying request after UE. Attempt #3 dtest: DEBUG: Retrying request after UE. Attempt #4 dtest: DEBUG: Starting node 4 to replace node 3 dtest: DEBUG: Verifying querying works again. dtest: DEBUG: Verifying tokens migrated sucessfully dtest: DEBUG: ('WARN [main] 2016-04-15 16:21:21,068 TokenMetadata.java:196 - Token -3855903180169109916 changing ownership from /127.0.0.3 to /127.0.0.4\n', <_sre.SRE_Match object at 0x7fd21c0e2370>) dtest: DEBUG: Try to restart node 3 (should fail) dtest: DEBUG: [('WARN [GossipStage:1] 2016-04-15 16:21:22,942 StorageService.java:1962 - Host ID collision for 75916cc0-86ec-4136-b336-862a49953616 between /127.0.0.3 and /127.0.0.4; /127.0.0.4 is the new owner\n', <_sre.SRE_Match object at 0x7fd1f83555e0>)] --------------------- >> end captured logging << --------------------- Stacktrace File "/usr/lib/python2.7/unittest/case.py", line 329, in run testMethod() File "/home/automaton/cassandra-dtest/replace_address_test.py", line 212, in replace_first_boot_test node4.start(wait_for_binary_proto=True) File "/home/automaton/ccm/ccmlib/node.py", line 610, in start node.watch_log_for_alive(self, from_mark=mark) File "/home/automaton/ccm/ccmlib/node.py", line 457, in watch_log_for_alive self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, filename=filename) File "/home/automaton/ccm/ccmlib/node.py", line 425, in watch_log_for raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + reads[:50] + ".....\nSee {} for remainder".format(filename)) "15 Apr 2016 16:23:41 [node3] Missing: ['127.0.0.4.* now UP']:\nINFO [main] 2016-04-15 16:21:32,345 Config.java:4.....\nSee system.log for remainder\n-------------------- >> begin captured logging << --------------------\ndtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-4i5qkE\ndtest: DEBUG: Custom init_config not found. Setting defaults.\ndtest: DEBUG: Done setting configuration options:\n{ 'initial_token': None,\n 'memtable_allocation_type': 'offheap_objects',\n 'num_tokens': '32',\n 'phi_convict_threshold': 5,\n 'start_rpc': 'true'}\ndtest: DEBUG: Starting cluster with 3 nodes.\ndtest: DEBUG: 32\ndtest: DEBUG: Inserting Data...\ndtest: DEBUG: Stopping node 3.\ndtest: DEBUG: Testing node stoppage (query should fail).\ndtest: DEBUG: Retrying read after timeout. Attempt #0\ndtest: DEBUG: Retrying read after timeout. Attempt #1\ndtest: DEBUG: Retrying request after UE. Attempt #2\ndtest: DEBUG: Retrying request after UE. Attempt #3\ndtest: DEBUG: Retrying request after UE. Attempt #4\ndtest: DEBUG: Starting node 4 to replace node 3\ndtest: DEBUG: Verifying querying works again.\ndtest: DEBUG: Verifying tokens migrated sucessfully\ndtest: DEBUG: ('WARN [main] 2016-04-15 16:21:21,068 TokenMetadata.java:196 - Token -3855903180169109916 changing ownership from /127.0.0.3 to /127.0.0.4\\n', <_sre.SRE_Match object at 0x7fd21c0e2370>)\ndtest: DEBUG: Try to restart node 3 (should fail)\ndtest: DEBUG: [('WARN [GossipStage:1] 2016-04-15 16:21:22,942 StorageService.java:1962 - Host ID collision for 75916cc0-86ec-4136-b336-862a49953616 between /127.0.0.3 and /127.0.0.4; /127.0.0.4 is the new owner\\n', <_sre.SRE_Match object at 0x7fd1f83555e0>)]\n--------------------- >> end captured logging << ---------------------"