Details
- Type: Bug
- Status: Open
- Priority: Normal
- Resolution: Unresolved
- Fix Version/s: None
- Bug Category: Correctness
- Severity: Normal
- Complexity: Normal
- Discovered By: User Report
- Platform: All
- Impacts: None
Description
With Cassandra running as a StatefulSet in Kubernetes, the nodes change IP addresses after a rolling reboot of the cluster. While I haven't noticed any operational issues with the cluster itself from this, it results in excessive logging of lines like the one below:
INFO Nodes /10.42.7.52 and /10.42.7.55 have the same token 7421423452625866771. Ignoring /10.42.7.52
And, less frequently but still regularly:
INFO FatClient /10.42.7.52 has been silent for 30000ms, removing from gossip
These logs now make up the majority of the log output from the system in question, putting unnecessary pressure on the logging infrastructure.
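As a rough way to quantify this, counting the two messages against the total log volume on a node gives an idea of how dominant they are; a sketch along these lines (the log path is an assumption, using the default /var/log/cassandra/system.log):

# Occurrences of the two noisy gossip messages vs. total lines in the node log
grep -c 'have the same token' /var/log/cassandra/system.log
grep -c 'has been silent for' /var/log/cassandra/system.log
wc -l /var/log/cassandra/system.log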
This is an example of what the gossip state looks like:
nodetool gossipinfo
/10.42.6.66
  generation:1648711798
  heartbeat:1819
  STATUS:18:NORMAL,-1041160339925870253
  LOAD:1783:2.349975364E9
  SCHEMA:14:06874cde-85cb-3905-b939-9fa68972f835
  DC:10:tc
  RACK:12:rack1
  RELEASE_VERSION:5:3.11.12
  INTERNAL_IP:8:10.42.6.66
  RPC_ADDRESS:4:10.42.6.66
  NET_VERSION:2:11
  HOST_ID:3:d4ff4ea6-2b37-4878-8253-7e03440c3216
  RPC_READY:38:true
  SSTABLE_VERSIONS:6:big-me,big-md
  TOKENS:17:<hidden>
/10.42.8.54
  generation:1648711838
  heartbeat:1778
  STATUS:18:NORMAL,-100824550225698369
  LOAD:1717:2.514461697E9
  SCHEMA:14:06874cde-85cb-3905-b939-9fa68972f835
  DC:10:mh
  RACK:12:rack1
  RELEASE_VERSION:5:3.11.12
  INTERNAL_IP:8:10.42.8.54
  RPC_ADDRESS:4:10.42.8.54
  NET_VERSION:2:11
  HOST_ID:3:09413185-be66-4ffe-be17-3fa865037e47
  RPC_READY:38:true
  SSTABLE_VERSIONS:6:big-me,big-md
  TOKENS:17:<hidden>
/10.42.7.54
  generation:1648711286
  heartbeat:2147483647
  STATUS:1757:shutdown,true
  LOAD:477:1.571279055E9
  SCHEMA:14:06874cde-85cb-3905-b939-9fa68972f835
  DC:10:ix
  RACK:12:rack1
  RELEASE_VERSION:5:3.11.12
  INTERNAL_IP:8:10.42.7.54
  RPC_ADDRESS:4:10.42.7.54
  NET_VERSION:2:11
  HOST_ID:3:5a5d3810-874a-4168-9e99-6eab3f6f3cfa
  RPC_READY:1758:false
  SSTABLE_VERSIONS:6:big-me,big-md
  TOKENS:17:<hidden>
/10.42.7.55
  generation:1648711764
  heartbeat:1856
  STATUS:18:NORMAL,-1095602411864500569
  LOAD:1847:1.571445304E9
  SCHEMA:14:06874cde-85cb-3905-b939-9fa68972f835
  DC:10:ix
  RACK:12:rack1
  RELEASE_VERSION:5:3.11.12
  INTERNAL_IP:8:10.42.7.55
  RPC_ADDRESS:4:10.42.7.55
  NET_VERSION:2:11
  HOST_ID:3:5a5d3810-874a-4168-9e99-6eab3f6f3cfa
  RPC_READY:38:true
  SSTABLE_VERSIONS:6:big-me,big-md
  TOKENS:17:<hidden>
/10.42.7.52
  generation:1648120338
  heartbeat:2147483647
  STATUS:1759:shutdown,true
  LOAD:606284:1.570018542E9
  SCHEMA:14:06874cde-85cb-3905-b939-9fa68972f835
  DC:10:ix
  RACK:12:rack1
  RELEASE_VERSION:5:3.11.12
  INTERNAL_IP:8:10.42.7.52
  RPC_ADDRESS:4:10.42.7.52
  NET_VERSION:2:11
  HOST_ID:3:5a5d3810-874a-4168-9e99-6eab3f6f3cfa
  RPC_READY:1760:false
  SSTABLE_VERSIONS:6:big-me,big-md
  TOKENS:17:<hidden>
/10.42.7.53
  generation:1648707234
  heartbeat:2147483647
  STATUS:1766:shutdown,true
  LOAD:4184:1.571429353E9
  SCHEMA:14:06874cde-85cb-3905-b939-9fa68972f835
  DC:10:ix
  RACK:12:rack1
  RELEASE_VERSION:5:3.11.12
  INTERNAL_IP:8:10.42.7.53
  RPC_ADDRESS:4:10.42.7.53
  NET_VERSION:2:11
  HOST_ID:3:5a5d3810-874a-4168-9e99-6eab3f6f3cfa
  RPC_READY:1767:false
  SSTABLE_VERSIONS:6:big-me,big-md
  TOKENS:17:<hidden>
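Note that all four /10.42.7.x entries share HOST_ID 5a5d3810-874a-4168-9e99-6eab3f6f3cfa: only /10.42.7.55 is NORMAL, while the other three are leftover shutdown entries for old pod IPs of the same node. A quick way to cross-check this is to compare the current pod IPs against the addresses still tracked in gossip; a sketch (namespace and label selector are assumptions here, adjust to your deployment):

# Current pod IPs according to Kubernetes
kubectl -n cassandra get pods -l app=cassandra -o wide

# Endpoints Cassandra still tracks in gossip, with their status lines
nodetool gossipinfo | grep -E '^/|STATUS'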
While there are likely ways to clean up the gossip state to get rid of this, I'd rather not go down that route, since the problem will reappear the next time the nodes in the cluster are restarted.
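For reference, the kind of manual cleanup I'd rather avoid is force-removing the stale endpoints one by one with nodetool's standard assassinate command; it only treats the symptom, because new stale entries appear again after the next rolling restart. A rough sketch, using addresses from the output above:

# Force-remove a leftover gossip endpoint (last-resort command; repeat for each
# stale address, and again after every future rolling restart)
nodetool assassinate 10.42.7.52
nodetool assassinate 10.42.7.53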
I've tried setting `cassandra.load_ring_state=false`, but it does not help; I assume the stale state is replicated back from the existing nodes during a rolling reboot and would only be cleared by a full cluster shutdown and startup, which is not an option in a production system.
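For completeness, `cassandra.load_ring_state` is a JVM system property; with a stock cassandra-env.sh it would be passed roughly as below (how it actually reaches the JVM in a container image or operator-managed setup may differ, so treat this as a sketch):

# Appended to conf/cassandra-env.sh
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"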
Is there any other way I can avoid this?
The Cassandra version used is 3.11.12.