Details
- Type: Bug
- Status: Open
- Priority: Normal
- Resolution: Unresolved
- Fix Version/s: None
- Bug Category: Correctness
- Severity: Normal
- Complexity: Normal
- Discovered By: User Report
- Platform: All
- Impacts: None
Description
With Cassandra running as a StatefulSet in Kubernetes, the nodes change IP addresses after a rolling reboot of the cluster. While I haven't noticed any operational issues with the cluster itself from this, it results in excessive logging of lines like the one below:
INFO Nodes /10.42.7.52 and /10.42.7.55 have the same token 7421423452625866771. Ignoring /10.42.7.52
And, less frequently but still regularly:
INFO FatClient /10.42.7.52 has been silent for 30000ms, removing from gossip
These logs now make up the majority of the log output from the system in question, putting unnecessary pressure on the logging infrastructure.
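As a rough way to quantify this, counting the two messages against the total log volume on a node gives an idea of how dominant they are; a sketch along these lines (the log path is an assumption, using the default /var/log/cassandra/system.log):

# Occurrences of the two noisy gossip messages vs. total lines in the node log
grep -c 'have the same token' /var/log/cassandra/system.log
grep -c 'has been silent for' /var/log/cassandra/system.log
wc -l /var/log/cassandra/system.log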
This is an example of what the gossip state looks like:
nodetool gossipinfo
/10.42.6.66
  generation:1648711798
  heartbeat:1819
  STATUS:18:NORMAL,-1041160339925870253
  LOAD:1783:2.349975364E9
  SCHEMA:14:06874cde-85cb-3905-b939-9fa68972f835
  DC:10:tc
  RACK:12:rack1
  RELEASE_VERSION:5:3.11.12
  INTERNAL_IP:8:10.42.6.66
  RPC_ADDRESS:4:10.42.6.66
  NET_VERSION:2:11
  HOST_ID:3:d4ff4ea6-2b37-4878-8253-7e03440c3216
  RPC_READY:38:true
  SSTABLE_VERSIONS:6:big-me,big-md
  TOKENS:17:<hidden>
/10.42.8.54
  generation:1648711838
  heartbeat:1778
  STATUS:18:NORMAL,-100824550225698369
  LOAD:1717:2.514461697E9
  SCHEMA:14:06874cde-85cb-3905-b939-9fa68972f835
  DC:10:mh
  RACK:12:rack1
  RELEASE_VERSION:5:3.11.12
  INTERNAL_IP:8:10.42.8.54
  RPC_ADDRESS:4:10.42.8.54
  NET_VERSION:2:11
  HOST_ID:3:09413185-be66-4ffe-be17-3fa865037e47
  RPC_READY:38:true
  SSTABLE_VERSIONS:6:big-me,big-md
  TOKENS:17:<hidden>
/10.42.7.54
  generation:1648711286
  heartbeat:2147483647
  STATUS:1757:shutdown,true
  LOAD:477:1.571279055E9
  SCHEMA:14:06874cde-85cb-3905-b939-9fa68972f835
  DC:10:ix
  RACK:12:rack1
  RELEASE_VERSION:5:3.11.12
  INTERNAL_IP:8:10.42.7.54
  RPC_ADDRESS:4:10.42.7.54
  NET_VERSION:2:11
  HOST_ID:3:5a5d3810-874a-4168-9e99-6eab3f6f3cfa
  RPC_READY:1758:false
  SSTABLE_VERSIONS:6:big-me,big-md
  TOKENS:17:<hidden>
/10.42.7.55
  generation:1648711764
  heartbeat:1856
  STATUS:18:NORMAL,-1095602411864500569
  LOAD:1847:1.571445304E9
  SCHEMA:14:06874cde-85cb-3905-b939-9fa68972f835
  DC:10:ix
  RACK:12:rack1
  RELEASE_VERSION:5:3.11.12
  INTERNAL_IP:8:10.42.7.55
  RPC_ADDRESS:4:10.42.7.55
  NET_VERSION:2:11
  HOST_ID:3:5a5d3810-874a-4168-9e99-6eab3f6f3cfa
  RPC_READY:38:true
  SSTABLE_VERSIONS:6:big-me,big-md
  TOKENS:17:<hidden>
/10.42.7.52
  generation:1648120338
  heartbeat:2147483647
  STATUS:1759:shutdown,true
  LOAD:606284:1.570018542E9
  SCHEMA:14:06874cde-85cb-3905-b939-9fa68972f835
  DC:10:ix
  RACK:12:rack1
  RELEASE_VERSION:5:3.11.12
  INTERNAL_IP:8:10.42.7.52
  RPC_ADDRESS:4:10.42.7.52
  NET_VERSION:2:11
  HOST_ID:3:5a5d3810-874a-4168-9e99-6eab3f6f3cfa
  RPC_READY:1760:false
  SSTABLE_VERSIONS:6:big-me,big-md
  TOKENS:17:<hidden>
/10.42.7.53
  generation:1648707234
  heartbeat:2147483647
  STATUS:1766:shutdown,true
  LOAD:4184:1.571429353E9
  SCHEMA:14:06874cde-85cb-3905-b939-9fa68972f835
  DC:10:ix
  RACK:12:rack1
  RELEASE_VERSION:5:3.11.12
  INTERNAL_IP:8:10.42.7.53
  RPC_ADDRESS:4:10.42.7.53
  NET_VERSION:2:11
  HOST_ID:3:5a5d3810-874a-4168-9e99-6eab3f6f3cfa
  RPC_READY:1767:false
  SSTABLE_VERSIONS:6:big-me,big-md
  TOKENS:17:<hidden>
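Note that all four /10.42.7.x entries share HOST_ID 5a5d3810-874a-4168-9e99-6eab3f6f3cfa: only /10.42.7.55 is NORMAL, while the other three are leftover shutdown entries for old pod IPs of the same node. A quick way to cross-check this is to compare the current pod IPs against the addresses still tracked in gossip; a sketch (namespace and label selector are assumptions here, adjust to your deployment):

# Current pod IPs according to Kubernetes
kubectl -n cassandra get pods -l app=cassandra -o wide

# Endpoints Cassandra still tracks in gossip, with their status lines
nodetool gossipinfo | grep -E '^/|STATUS'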
While there are likely ways to clean up the gossip state to get rid of this, I'd rather not go down that route, since the problem will reappear the next time the nodes in the cluster are restarted.
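For reference, the kind of manual cleanup I'd rather avoid is force-removing the stale endpoints one by one with nodetool's standard assassinate command; it only treats the symptom, because new stale entries appear again after the next rolling restart. A rough sketch, using addresses from the output above:

# Force-remove a leftover gossip endpoint (last-resort command; repeat for each
# stale address, and again after every future rolling restart)
nodetool assassinate 10.42.7.52
nodetool assassinate 10.42.7.53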
I've tried setting `cassandra.load_ring_state=false`, but it does not help; I assume the stale state is replicated back from the existing nodes during a rolling reboot and would only be cleared by a full cluster shutdown and startup, which is not an option in a production system.
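For completeness, `cassandra.load_ring_state` is a JVM system property; with a stock cassandra-env.sh it would be passed roughly as below (how it actually reaches the JVM in a container image or operator-managed setup may differ, so treat this as a sketch):

# Appended to conf/cassandra-env.sh
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"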
Is there any other way I can avoid this?
The Cassandra version used is 3.11.12.