Details
- Type: Bug
- Status: Open
- Priority: Normal
- Resolution: Unresolved
- Fix Version/s: None
- Severity: Normal
Description
We have a few hundred nodes across 3 data centers, and we are doing a few million writes per second into the cluster.
The problem we found is that some nodes (>10) have a very wrong view of the cluster.
For example, we have 3 data centers: A, B, and C. On the problem nodes, the output of 'nodetool status' shows that ~100 nodes are not in data center A, B, or C. Instead, it shows those nodes in datacenter DC1 and rack r1, which is very wrong. As a result, the node will return wrong results to client requests.
Datacenter: DC1
===============
Status=Up/Down / State=Normal/Leaving/Joining/Moving
--  Address                          Load       Tokens  Owns  Host ID                               Rack
UN  2401:db00:11:6134:face:0:1:0     509.52 GB  256     ?     e24656ac-c3b2-4117-b933-a5b06852c993  r1
UN  2401:db00:11:b218:face:0:5:0     510.01 GB  256     ?     53da2104-b1b5-4fa5-a3dd-52c7557149f9  r1
UN  2401:db00:2130:5133:face:0:4d:0  459.75 GB  256     ?     ef8311f0-f6b8-491c-904d-baa925cdd7c2  r1
We are using GossipingPropertyFileSnitch.
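For context: with GossipingPropertyFileSnitch, each node advertises the dc and rack configured in its own cassandra-rackdc.properties, and for migration compatibility with PropertyFileSnitch it can fall back to cassandra-topology.properties, whose stock default entry is DC1:r1 — matching the bogus datacenter/rack in the output above. A sketch of the per-node config we would expect (the dc/rack values "A" and "rack1" here are illustrative, not our actual names):

```
# cassandra-rackdc.properties on a node in data center A
# Read by GossipingPropertyFileSnitch; the node gossips these values
# to the rest of the cluster.
dc=A
rack=rack1
```

If a node's gossip state for these fields is lost or never learned, other nodes may render it with the DC1/r1 fallback instead.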
Thanks
Attachments
Issue Links
- relates to CASSANDRA-11709: Lock contention when large number of dead nodes come back within short time (In Progress)