We have a few hundred nodes across 3 data centers, and we are doing a few million writes per second into the cluster.
The problem we found is that some nodes (more than 10) have a very wrong view of the cluster.
For example, we have 3 data centers: A, B, and C. On the problem nodes, the output of 'nodetool status' shows ~100 nodes that are not in data center A, B, or C. Instead, it shows those nodes in data center DC1, rack r1, which is very wrong. As a result, the problem nodes return wrong results to client requests.
We are using GossipingPropertyFileSnitch.
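
To find which nodes a problem node has misplaced, the check described above can be scripted. This is only a rough sketch: it assumes the usual 'nodetool status' text layout (a "Datacenter: <name>" header followed by rows that start with a two-letter status/state code such as UN or DN), and the function names are made up for illustration, not part of any Cassandra tooling.

```python
import re

# Data center names taken from the description above.
EXPECTED_DCS = {"A", "B", "C"}

# Assumption: a node row starts with a two-letter status/state code
# (U/D followed by N/L/J/M), then the node address.
NODE_ROW = re.compile(r"^[UD][NLJM]\s+(\S+)")


def nodes_by_datacenter(status_text):
    """Group node addresses by the 'Datacenter:' header they appear under."""
    result = {}
    current_dc = None
    for raw_line in status_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Datacenter:"):
            current_dc = line.split(":", 1)[1].strip()
            result.setdefault(current_dc, [])
            continue
        match = NODE_ROW.match(line)
        if match and current_dc is not None:
            result[current_dc].append(match.group(1))
    return result


def unexpected_datacenters(status_text, expected=EXPECTED_DCS):
    """Return only the data centers (and their nodes) that are not expected."""
    grouped = nodes_by_datacenter(status_text)
    return {dc: nodes for dc, nodes in grouped.items() if dc not in expected}
```

Feeding it the saved 'nodetool status' output from a problem node should list the ~100 addresses under DC1, and comparing that list across problem nodes would show whether they all misplace the same nodes.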