Details
- Bug
- Status: Resolved
- Normal
- Resolution: Fixed
- Any cluster that has vnodes and consists of hundreds of physical nodes.
- Normal
Description
There are a lot of gossip-related issues with very wide clusters that also have vnodes enabled. Let's use this ticket as a master ticket in case there are sub-tickets.
The most obvious symptom I've seen is with a 1000-node cluster in EC2 on m1.xlarge instances, with each node configured with 32 vnodes.
Without vnodes, the cluster spins up fine and is ready to handle requests within 30 minutes or less.
With vnodes, nodes report constant up/down flapping messages with no external load on the cluster. After a couple of hours, they were still flapping and had very high CPU load, and the cluster never looked like it was going to stabilize or be useful for traffic.
Attachments
Issue Links
- is blocked by
  - CASSANDRA-6409 gossip performance improvement at node startup (Resolved)
  - CASSANDRA-6385 FD phi estimator initial conditions (Resolved)
  - CASSANDRA-6386 FD mean calculation performance improvement (Resolved)
  - CASSANDRA-6410 gossip memory usage improvement (Resolved)
- relates to
  - CASSANDRA-6297 Gossiper blocks when updating tokens and turns node down (Resolved)
  - CASSANDRA-6345 Endpoint cache invalidation causes CPU spike (on vnode rings?) (Resolved)
  - CASSANDRA-6338 Make gossip tolerate slow Gossip tasks (Resolved)
  - CASSANDRA-4288 prevent thrift server from starting before gossip has settled (Resolved)
  - CASSANDRA-6244 calculatePendingRanges could be asynchronous on 1.2 too (Resolved)