Details
- Improvement
- Status: Open
- Low
- Resolution: Unresolved
- None
- None
Description
Right now, when I want to remove a node that is up, I understand there are 2 choices:
nodetool decommission: The node being decommissioned streams out its data as the only source. This takes a long time, and the one node you want to remove becomes a huge bottleneck (even worse if you want to remove it because it is underperforming).
Stop cassandra, then nodetool removenode: This lets all other nodes that share the key ranges act as sources. This runs about 8-16X faster than decommission on my system, but it requires me to run with reduced redundancy while the removal is in progress.
I think it would be really cool if there was a way to decommission a node that is up, and leverage the power of other data sources.
Request: When you decommission a node, all other nodes that share a key range should also help as sources for the data that needs to be copied. Maybe with options for how far away to pull the data from, to balance the load: old behavior, same rack, same dc, other dc (or some default scheme based on latency between nodes/racks/dcs).
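
To make the request a bit more concrete, here is a rough sketch of how the "how far" option could pick streaming sources. It is only an illustration and not how Cassandra works today: the `Scope` enum, the `Node` class, and `pickSources` are hypothetical names made up for this example. It also assumes the leaving node stays a source in every mode and that the other replicas are added on top of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DecommissionSources {
    /** Hypothetical option: how far from the leaving node to look for extra sources. */
    enum Scope { OLD_BEHAVIOR, SAME_RACK, SAME_DC, ANY_DC }

    /** Hypothetical, simplified view of a node: name, datacenter, rack. */
    static final class Node {
        final String name;
        final String dc;
        final String rack;

        Node(String name, String dc, String rack) {
            this.name = name;
            this.dc = dc;
            this.rack = rack;
        }
    }

    /**
     * Returns the names of the nodes allowed to stream data for a key range.
     * The leaving node is always a source (assumption); other replicas of the
     * range are added only if they fall inside the requested scope.
     */
    static List<String> pickSources(Node leaving, List<Node> replicas, Scope scope) {
        List<String> sources = new ArrayList<>();
        sources.add(leaving.name);
        for (Node replica : replicas) {
            if (replica.name.equals(leaving.name)) {
                continue; // already added above
            }
            boolean sameDc = replica.dc.equals(leaving.dc);
            boolean sameRack = sameDc && replica.rack.equals(leaving.rack);
            boolean allowed;
            switch (scope) {
                case SAME_RACK: allowed = sameRack; break;
                case SAME_DC:   allowed = sameDc;   break;
                case ANY_DC:    allowed = true;     break;
                default:        allowed = false;    // OLD_BEHAVIOR: leaving node only
            }
            if (allowed) {
                sources.add(replica.name);
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        Node leaving = new Node("n1", "dc1", "r1");
        List<Node> replicas = Arrays.asList(
                new Node("n2", "dc1", "r1"),
                new Node("n3", "dc1", "r2"),
                new Node("n4", "dc2", "r1"));
        for (Scope scope : Scope.values()) {
            System.out.println(scope + " -> " + pickSources(leaving, replicas, scope));
        }
    }
}
```

A latency-based default, as mentioned above, could replace the fixed scopes by ranking the replicas and taking the closest few, but that would need real measurements between nodes/racks/dcs.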