Token ring management is one of the most critical parts of Cassandra, yet one of the most overlooked. Some of the problems include but are not limited to:
- Complexity (e.g. pending range calculation)
- Inefficiency (e.g. pending range calculation, AbstractReplicationStrategy.getAddressReplicas)
- Proneness to race conditions (e.g. here)
- Poor modularity and consistency (e.g. natural replicas are computed from NetworkTopologyStrategy while pending replicas are computed from TokenMetadata)
- Insufficient testing (due to complexity and poor modularity)
These limitations make it difficult to reliably fix bugs such as the lack of proper support for node replacement with the same IP address (CASSANDRA-12344), to add improvements such as safe ring membership changes and support for networking via identity instead of IP (CASSANDRA-15823), or to add new features such as dynamic virtual nodes.
This ticket aims to refactor the ring management sub-module (namely TokenMetadata and related classes) to address most of its current limitations and so support further improvements and new features.
Some of the requirements of the proposed refactoring are:
- Make node-local ring representation fully immutable and snapshottable.
- Add content-based versioning to uniquely identify a ring snapshot throughout the cluster.
- Make token ring management vnode-centric to support membership operations on individual tokens and simplify token assignment calculations.
- Primarily identify ring endpoints by node ID to decouple a node’s identity from its IP address.
- Add a local publish/subscribe mechanism for ring change notifications, so other modules can subscribe to it and receive the newest snapshot of the ring after membership changes.
- Add a testing framework to verify the correctness of ring membership operations.
- Ensure the refactored sub-module does not change current behavior via comprehensive testing.
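To make the requirements above more concrete, here is a minimal Java sketch of how an immutable, content-versioned, node-ID-keyed ring snapshot and a local publish/subscribe hook could fit together. It is only an illustration under stated assumptions: RingSnapshot, RingPublisher and all of their methods are hypothetical names that do not exist in Cassandra, tokens are simplified to long values, and SHA-256 over the sorted (token, node ID) pairs is just one possible choice of content-based version.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical sketch: an immutable ring snapshot keyed by individual tokens (vnodes).
final class RingSnapshot {
    // token -> ID of the owning node; endpoints are identified by node ID, not IP address
    private final NavigableMap<Long, UUID> owners;
    private final String version;

    private RingSnapshot(NavigableMap<Long, UUID> owners) {
        this.owners = Collections.unmodifiableNavigableMap(owners);
        this.version = digest(owners);
    }

    static RingSnapshot empty() { return new RingSnapshot(new TreeMap<>()); }

    // Membership operations work on a single token and return a new snapshot.
    RingSnapshot withToken(long token, UUID nodeId) {
        TreeMap<Long, UUID> copy = new TreeMap<>(owners);
        copy.put(token, nodeId);
        return new RingSnapshot(copy);
    }

    RingSnapshot withoutToken(long token) {
        TreeMap<Long, UUID> copy = new TreeMap<>(owners);
        copy.remove(token);
        return new RingSnapshot(copy);
    }

    // Owner of a key token: the first vnode at or after it, wrapping around the ring.
    Optional<UUID> ownerOf(long keyToken) {
        if (owners.isEmpty()) return Optional.empty();
        Map.Entry<Long, UUID> e = owners.ceilingEntry(keyToken);
        return Optional.of((e != null ? e : owners.firstEntry()).getValue());
    }

    // Content-based version: equal ring contents yield equal versions on every node.
    String version() { return version; }

    private static String digest(NavigableMap<Long, UUID> owners) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            ByteBuffer buf = ByteBuffer.allocate(24);
            for (Map.Entry<Long, UUID> e : owners.entrySet()) { // iterates in token order
                buf.clear();
                buf.putLong(e.getKey())
                   .putLong(e.getValue().getMostSignificantBits())
                   .putLong(e.getValue().getLeastSignificantBits());
                buf.flip();
                sha.update(buf);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical local publish/subscribe hook for ring change notifications.
final class RingPublisher {
    private final AtomicReference<RingSnapshot> current = new AtomicReference<>(RingSnapshot.empty());
    private final List<Consumer<RingSnapshot>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<RingSnapshot> subscriber) { subscribers.add(subscriber); }

    RingSnapshot current() { return current.get(); }

    // Atomically swaps in the new snapshot, then hands it to every subscriber.
    // Notification ordering across concurrent updates is not addressed in this sketch.
    RingSnapshot update(UnaryOperator<RingSnapshot> change) {
        RingSnapshot next = current.updateAndGet(change);
        subscribers.forEach(s -> s.accept(next));
        return next;
    }
}
```

In this sketch a subscriber would register with subscribe(snapshot -> ...) and a membership change would be applied with update(ring -> ring.withToken(token, nodeId)). Because each snapshot is immutable, readers can keep using an older snapshot safely while a newer one is being published.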