First the important bit: with these patches, StorageService.calculatePendingRanges is almost three orders of magnitude faster (22ms vs 14.6sec) when calculating pending ranges for two nodes bootstrapping into a cluster of 2048 nodes. See the graph here. This was tested with 1 DC and 1 rack, with RF=2.
The bottleneck is NetworkTopologyStrategy.calculateNaturalEndpoints. The main problems with the existing implementation are:
1. for each datacentre:
2. it iterates through all the tokens in the ring at least once
3. then does an NlogN sort of those tokens
4. then, if the number of racks < RF, it iterates through all the tokens again, because it can't tell whether it has exhausted the racks in that DC
5. then, if the number of hosts in that DC < RF, it iterates through all the tokens again; otherwise it iterates until it has RF hosts in that DC
So it is doing O(DC * (N + NlogN + N + N)) operations just to work out the endpoints for a single token (see the sketch below). StorageService.calculatePendingRanges then calls this inside other loops (such as AbstractReplicationStrategy.getAddressRanges), which makes the whole calculation at least O(N^2*logN).
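To make the per-token cost concrete, here is a schematic sketch of that pattern (hypothetical names, String endpoints instead of InetAddress; not the literal Cassandra source):
{code:java}
import java.util.*;

// Schematic sketch of the old per-DC pattern described in (1)-(5).
// ring: token -> endpoint, in token order; dcOf: endpoint -> datacentre.
final class OldStyleSketch
{
    static List<String> naturalEndpoints(SortedMap<Long, String> ring,
                                         Map<String, String> dcOf,
                                         Set<String> dcs, int rf)
    {
        List<String> endpoints = new ArrayList<>();
        for (String dc : dcs)                                 // (1) once per datacentre
        {
            List<Long> dcTokens = new ArrayList<>();
            for (Map.Entry<Long, String> e : ring.entrySet()) // (2) O(N) scan of the whole ring
                if (dc.equals(dcOf.get(e.getValue())))
                    dcTokens.add(e.getKey());
            Collections.sort(dcTokens);                       // (3) O(NlogN) sort

            // (4) and (5): with fewer racks or hosts than RF in this DC, the real
            // code walks the full ring again (up to two more O(N) passes) because
            // it has no way of knowing that the DC is already exhausted.

            // Stand-in for walking from the search token: take up to RF hosts in this DC.
            for (int i = 0; i < Math.min(rf, dcTokens.size()); i++)
                endpoints.add(ring.get(dcTokens.get(i)));
        }
        return endpoints;
    }
}
{code}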
These patches fix (1) by iterating through the tokens only once, and processing all DCs simultaneously.
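To illustrate the shape of that change, here is a hedged sketch (hypothetical names, not the actual patch; the rack placement from (4) is omitted for brevity). The ring is walked once and every DC's replica list fills in parallel, stopping as soon as all DCs are satisfied. Note that it relies on per-DC host counts, which is exactly the topology knowledge discussed next:
{code:java}
import java.util.*;

// Hedged sketch of the single-pass approach. ringOrder is the ring walked once
// from the search token; dcSize maps each DC to its host count (assumed to
// cover every DC that appears in dcOf).
final class SinglePassSketch
{
    static List<String> naturalEndpoints(Iterable<String> ringOrder,
                                         Map<String, String> dcOf,
                                         Map<String, Integer> dcSize, int rf)
    {
        Map<String, List<String>> replicas = new HashMap<>();
        int satisfied = 0;
        for (String ep : ringOrder)                     // one walk, all DCs at once
        {
            String dc = dcOf.get(ep);
            List<String> dcReplicas = replicas.computeIfAbsent(dc, k -> new ArrayList<>());
            int wanted = Math.min(rf, dcSize.get(dc));  // an under-populated DC can still be "done"
            if (dcReplicas.size() < wanted)
            {
                dcReplicas.add(ep);
                if (dcReplicas.size() == wanted && ++satisfied == dcSize.size())
                    break;                              // every DC satisfied: stop early
            }
        }
        List<String> result = new ArrayList<>();
        for (List<String> dcReplicas : replicas.values())
            result.addAll(dcReplicas);
        return result;
    }
}
{code}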
(2), (3) and (5) relate to knowing which endpoints exist in a given DC, and (4) relates to knowing which racks appear in a DC, so the first patch adds this knowledge to the snitch. The second patch then uses it to simplify calculateNaturalEndpoints.
First patch: adds functionality to AbstractEndpointSnitch to track which endpoints and racks exist in each DC (allows fixing of 2-5).
Second patch: a rewritten O(logN) implementation of calculateNaturalEndpoints using the topology information from the snitch.
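For illustration only, the kind of bookkeeping the first patch adds might look like this (hypothetical names; gossip wiring and endpoint removal/rack reference-counting are omitted). The point is that "how many hosts/racks does this DC have?" becomes an O(1) lookup:
{code:java}
import java.util.*;

// Rough illustration of per-DC topology tracking in the snitch.
final class TopologySketch
{
    private final Map<String, Set<String>> dcEndpoints = new HashMap<>(); // DC -> endpoints
    private final Map<String, Set<String>> dcRacks = new HashMap<>();     // DC -> racks

    void addEndpoint(String endpoint, String dc, String rack)
    {
        dcEndpoints.computeIfAbsent(dc, k -> new HashSet<>()).add(endpoint);
        dcRacks.computeIfAbsent(dc, k -> new HashSet<>()).add(rack);
    }

    int hostCount(String dc) { return dcEndpoints.getOrDefault(dc, Collections.emptySet()).size(); }
    int rackCount(String dc) { return dcRacks.getOrDefault(dc, Collections.emptySet()).size(); }
}
{code}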
Note: these branches are managed with TopGit. If you are applying the patch output manually, you will either need to filter out the TopGit metadata files (wget -O - <url> | filterdiff -x*.topdeps -x*.topmsg | patch -p1) or remove them afterward (rm .topmsg .topdeps).