Maybe rename the method or add more comments there.
Sure, I updated the original comment before that addToExcludedNodes call.
numOfDatanodes v.s. availableNodes in NetworkTopology.java's chooseRandom
This is the fun part. They're different things. The implementation of InnerNode#getNumOfLeaves is to return the total leaves, and the 'randomly choose 1' is done by innerNode.getLeaf(leaveIndex, node), providing an (randomly generated) index, and the (most ancestor) node from excludedScope. I checked all the way in for feasibility of adding excludedNodes to getLeaf when coming up with patch 3, but decided to have current implementation for 2 reasons:
- Less change. We don't have to change all the way into InnerNode for this bug fix, hence less effort.
- It is more consistent with current behavior. Currently we loop in BPPD, if we get a node that's already excluded, we call chooseDataNode again. This patch simply moves this loop inside.
1 implementation detail I also considered is, in NetworkTopology.java's chooseRandom, without changing InnerNode, we could maintain the index mapping of available nodes, and randomly choose the index from the mapping, then get the node using the index. If this node is in excludeNodes, we remove that index from the mapping. Although this would make the loop run less iterations (since each time a different node will be coming from the set), for HDFS with enormous number of DNs, the space consumption and the overhead of setting up the index mapping overrules the benefit. I assume this is why we have that simple loop in BPPD at first.
Please let me know what you think. Thanks!