Details
- Type: Bug
- Status: Closed
- Priority: Minor
- Resolution: Fixed
- Fix Version/s: 1.2.0, 2.0.4-alpha
- Labels: None
- Hadoop Flags: Reviewed
Description
As currently implemented, BlockPlacementPolicyWithNodeGroup does not properly fall back to the local rack when no nodes are available in remote racks, resulting in a spurious NotEnoughReplicasException.
@Override
protected void chooseRemoteRack(int numOfReplicas,
    DatanodeDescriptor localMachine, HashMap<Node, Node> excludedNodes,
    long blocksize, int maxReplicasPerRack, List<DatanodeDescriptor> results,
    boolean avoidStaleNodes) throws NotEnoughReplicasException {
  int oldNumOfReplicas = results.size();
  // randomly choose one node from remote racks
  try {
    chooseRandom(numOfReplicas,
        "~" + NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()),
        excludedNodes, blocksize, maxReplicasPerRack, results, avoidStaleNodes);
  } catch (NotEnoughReplicasException e) {
    chooseRandom(numOfReplicas - (results.size() - oldNumOfReplicas),
        localMachine.getNetworkLocation(), excludedNodes, blocksize,
        maxReplicasPerRack, results, avoidStaleNodes);
  }
}
As currently coded, the chooseRandom() call in the catch block can never succeed, because the set of nodes within the passed-in node path (e.g. /rack1/nodegroup1) is entirely contained within the set of excluded nodes (both are the set of nodes in the same nodegroup as the node chosen for the first replica); see the standalone sketch below.
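To make that containment argument concrete, here is a small standalone sketch. The topology paths and node names are hypothetical, chosen only to mirror a /rack/nodegroup layout; it does not use the Hadoop classes themselves.

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: counts usable candidates under the current (nodegroup) scope
// versus the rack-wide scope, given that the first replica's nodegroup is excluded.
public class FallbackScopeIllustration {
  public static void main(String[] args) {
    String localNodeGroup = "/rack1/nodegroup1"; // stands in for localMachine.getNetworkLocation()
    String localRack = "/rack1";                 // stands in for NetworkTopology.getFirstHalf(...)

    // After the first replica is placed, its whole nodegroup is excluded.
    Set<String> excluded = new HashSet<>(Arrays.asList(
        "/rack1/nodegroup1/node1", "/rack1/nodegroup1/node2"));

    // All nodes in the local rack (hypothetical cluster).
    List<String> rackNodes = Arrays.asList(
        "/rack1/nodegroup1/node1", "/rack1/nodegroup1/node2",
        "/rack1/nodegroup2/node3", "/rack1/nodegroup2/node4");

    // Candidates under the current fallback scope (the local nodegroup): always zero,
    // because that scope is a subset of the excluded set.
    long nodeGroupCandidates = rackNodes.stream()
        .filter(n -> n.startsWith(localNodeGroup) && !excluded.contains(n)).count();

    // Candidates under the rack-wide scope: the other nodegroup is still available.
    long rackCandidates = rackNodes.stream()
        .filter(n -> n.startsWith(localRack) && !excluded.contains(n)).count();

    System.out.println("nodegroup-scope candidates: " + nodeGroupCandidates); // 0
    System.out.println("rack-scope candidates:      " + rackCandidates);      // 2
  }
}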
The fix is for the fallback chooseRandom() call in the catch block to pass the complement of the scope used in the initial chooseRandom() call in the try block (e.g. /rack1), namely:
NetworkTopology.getFirstHalf(localMachine.getNetworkLocation())
This yields the proper fallback behavior of choosing a random node from within the same rack, while still excluding the nodes in the same nodegroup as the first replica.
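With that change, the catch block would read roughly as follows. This is a minimal sketch of the fix as described above, kept inside the existing chooseRemoteRack() method so it reuses the surrounding parameters and helpers:

  } catch (NotEnoughReplicasException e) {
    // Fall back to the local rack (the complement of the "~" scope tried above);
    // excludedNodes still rules out the nodes in the first replica's nodegroup.
    chooseRandom(numOfReplicas - (results.size() - oldNumOfReplicas),
        NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()),
        excludedNodes, blocksize, maxReplicasPerRack, results, avoidStaleNodes);
  }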
Attachments
Issue Links
- relates to: HADOOP-8468 Umbrella of enhancements to support different failure and locality topologies (Resolved)