Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Description
SubQuery::allocateContainers() calculates the number of containers to request for a subquery and then requests them as follows:
public static void allocateContainers(SubQuery subQuery) {
  ExecutionBlock execBlock = subQuery.getBlock();
  QueryUnit[] tasks = subQuery.getQueryUnits();
  int numRequest = Math.min(tasks.length, subQuery.context.getNumClusterNode() * 4);
Inside allocateContainers(), subQuery.context.getNumClusterNode() internally invokes AMRMClient::getClusterNodeCount(). If AMRMClient::getClusterNodeCount() returns 0, numRequest becomes 0 and allocateContainers() requests zero containers from the RM. When that happens, AppSchedulingInfo regards the ApplicationMaster as inactive. As a result, the ApplicationMaster cannot acquire any containers.
In the current Hadoop YARN, AMRMClient::getClusterNodeCount() temporarily returns 0 for an unknown reason even when cluster nodes are available. This causes the integration test (i.e., 'mvn verify') to hang. This patch solves the problem by making RMContainerAllocator wait for available cluster nodes.
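The wait-then-request idea can be illustrated with a minimal, self-contained sketch. This is not the actual patch: the class ContainerRequestSketch, both method names, and the timeout and polling parameters are hypothetical, and an IntSupplier stands in for AMRMClient::getClusterNodeCount() so the example runs without a YARN cluster. Only the Math.min(tasks, nodes * 4) formula is taken from the code above.

```java
import java.util.function.IntSupplier;

public class ContainerRequestSketch {

    /**
     * Polls the node-count source until it reports at least one node or the
     * timeout elapses. Returns the last observed count (0 on timeout).
     * The supplier is a hypothetical stand-in for AMRMClient::getClusterNodeCount().
     */
    public static int waitForClusterNodes(IntSupplier nodeCount,
                                          long timeoutMillis,
                                          long pollMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int nodes = nodeCount.getAsInt();
        // Compare via subtraction so the check stays correct if nanoTime wraps.
        while (nodes <= 0 && System.nanoTime() - deadline < 0) {
            Thread.sleep(pollMillis);
            nodes = nodeCount.getAsInt();
        }
        return nodes;
    }

    /** Same formula as in allocateContainers(): at most four containers per node. */
    public static int numContainersToRequest(int numTasks, int numClusterNodes) {
        return Math.min(numTasks, numClusterNodes * 4);
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated source that reports 0 nodes twice before reporting 5.
        int[] calls = {0};
        IntSupplier flaky = () -> ++calls[0] < 3 ? 0 : 5;

        int nodes = waitForClusterNodes(flaky, 1000, 10);
        System.out.println(numContainersToRequest(100, nodes)); // prints 20
    }
}
```

Without the wait, the first call would yield 0 nodes and therefore a request for zero containers, which is the condition described above. A caller would still need to decide what to do when the timeout expires with the count still at 0.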