Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.1.1
None
None
Description
Pre-requisites:
1. Install an HA cluster.
2. Set yarn.nodemanager.opportunistic-containers-max-queue-length=(positive integer value) [NodeManager -> yarn-site.xml]
3. Set yarn.resourcemanager.opportunistic-container-allocation.enabled=true [ResourceManager -> yarn-site.xml]
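For reference, the two settings above might look like the following in yarn-site.xml. This is only a sketch: the queue length of 10 is an assumed example value (the report only requires a positive integer), and both properties belong inside the file's existing <configuration> element.

```xml
<!-- NodeManager yarn-site.xml: allow opportunistic containers to queue on the node -->
<property>
  <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
  <!-- 10 is an assumed example; the report only says "positive integer value" -->
  <value>10</value>
</property>

<!-- ResourceManager yarn-site.xml: enable opportunistic container allocation -->
<property>
  <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
  <value>true</value>
</property>
```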
Steps to reproduce:
1. Keep all NodeManagers up.
2. Stop 2 NodeManagers and immediately proceed to step 3.
3. Submit a job with -Dmapreduce.job.num-opportunistic-maps-percent="100" and run it with 50 mappers.
Expected Result:
The job should be successful.
Actual Result:
The job succeeds, but some containers fail with "TaskAttempt killed because it ran on unusable node" and "Container released on a *lost* node".
Log Details:
TaskAttempt killed because it ran on unusable node
Container released on a *lost* node
Container launch failed for container_1534149133116_0019_01_000006 : java.net.ConnectException: Call From hostname/IP to hostname:portNumber failed on connection exception: java.net.ConnectException: