Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
YARN-2928
Description
Found a couple of issues while testing ATSv2.
- There is an NPE while publishing DS_CONTAINER_START_EVENT, which in turn means that this event is not published.
2016-06-07 23:19:00,020 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0] INFO org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl: Unchecked exception is thrown from onContainerStarted for Container container_e77_1465311876353_0007_01_000002
java.lang.NullPointerException
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:389)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.putContainerEntity(ApplicationMaster.java:1284)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishContainerStartEvent(ApplicationMaster.java:1235)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.access$1200(ApplicationMaster.java:175)
    at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster$NMCallbackHandler.onContainerStarted(ApplicationMaster.java:986)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:454)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:436)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer.handle(NMClientAsyncImpl.java:617)
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$ContainerEventProcessor.run(NMClientAsyncImpl.java:676)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
- Created time is not reported from distributed shell for either DS_CONTAINER or DS_APP_ATTEMPT entities.
As can be seen below, when we query DS_APP_ATTEMPT entities, we do not get createdtime in response.
[
  {
    "metrics": [ ],
    "events": [ ],
    "type": "DS_APP_ATTEMPT",
    "id": "appattempt_1465246237936_0003_000001",
    "isrelatedto": { },
    "relatesto": { },
    "info": {
      "UID": "yarn-cluster!application_1465246237936_0003!DS_APP_ATTEMPT!appattempt_1465246237936_0003_000001"
    },
    "configs": { }
  }
]
In the response received on querying a DS_CONTAINER entity, createdtime is not present, and DS_CONTAINER_START is not present either (due to the NPE pointed out above).
{
  "metrics": [ ],
  "events": [
    {
      "id": "DS_CONTAINER_END",
      "timestamp": 1465314587480,
      "info": {
        "Exit Status": 0,
        "State": "COMPLETE"
      }
    }
  ],
  "type": "DS_CONTAINER",
  "id": "container_e77_1465311876353_0003_01_000002",
  "isrelatedto": { },
  "relatesto": { },
  "info": {
    "UID": "yarn-cluster!application_1465311876353_0003!DS_CONTAINER!container_e77_1465311876353_0003_01_000002"
  },
  "configs": { }
}
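For reference, the behaviour this report expects from the publisher can be sketched roughly as follows: a DS_CONTAINER entity should carry a created time and a DS_CONTAINER_START event once the container has started. This is only an illustration using stand-in types. SketchEntity, SketchEvent, SketchPublisher and the start timestamp are hypothetical and are not the Hadoop timeline classes or the actual fix.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in sketch only: none of these types are Hadoop classes.
class DsEntitySketch {
    public static void main(String[] args) {
        // Hypothetical start time, chosen only for illustration.
        long startTime = 1465314580000L;
        SketchEntity container = SketchPublisher.containerStarted(
                "container_e77_1465311876353_0003_01_000002", startTime);
        System.out.println("createdtime=" + container.createdTime
                + " events=" + container.eventIds());
    }
}

// Minimal stand-in for a timeline event (id plus timestamp).
class SketchEvent {
    final String id;
    final long timestamp;

    SketchEvent(String id, long timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }
}

// Minimal stand-in for a timeline entity.
class SketchEntity {
    final String type;
    final String id;
    // A null value models the missing "createdtime" seen in the responses.
    Long createdTime;
    final List<SketchEvent> events = new ArrayList<>();

    SketchEntity(String type, String id) {
        this.type = type;
        this.id = id;
    }

    // Returns the ids of all events attached to this entity, in order.
    List<String> eventIds() {
        List<String> ids = new ArrayList<>();
        for (SketchEvent event : events) {
            ids.add(event.id);
        }
        return ids;
    }
}

// Hypothetical publisher helper: builds the entity the report expects to see.
class SketchPublisher {
    static SketchEntity containerStarted(String containerId, long startTime) {
        SketchEntity entity = new SketchEntity("DS_CONTAINER", containerId);
        // Record the created time so that queries can return it.
        entity.createdTime = startTime;
        // Attach the start event alongside the created time.
        entity.events.add(new SketchEvent("DS_CONTAINER_START", startTime));
        return entity;
    }
}
```

With an entity built this way, a query for DS_CONTAINER would be expected to show both createdtime and a DS_CONTAINER_START event, in addition to the DS_CONTAINER_END event already present in the response.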