Details
Description
This happens if a container is preempted by scheduler after job starts committing.
And this exception in turn leads to application being marked as FAILED in YARN.
I think we can probably ignore JOB_TASK_COMPLETED event while JobImpl state is COMMITTING or SUCCEEDED as job is in the process of finishing.
Also is there any point in attempting to scheduler another task attempt if job is already in COMMITTING or SUCCEEDED state.
2016-12-23 09:10:38,642 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1482404625971_23910_m_000004 Task Transitioned from RUNNING to SUCCEEDED 2016-12-23 09:10:38,642 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 5 2016-12-23 09:10:38,643 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1482404625971_23910Job Transitioned from RUNNING to COMMITTING 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e55_1482404625971_23910_01_000010 taskAttempt attempt_1482404625971_23910_m_000004_1 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1482404625971_23910_m_000004_1 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : linux-19:26009 2016-12-23 09:10:38,644 INFO [CommitterEvent Processor #4] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT 2016-12-23 09:10:38,724 INFO [IPC Server handler 0 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1482404625971_23910_m_60473139527690 asked for a task 2016-12-23 09:10:38,724 INFO [IPC Server handler 0 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1482404625971_23910_m_60473139527690 is invalid and will be killed. 2016-12-23 09:10:38,797 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Calling handler for JobFinishedEvent 2016-12-23 09:10:38,797 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1482404625971_23910Job Transitioned from COMMITTING to SUCCEEDED 2016-12-23 09:10:38,798 INFO [Thread-93] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Job finished cleanly, recording last MRAppMaster retry 2016-12-23 09:10:38,798 INFO [Thread-93] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator isAMLastRetry: true 2016-12-23 09:10:38,798 INFO [Thread-93] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: RMCommunicator notified that shouldUnregistered is: true 2016-12-23 09:10:38,799 INFO [Thread-93] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: true 2016-12-23 09:10:38,799 INFO [Thread-93] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: JobHistoryEventHandler notified that forceJobCompletion is true 2016-12-23 09:10:38,799 INFO [Thread-93] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the services 2016-12-23 09:10:38,800 INFO [Thread-93] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping JobHistoryEventHandler. Size of the outstanding queue size is 1 2016-12-23 09:10:38,989 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:5 CompletedReds:0 ContAlloc:8 ContRel:0 HostLocal:0 RackLocal:0 2016-12-23 09:10:38,993 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e55_1482404625971_23910_01_000010 2016-12-23 09:10:38,993 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:5 CompletedReds:0 ContAlloc:8 ContRel:0 HostLocal:0 RackLocal:0 2016-12-23 09:10:38,993 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1482404625971_23910_m_000004_1: Container preempted by scheduler 2016-12-23 09:10:38,994 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1482404625971_23910_m_000004_1 TaskAttempt Transitioned from SUCCEEDED to KILLED 2016-12-23 09:10:38,995 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1482404625971_23910_m_000004 Task Transitioned from SUCCEEDED to SCHEDULED 2016-12-23 09:10:38,996 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from NEW to UNASSIGNED 2016-12-23 09:10:39,044 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:5 CompletedReds:0 ContAlloc:8 ContRel:0 HostLocal:0 RackLocal:0 2016-12-23 09:10:39,050 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1482404625971_23910: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=11 2016-12-23 09:10:40,053 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1 2016-12-23 09:10:40,055 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_e55_1482404625971_23910_01_000011 to attempt_1482404625971_23910_m_000004_2 2016-12-23 09:10:40,055 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:5 CompletedReds:0 ContAlloc:9 ContRel:0 HostLocal:0 RackLocal:0 2016-12-23 09:10:40,056 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED 2016-12-23 09:10:40,056 INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_e55_1482404625971_23910_01_000011 taskAttempt attempt_1482404625971_23910_m_000004_2 2016-12-23 09:10:40,056 INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1482404625971_23910_m_000004_2 2016-12-23 09:10:40,056 INFO [ContainerLauncher #6] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : linux-17:26009 2016-12-23 09:10:40,068 INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1482404625971_23910_m_000004_2 : 26008 2016-12-23 09:10:40,074 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1482404625971_23910_m_000004_2] using containerId: [container_e55_1482404625971_23910_01_000011 on NM: [linux-17:26009] 2016-12-23 09:10:40,074 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from ASSIGNED to RUNNING 2016-12-23 09:10:40,075 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1482404625971_23910_m_000004 Task Transitioned from SCHEDULED to RUNNING 2016-12-23 09:10:40,108 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1482404625971_23910: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=11 2016-12-23 09:10:42,949 INFO [Socket Reader #1 for port 27113] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1482404625971_23910 (auth:SIMPLE) 2016-12-23 09:10:42,995 INFO [IPC Server handler 6 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1482404625971_23910_m_60473139527691 asked for a task 2016-12-23 09:10:42,995 INFO [IPC Server handler 6 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1482404625971_23910_m_60473139527691 given task: attempt_1482404625971_23910_m_000004_2 2016-12-23 09:10:46,163 INFO [IPC Server handler 29 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update from attempt_1482404625971_23910_m_000004_2 2016-12-23 09:10:46,165 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from RUNNING to COMMIT_PENDING 2016-12-23 09:10:46,165 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: attempt_1482404625971_23910_m_000004_2 given a go for committing the task output. 2016-12-23 09:10:46,166 INFO [IPC Server handler 9 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request from attempt_1482404625971_23910_m_000004_2 2016-12-23 09:10:46,167 INFO [IPC Server handler 9 on 27113] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Result of canCommit for attempt_1482404625971_23910_m_000004_2:true 2016-12-23 09:10:46,265 INFO [IPC Server handler 20 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1482404625971_23910_m_000004_2 2016-12-23 09:10:46,267 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from COMMIT_PENDING to SUCCEEDED 2016-12-23 09:10:46,267 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1482404625971_23910_m_000004_2 2016-12-23 09:10:46,267 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1482404625971_23910_m_000004 Task Transitioned from RUNNING to SUCCEEDED 2016-12-23 09:10:46,267 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:997) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:139) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1399) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1395) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:192) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:118) at java.lang.Thread.run(Thread.java:745) 2016-12-23 09:10:46,270 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1482404625971_23910Job Transitioned from SUCCEEDED to ERROR 2016-12-23 09:10:46,271 INFO [Thread-97] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator isAMLastRetry: true 2016-12-23 09:10:46,271 INFO [Thread-97] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: RMCommunicator notified that shouldUnregistered is: true 2016-12-23 09:10:46,271 INFO [Thread-97] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: true 2016-12-23 09:10:46,271 INFO [Thread-97] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: JobHistoryEventHandler notified that forceJobCompletion is true 2016-12-23 09:10:46,271 INFO [Thread-97] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the services
Attachments
Attachments
Issue Links
- relates to
-
MAPREDUCE-5400 MRAppMaster throws InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED for JobImpl
- Resolved