Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-6826

Job fails with InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED/COMMITTING

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.7.2
    • 3.4.0, 3.3.1
    • mrv2
    • None
    • Reviewed

    Description

      This happens if a container is preempted by scheduler after job starts committing.
      And this exception in turn leads to application being marked as FAILED in YARN.
      I think we can probably ignore JOB_TASK_COMPLETED event while JobImpl state is COMMITTING or SUCCEEDED as job is in the process of finishing.
      Also is there any point in attempting to scheduler another task attempt if job is already in COMMITTING or SUCCEEDED state.

      2016-12-23 09:10:38,642 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1482404625971_23910_m_000004 Task Transitioned from RUNNING to SUCCEEDED
      2016-12-23 09:10:38,642 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 5
      2016-12-23 09:10:38,643 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1482404625971_23910Job Transitioned from RUNNING to COMMITTING
      2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e55_1482404625971_23910_01_000010 taskAttempt attempt_1482404625971_23910_m_000004_1
      2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1482404625971_23910_m_000004_1
      2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : linux-19:26009
      2016-12-23 09:10:38,644 INFO [CommitterEvent Processor #4] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
      2016-12-23 09:10:38,724 INFO [IPC Server handler 0 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1482404625971_23910_m_60473139527690 asked for a task
      2016-12-23 09:10:38,724 INFO [IPC Server handler 0 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1482404625971_23910_m_60473139527690 is invalid and will be killed.
      2016-12-23 09:10:38,797 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Calling handler for JobFinishedEvent 
      2016-12-23 09:10:38,797 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1482404625971_23910Job Transitioned from COMMITTING to SUCCEEDED
      2016-12-23 09:10:38,798 INFO [Thread-93] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Job finished cleanly, recording last MRAppMaster retry
      2016-12-23 09:10:38,798 INFO [Thread-93] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator isAMLastRetry: true
      2016-12-23 09:10:38,798 INFO [Thread-93] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: RMCommunicator notified that shouldUnregistered is: true
      2016-12-23 09:10:38,799 INFO [Thread-93] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: true
      2016-12-23 09:10:38,799 INFO [Thread-93] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: JobHistoryEventHandler notified that forceJobCompletion is true
      2016-12-23 09:10:38,799 INFO [Thread-93] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the services
      2016-12-23 09:10:38,800 INFO [Thread-93] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping JobHistoryEventHandler. Size of the outstanding queue size is 1
      2016-12-23 09:10:38,989 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:5 CompletedReds:0 ContAlloc:8 ContRel:0 HostLocal:0 RackLocal:0
      2016-12-23 09:10:38,993 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e55_1482404625971_23910_01_000010
      2016-12-23 09:10:38,993 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:5 CompletedReds:0 ContAlloc:8 ContRel:0 HostLocal:0 RackLocal:0
      2016-12-23 09:10:38,993 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1482404625971_23910_m_000004_1: Container preempted by scheduler
      2016-12-23 09:10:38,994 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1482404625971_23910_m_000004_1 TaskAttempt Transitioned from SUCCEEDED to KILLED
      2016-12-23 09:10:38,995 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1482404625971_23910_m_000004 Task Transitioned from SUCCEEDED to SCHEDULED
      2016-12-23 09:10:38,996 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from NEW to UNASSIGNED
      2016-12-23 09:10:39,044 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:5 CompletedReds:0 ContAlloc:8 ContRel:0 HostLocal:0 RackLocal:0
      2016-12-23 09:10:39,050 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1482404625971_23910: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=11
      2016-12-23 09:10:40,053 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
      2016-12-23 09:10:40,055 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_e55_1482404625971_23910_01_000011 to attempt_1482404625971_23910_m_000004_2
      2016-12-23 09:10:40,055 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:5 CompletedReds:0 ContAlloc:9 ContRel:0 HostLocal:0 RackLocal:0
      2016-12-23 09:10:40,056 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
      2016-12-23 09:10:40,056 INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_e55_1482404625971_23910_01_000011 taskAttempt attempt_1482404625971_23910_m_000004_2
      2016-12-23 09:10:40,056 INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1482404625971_23910_m_000004_2
      2016-12-23 09:10:40,056 INFO [ContainerLauncher #6] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : linux-17:26009
      2016-12-23 09:10:40,068 INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1482404625971_23910_m_000004_2 : 26008
      2016-12-23 09:10:40,074 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1482404625971_23910_m_000004_2] using containerId: [container_e55_1482404625971_23910_01_000011 on NM: [linux-17:26009]
      2016-12-23 09:10:40,074 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from ASSIGNED to RUNNING
      2016-12-23 09:10:40,075 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1482404625971_23910_m_000004 Task Transitioned from SCHEDULED to RUNNING
      2016-12-23 09:10:40,108 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1482404625971_23910: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=11
      2016-12-23 09:10:42,949 INFO [Socket Reader #1 for port 27113] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1482404625971_23910 (auth:SIMPLE)
      2016-12-23 09:10:42,995 INFO [IPC Server handler 6 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1482404625971_23910_m_60473139527691 asked for a task
      2016-12-23 09:10:42,995 INFO [IPC Server handler 6 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1482404625971_23910_m_60473139527691 given task: attempt_1482404625971_23910_m_000004_2
      2016-12-23 09:10:46,163 INFO [IPC Server handler 29 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update from attempt_1482404625971_23910_m_000004_2
      2016-12-23 09:10:46,165 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from RUNNING to COMMIT_PENDING
      2016-12-23 09:10:46,165 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: attempt_1482404625971_23910_m_000004_2 given a go for committing the task output.
      2016-12-23 09:10:46,166 INFO [IPC Server handler 9 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request from attempt_1482404625971_23910_m_000004_2
      2016-12-23 09:10:46,167 INFO [IPC Server handler 9 on 27113] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Result of canCommit for attempt_1482404625971_23910_m_000004_2:true
      2016-12-23 09:10:46,265 INFO [IPC Server handler 20 on 27113] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1482404625971_23910_m_000004_2
      2016-12-23 09:10:46,267 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from COMMIT_PENDING to SUCCEEDED
      2016-12-23 09:10:46,267 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1482404625971_23910_m_000004_2
      2016-12-23 09:10:46,267 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1482404625971_23910_m_000004 Task Transitioned from RUNNING to SUCCEEDED
      2016-12-23 09:10:46,267 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
      org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED
      	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
      	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
      	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:997)
      	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:139)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1399)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1395)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:192)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:118)
      	at java.lang.Thread.run(Thread.java:745)
      2016-12-23 09:10:46,270 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1482404625971_23910Job Transitioned from SUCCEEDED to ERROR
      2016-12-23 09:10:46,271 INFO [Thread-97] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator isAMLastRetry: true
      2016-12-23 09:10:46,271 INFO [Thread-97] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: RMCommunicator notified that shouldUnregistered is: true
      2016-12-23 09:10:46,271 INFO [Thread-97] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: true
      2016-12-23 09:10:46,271 INFO [Thread-97] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: JobHistoryEventHandler notified that forceJobCompletion is true
      2016-12-23 09:10:46,271 INFO [Thread-97] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the services
      

      Attachments

        1. MAPREDUCE-6826-003.patch
          4 kB
          Bilwa S T
        2. MAPREDUCE-6826-002.patch
          4 kB
          Bilwa S T
        3. MAPREDUCE-6826-001.patch
          2 kB
          Bilwa S T

        Issue Links

          Activity

            People

              BilwaST Bilwa S T
              varun_saxena Varun Saxena
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: