Hadoop Map/Reduce / MAPREDUCE-4748

Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.3
    • Fix Version/s: 2.0.3-alpha, 0.23.5
    • Component/s: mrv2
    • Labels:
      None

      Description

      We saw this happen when running a large pig script.

      2012-10-23 22:45:24,986 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Can't handle this event at current state for task_1350837501057_21978_m_040453
      org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
              at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
              at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
              at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
              at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:604)
              at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:89)
              at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:914)
              at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:908)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
              at java.lang.Thread.run(Thread.java:619)
      

      Speculative execution was enabled, and that task did speculate, so this looks like an error in the state machine, either between the task attempts or within that single task.

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1238 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1238/)
          MAPREDUCE-4748. Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED. Contributed by Jason Lowe (Revision 1402658)

          Result = FAILURE
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1402658
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1208 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1208/)
          MAPREDUCE-4748. Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED. Contributed by Jason Lowe (Revision 1402658)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1402658
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #417 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/417/)
          MAPREDUCE-4748. Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED. Contributed by Jason Lowe (Revision 1402666)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1402666
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #18 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/18/)
          MAPREDUCE-4748. Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED. Contributed by Jason Lowe (Revision 1402658)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1402658
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #2932 (See https://builds.apache.org/job/Hadoop-trunk-Commit/2932/)
          MAPREDUCE-4748. Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED. Contributed by Jason Lowe (Revision 1402658)

          Result = SUCCESS
          jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1402658
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
          Jason Lowe added a comment -

          I committed this to trunk, branch-2, and branch-0.23.

          Jason Lowe added a comment -

          Thanks for the review, Bobby. Pushing this in.

          Robert Joseph Evans added a comment -

          The patch looks good. And if Vinod is going to fix the bookkeeping in MAPREDUCE-4751 I am a +1.

          Jason Lowe added a comment -

          "Can you just handle them by ignoring here."

          Current patch does just that, agree we can fix the bookkeeping issues with MAPREDUCE-4751.

          Elevating priority since our users are hitting this often and having to reissue jobs.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550896/MAPREDUCE-4748.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2967//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2967//console

          This message is automatically generated.

          Vinod Kumar Vavilapalli added a comment -

          Correct link.

          Vinod Kumar Vavilapalli added a comment -

          Sigh. I think we never ran into these before as we don't have speculation on by default.

          Can you just handle them by ignoring here. I can fix the issue at MAPREDUCE-4745.

          Jason Lowe added a comment -

          Simple patch to ignore T_ATTEMPT_SUCCEEDED, T_KILL, and T_ATTEMPT_COMMIT_PENDING at SUCCEEDED and keep the job from abruptly ending in error.

          I'm a bit worried about the bookkeeping wrt. task.finishedAttempts and task.numberUncompletedAttempts. Current patch matches the bookkeeping behavior for T_ATTEMPT_KILLED or T_ATTEMPT_FAILED when we're effectively ignoring the event. However I'm wondering if this could lead to corner cases during KILL_WAIT like those reported in MAPREDUCE-4745.

          It looks like TaskAttempt will report T_ATTEMPT_KILLED after it succeeded, but only for map tasks. We don't want to double-count in that case, but if a kill of the TaskAttempt doesn't report that it was killed, it seems like we could miss some bookkeeping if we just skip the bookkeeping when we see an attempt redundantly succeed. Thoughts?
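
The accept-and-ignore approach described in this comment can be sketched roughly as follows. This is a simplified, self-contained illustration and not the actual patch: the class `TolerantTaskStateMachine`, its `addTransition` helper, and the reduced `State`/`Event` enums are assumptions made for the example, and the real TaskImpl table has many more states and events. The idea is only that SUCCEEDED gets self-loop transitions for the three late events, while everything else stays strict.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for a task state machine; not Hadoop code.
public class TolerantTaskStateMachine {
    enum State { RUNNING, SUCCEEDED }
    enum Event { T_SCHEDULE, T_ATTEMPT_SUCCEEDED, T_KILL, T_ATTEMPT_COMMIT_PENDING }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.RUNNING;

    TolerantTaskStateMachine() {
        // Normal path: the first successful attempt completes the task.
        addTransition(State.RUNNING, State.SUCCEEDED,
                EnumSet.of(Event.T_ATTEMPT_SUCCEEDED));
        // Accept-and-ignore: late events at SUCCEEDED loop back to SUCCEEDED.
        addTransition(State.SUCCEEDED, State.SUCCEEDED,
                EnumSet.of(Event.T_ATTEMPT_SUCCEEDED, Event.T_KILL,
                        Event.T_ATTEMPT_COMMIT_PENDING));
        // T_SCHEDULE is deliberately left unregistered at SUCCEEDED in this
        // sketch, to show that the table stays strict for other events.
    }

    private void addTransition(State from, State to, Set<Event> events) {
        Map<Event, State> row =
                table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class));
        for (Event e : events) {
            row.put(e, to);
        }
    }

    State getState() { return current; }

    void handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
    }

    public static void main(String[] args) {
        TolerantTaskStateMachine task = new TolerantTaskStateMachine();
        task.handle(Event.T_ATTEMPT_SUCCEEDED);      // first attempt wins
        task.handle(Event.T_ATTEMPT_SUCCEEDED);      // speculative attempt also succeeds
        task.handle(Event.T_KILL);                   // late kill is ignored
        task.handle(Event.T_ATTEMPT_COMMIT_PENDING); // late commit request is ignored
        System.out.println(task.getState());
    }
}
```

The sketch ignores the late events entirely and does no attempt counting, so it says nothing about the finishedAttempts / numberUncompletedAttempts question raised above.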

          Vinod Kumar Vavilapalli added a comment -

          Genuine bug. TaskImpl needs to accept T_ATTEMPT_SUCCEEDED at SUCCEEDED. Not sure how we missed something as basic as this. We should also accept-and-ignore T_KILL and T_ATTEMPT_COMMIT_PENDING.

          Jason Lowe added a comment -

          Here's a log from another case showing we have a race between two attempts from the same task that succeed almost simultaneously:

          2012-10-24 11:31:40,751 INFO [IPC Server handler 1 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1350066773975_116662_m_032327_1
          2012-10-24 11:31:40,751 INFO [IPC Server handler 1 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1350066773975_116662_m_032327_1 is : 1.0
          2012-10-24 11:31:40,751 INFO [IPC Server handler 21 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1350066773975_116662_m_032327_1
          2012-10-24 11:31:40,751 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1350066773975_116662_m_032327_1 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
          2012-10-24 11:31:40,752 INFO [ContainerLauncher #55] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1350066773975_116662_01_051566 taskAttempt attempt_1350066773975_116662_m_032327_1
          2012-10-24 11:31:40,752 INFO [ContainerLauncher #55] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1350066773975_116662_m_032327_1
          2012-10-24 11:31:40,754 INFO [IPC Server handler 7 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1350066773975_116662_r_000003_0
          2012-10-24 11:31:40,754 INFO [IPC Server handler 7 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1350066773975_116662_r_000003_0 is : 0.3333072
          2012-10-24 11:31:40,755 INFO [IPC Server handler 25 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1350066773975_116662_m_032327_0
          2012-10-24 11:31:40,755 INFO [IPC Server handler 25 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1350066773975_116662_m_032327_0 is : 1.0
          2012-10-24 11:31:40,756 INFO [IPC Server handler 20 on 52922] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1350066773975_116662_m_032327_0
          2012-10-24 11:31:40,756 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1350066773975_116662_m_032327_0 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
          2012-10-24 11:31:40,756 INFO [ContainerLauncher #484] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1350066773975_116662_01_037193 taskAttempt attempt_1350066773975_116662_m_032327_0
          2012-10-24 11:31:40,756 INFO [ContainerLauncher #484] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1350066773975_116662_m_032327_0
          2012-10-24 11:31:40,757 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1350066773975_116662_m_032327_1 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
          2012-10-24 11:31:40,757 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1350066773975_116662_m_032327_1
          2012-10-24 11:31:40,757 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Issuing kill to other attempt attempt_1350066773975_116662_m_032327_0
          2012-10-24 11:31:40,757 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1350066773975_116662_m_032327 Task Transitioned from RUNNING to SUCCEEDED
          2012-10-24 11:31:40,757 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 51029
          2012-10-24 11:31:40,780 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1350066773975_116662_m_032327_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
          2012-10-24 11:31:40,814 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Can't handle this event at current state for task_1350066773975_116662_m_032327
          org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
          	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
          	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
          	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
          	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:604)
          	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:89)
          	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:914)
          	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:908)
          	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
          	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
          	at java.lang.Thread.run(Thread.java:619)
          2012-10-24 11:31:40,814 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Invalid event T_ATTEMPT_SUCCEEDED on Task task_1350066773975_116662_m_032327
          2012-10-24 11:31:40,818 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1350066773975_116662Job Transitioned from RUNNING to ERROR
          

          We tried to kill the other attempt but it succeeded before the kill arrived, hence T_ATTEMPT_SUCCEEDED at SUCCEEDED.
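
To make the failure mode concrete, here is a minimal sketch of a strict table-driven state machine hitting the same condition. It is a toy model, assuming a hypothetical `StrictTaskStateMachine` class with only two states and one event. It does not use Hadoop's StateMachineFactory, and it throws a plain `IllegalStateException` rather than the YARN exception shown in the log.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a task state machine; not Hadoop code.
public class StrictTaskStateMachine {
    enum State { RUNNING, SUCCEEDED }
    enum Event { T_ATTEMPT_SUCCEEDED }

    // Transition table: only (RUNNING, T_ATTEMPT_SUCCEEDED) is registered.
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.RUNNING;

    StrictTaskStateMachine() {
        Map<Event, State> fromRunning = new EnumMap<>(Event.class);
        fromRunning.put(Event.T_ATTEMPT_SUCCEEDED, State.SUCCEEDED);
        table.put(State.RUNNING, fromRunning);
    }

    State getState() { return current; }

    void handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            // No transition registered for this (state, event) pair.
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
    }

    public static void main(String[] args) {
        StrictTaskStateMachine task = new StrictTaskStateMachine();
        task.handle(Event.T_ATTEMPT_SUCCEEDED); // attempt _1 reports success first
        try {
            // Attempt _0 succeeds before the kill reaches it and reports success too.
            task.handle(Event.T_ATTEMPT_SUCCEEDED);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
            // prints "Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED"
        }
    }
}
```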


            People

            • Assignee: Jason Lowe
            • Reporter: Robert Joseph Evans
            • Votes: 0
            • Watchers: 7
