Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.3-alpha
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha2
    • Component/s: mrv2, test
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      TestRecovery is occasionally failing with this error:

      testCrashed(org.apache.hadoop.mapreduce.v2.app.TestRecovery): TaskAttempt state is not correct (timedout) expected:<FAILED> but was:<STARTING>
      

        Activity

        jlowe Jason Lowe added a comment -

        Looking at the test output when it fails, there's an invalid state transition:

        2012-11-10 23:19:07,665 INFO  [AsyncDispatcher event handler] impl.TaskAttemptImpl (TaskAttemptImpl.java:handle(993)) - attempt_0_0000_m_000000_1 TaskAttempt Transitioned from NEW to UNASSIGNED
        TaskAttempt State is : FAILED
        TaskAttempt State is : STARTING Waiting for state : FAILED   progress : 0.0
        2012-11-10 23:19:07,667 ERROR [AsyncDispatcher event handler] impl.TaskAttemptImpl (TaskAttemptImpl.java:handle(984)) - Can't handle this event at current state for attempt_0_0000_m_000000_1
        org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at UNASSIGNED
                at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
                at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
                at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
                at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:982)
                at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1)
                at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:996)
                at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1)
                at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
                at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
                at java.lang.Thread.run(Thread.java:662)
        

        I think the problem occurs because the test is trying to inject TA_CONTAINER_LAUNCH_FAILED into the attempt state machine asynchronously. Sometimes that event arrives at the appropriate state and the test passes, sometimes it arrives at an inappropriate state and the test fails.
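The race described above can be reproduced with a toy model. The sketch below is not Hadoop code and is not the committed patch: `AttemptRaceSketch`, its `State` and `Event` enums, `waitForState`, and `runFlow` are hypothetical names, and the state machine is a simplification that accepts `CONTAINER_LAUNCH_FAILED` only in `ASSIGNED`. It illustrates one common remedy for this kind of flakiness, which is to wait until the attempt has reached a state that accepts the event before injecting it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AttemptRaceSketch {
    enum State { NEW, UNASSIGNED, ASSIGNED, FAILED }
    enum Event { SCHEDULE, CONTAINER_ASSIGNED, CONTAINER_LAUNCH_FAILED }

    /** Toy attempt: CONTAINER_LAUNCH_FAILED is only legal in ASSIGNED (a simplification). */
    static final class Attempt {
        private State state = State.NEW;
        private int invalidEvents = 0;

        synchronized void handle(Event e) {
            if (state == State.NEW && e == Event.SCHEDULE) {
                state = State.UNASSIGNED;
            } else if (state == State.UNASSIGNED && e == Event.CONTAINER_ASSIGNED) {
                state = State.ASSIGNED;
            } else if (state == State.ASSIGNED && e == Event.CONTAINER_LAUNCH_FAILED) {
                state = State.FAILED;
            } else {
                invalidEvents++; // event is dropped and the state stays unchanged
            }
        }

        synchronized State getState() { return state; }
        synchronized int getInvalidEvents() { return invalidEvents; }
    }

    /** Polls until the attempt reaches the wanted state; returns false on timeout. */
    static boolean waitForState(Attempt a, State wanted, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        try {
            while (a.getState() != wanted) {
                if (System.currentTimeMillis() > deadline) {
                    return false;
                }
                Thread.sleep(5);
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }

    /** Runs one scenario; waitFirst=true waits for ASSIGNED before injecting the failure. */
    static Attempt runFlow(boolean waitFirst) {
        Attempt attempt = new Attempt();
        BlockingQueue<Event> queue = new LinkedBlockingQueue<>();

        // Dispatcher thread: applies queued events one at a time.
        Thread dispatcher = new Thread(() -> {
            try {
                while (true) {
                    attempt.handle(queue.take());
                }
            } catch (InterruptedException ie) {
                // interrupted by runFlow to end the scenario
            }
        });
        // Stand-in for an allocator: assigns a container after a short delay.
        Thread allocator = new Thread(() -> {
            try {
                Thread.sleep(50);
                queue.put(Event.CONTAINER_ASSIGNED);
            } catch (InterruptedException ie) {
                // scenario ended early
            }
        });
        dispatcher.setDaemon(true);
        allocator.setDaemon(true);
        dispatcher.start();

        queue.add(Event.SCHEDULE);
        allocator.start();
        if (waitFirst && !waitForState(attempt, State.ASSIGNED, 5000)) {
            throw new IllegalStateException("attempt never reached ASSIGNED");
        }
        queue.add(Event.CONTAINER_LAUNCH_FAILED); // the injected failure
        waitForState(attempt, State.FAILED, 2000);
        dispatcher.interrupt();
        return attempt;
    }

    public static void main(String[] args) {
        System.out.println("inject immediately: " + runFlow(false).getState());
        System.out.println("wait, then inject:  " + runFlow(true).getState());
    }
}
```

With `waitFirst` set to `false`, the injected event usually reaches the attempt while it is still `UNASSIGNED` and is rejected, so the attempt never reaches `FAILED`, which mirrors the timeout in the failing test. With `waitFirst` set to `true`, the event is only injected once the attempt can accept it.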

        zjshen Zhijie Shen added a comment -

        I recently hit this twice when submitting patches on Jira: HadoopQA failed on TestRecovery, but the failure was gone when I resubmitted newer patches.

        YARN-450: https://issues.apache.org/jira/browse/YARN-450?focusedCommentId=13605667&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13605667
        MAPREDUCE-4956: https://issues.apache.org/jira/browse/MAPREDUCE-4956?focusedCommentId=13607023&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607023

        haibochen Haibo Chen added a comment -

        I agree with Jason Lowe on the root cause. The fix is similar to that of MAPREDUCE-6768. I will upload a patch once MAPREDUCE-6768 is committed.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime  Comment
        0     reexec      0m 9s    Docker mode activated.
        +1    @author     0m 0s    The patch does not contain any @author tags.
        +1    test4tests  0m 0s    The patch appears to include 1 new or modified test files.
        +1    mvninstall  7m 12s   trunk passed
        +1    compile     0m 22s   trunk passed
        +1    checkstyle  0m 17s   trunk passed
        +1    mvnsite     0m 29s   trunk passed
        +1    mvneclipse  0m 14s   trunk passed
        +1    findbugs    0m 35s   trunk passed
        +1    javadoc     0m 15s   trunk passed
        +1    mvninstall  0m 22s   the patch passed
        +1    compile     0m 20s   the patch passed
        +1    javac       0m 20s   the patch passed
        -1    checkstyle  0m 14s   hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app: The patch generated 2 new + 117 unchanged - 2 fixed = 119 total (was 119)
        +1    mvnsite     0m 26s   the patch passed
        +1    mvneclipse  0m 13s   the patch passed
        +1    whitespace  0m 0s    The patch has no whitespace issues.
        +1    findbugs    0m 44s   the patch passed
        +1    javadoc     0m 12s   the patch passed
        +1    unit        8m 46s   hadoop-mapreduce-client-app in the patch passed.
        +1    asflicense  0m 15s   The patch does not generate ASF License warnings.
                          21m 41s



        Subsystem       Report/Notes
        Docker          Image:yetus/hadoop:9560f25
        JIRA Patch URL  https://issues.apache.org/jira/secure/attachment/12826084/mapreduce4784.001.patch
        JIRA Issue      MAPREDUCE-4784
        Optional Tests  asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname           Linux 735eab60b201 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool      maven
        Personality     /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision    trunk / ed6ff5c
        Default Java    1.8.0_101
        findbugs        v3.0.0
        checkstyle      https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6703/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt
        Test Results    https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6703/testReport/
        modules         C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
        Console output  https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6703/console
        Powered by      Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        jlowe Jason Lowe added a comment -

        +1 lgtm. Committing this.

        jlowe Jason Lowe added a comment -

        Thanks, Haibo Chen! I committed this to trunk, branch-2, branch-2.8, and branch-2.7.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10374 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10374/)
        MAPREDUCE-4784. TestRecovery occasionally fails. Contributed by Haibo (jlowe: rev af508605a9edc126c170160291dbc2fe58b66dea)

        • (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java

          People

          • Assignee: haibochen Haibo Chen
          • Reporter: jlowe Jason Lowe
          • Votes: 0
          • Watchers: 6
