Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: mrv2
    • Labels: None

    Description

      TestKill.testKillJob often fails for the same reason with the following error message:

      1 tests failed.
      FAILED:  org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob
      
      Error Message:
      Task state not correct expected:<KILLED> but was:<NEW/SCHEDULED/RUNNING>
      
      Stack Trace:
      java.lang.AssertionError: Task state not correct expected:<KILLED> but was:<NEW/SCHEDULED/RUNNING>
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.failNotEquals(Assert.java:743)
      	at org.junit.Assert.assertEquals(Assert.java:118)
      	at org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob(TestKill.java:84)
      

      The root cause is that when the job is in KILLED state from an external view, TaskKillEvents and TaskAttemptKillEvents placed on the event loop queue may not have been processed by the dispatcher thread.
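The race described above can be illustrated with a toy model. This is only a sketch, not Hadoop code: the class `KillRaceSketch`, its fields, and the single-queue dispatcher are hypothetical stand-ins, and the dispatcher is stepped by hand so the interleaving is visible.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model (hypothetical, not the Hadoop classes): one event queue, one job, one task.
class KillRaceSketch {
    enum State { RUNNING, KILLED }

    State jobState = State.RUNNING;   // what an external observer sees for the job
    State taskState = State.RUNNING;  // what an external observer sees for the task
    final Queue<Runnable> eventQueue = new ArrayDeque<>();

    // Enqueue a job kill. Handling it flips the job's visible state at once,
    // but the task kill is only placed on the queue for later processing.
    void killJob() {
        eventQueue.add(() -> {
            jobState = State.KILLED;
            eventQueue.add(() -> taskState = State.KILLED);
        });
    }

    // Process a single queued event; returns false when the queue is empty.
    boolean dispatchOne() {
        Runnable event = eventQueue.poll();
        if (event == null) {
            return false;
        }
        event.run();
        return true;
    }

    // Process events until the queue is empty.
    void drain() {
        while (dispatchOne()) {
            // keep dispatching
        }
    }
}
```

After `killJob()` and a single `dispatchOne()`, the job already reads as KILLED while the task still reads as RUNNING; only after `drain()` does the task reach KILLED. An assertion on the task state made in that window fails in the same way as the stack trace above.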

      Attachments

        1. mapreduce6801.002.patch
          2 kB
          Haibo Chen
        2. mapreduce6801.001.patch
          1 kB
          Haibo Chen

          Activity

            yussufshaikh Yussuf Shaikh added a comment -

            Please check the comment on MAPREDUCE-6802 for ppc. I tried this on x86 multiple times and it did not fail, but it fails around 2 times out of 10 on a Power-RHEL machine.

            haibochen Haibo Chen added a comment -

            Uploading a patch to fix the flakiness. The MR AM is in stopped state after MRApp.stop() is called, which only happens when a JobFinishEvent is processed. The JobFinishEvent is generated after all tasks/taskattempts have been properly killed.
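The idea of waiting for the stopped state before asserting can be sketched with a self-contained toy. Everything here is an assumption made for illustration: `WaitForStopSketch`, its `waitFor` polling helper, and the two queued lambdas merely stand in for the task kill events and the job finish event; none of it is the MRApp API or the attached patch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BooleanSupplier;

// Toy model (hypothetical): a dispatcher thread that marks itself stopped
// only after every earlier event on the queue has been processed.
class WaitForStopSketch {
    volatile boolean taskKilled = false;
    volatile boolean stopped = false;
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    // Start a single daemon dispatcher thread that runs events in FIFO order.
    void start() {
        Thread dispatcher = new Thread(() -> {
            try {
                while (!stopped) {
                    queue.take().run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "dispatcher");
        dispatcher.setDaemon(true);
        dispatcher.start();
    }

    // Enqueue the stand-ins: first the task kill, then the finish event that stops the app.
    void killJob() {
        queue.add(() -> taskKilled = true);
        queue.add(() -> stopped = true);
    }

    // Poll a condition until it holds or the timeout elapses; returns whether it held.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

Because the stop flag is set by the same thread that processed the task kill, and only afterwards, a test that first waits on `stopped` can then check `taskKilled` without racing the dispatcher.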

            hadoopqa Hadoop QA added a comment -
            +1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 15s Docker mode activated.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
            +1 mvninstall 7m 12s trunk passed
            +1 compile 0m 25s trunk passed
            +1 checkstyle 0m 16s trunk passed
            +1 mvnsite 0m 31s trunk passed
            +1 mvneclipse 0m 15s trunk passed
            +1 findbugs 0m 35s trunk passed
            +1 javadoc 0m 17s trunk passed
            +1 mvninstall 0m 23s the patch passed
            +1 compile 0m 22s the patch passed
            +1 javac 0m 22s the patch passed
            +1 checkstyle 0m 14s the patch passed
            +1 mvnsite 0m 28s the patch passed
            +1 mvneclipse 0m 13s the patch passed
            +1 whitespace 0m 0s The patch has no whitespace issues.
            +1 findbugs 0m 45s the patch passed
            +1 javadoc 0m 13s the patch passed
            +1 unit 8m 55s hadoop-mapreduce-client-app in the patch passed.
            +1 asflicense 0m 17s The patch does not generate ASF License warnings.
            22m 13s



            Subsystem Report/Notes
            Docker Image:yetus/hadoop:a9ad5d6
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12839259/mapreduce6801.001.patch
            JIRA Issue MAPREDUCE-6801
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
            uname Linux be224dfd8fa6 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
            git revision trunk / 04a024b
            Default Java 1.8.0_111
            findbugs v3.0.0
            Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6815/testReport/
            modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
            Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6815/console
            Powered by Apache Yetus 0.3.0 http://yetus.apache.org

            This message was automatically generated.

            varun_saxena Varun Saxena added a comment - - edited

            Thanks haibochen for the patch. This should handle all cases except one, which should happen only rarely. If the job is stuck in the internal state SETUP (due to slow processing), tasks won't be scheduled. Hence, the tasks won't reach the KILLED state that the test asserts on. An internal state of SETUP corresponds to an external state of RUNNING. Therefore app.waitForState(job, JobState.RUNNING) should be replaced by app.waitForInternalState((JobImpl) job, JobStateInternal.RUNNING).

            I was able to simulate this case by putting a sleep in the dispatcher.
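The gap between the external and internal views can be shown with a deliberately reduced model. The enums below are hypothetical and contain only a few states; the one mapping taken from the comment above is that an internal SETUP is reported externally as RUNNING, and `tasksScheduled` is an assumed simplification of "tasks exist only once the job is internally RUNNING".

```java
// Toy model (hypothetical, simplified): internal job states versus what is reported externally.
class JobStateViewSketch {
    enum Internal { INITED, SETUP, RUNNING }
    enum External { INITED, RUNNING }

    // External view of an internal state: SETUP is already reported as RUNNING.
    static External externalView(Internal state) {
        switch (state) {
            case SETUP:
            case RUNNING:
                return External.RUNNING;
            default:
                return External.INITED;
        }
    }

    // Assumption for this sketch: tasks are scheduled only in the internal RUNNING state.
    static boolean tasksScheduled(Internal state) {
        return state == Internal.RUNNING;
    }
}
```

A wait on the external state returns as soon as `externalView` yields RUNNING, which already holds in SETUP, when no tasks have been scheduled yet. Waiting on the internal state closes that window.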

            haibochen Haibo Chen added a comment -

            Thanks varun_saxena for your review and for pointing out another case we need a fix for! I'll incorporate your comments in the new patch shortly.

            hadoopqa Hadoop QA added a comment -
            +1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 13s Docker mode activated.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
            +1 mvninstall 6m 54s trunk passed
            +1 compile 0m 23s trunk passed
            +1 checkstyle 0m 16s trunk passed
            +1 mvnsite 0m 29s trunk passed
            +1 mvneclipse 0m 16s trunk passed
            +1 findbugs 0m 37s trunk passed
            +1 javadoc 0m 16s trunk passed
            +1 mvninstall 0m 22s the patch passed
            +1 compile 0m 20s the patch passed
            +1 javac 0m 20s the patch passed
            +1 checkstyle 0m 13s the patch passed
            +1 mvnsite 0m 26s the patch passed
            +1 mvneclipse 0m 13s the patch passed
            +1 whitespace 0m 0s The patch has no whitespace issues.
            +1 findbugs 0m 40s the patch passed
            +1 javadoc 0m 13s the patch passed
            +1 unit 8m 52s hadoop-mapreduce-client-app in the patch passed.
            +1 asflicense 0m 17s The patch does not generate ASF License warnings.
            21m 37s



            Subsystem Report/Notes
            Docker Image:yetus/hadoop:a9ad5d6
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12839461/mapreduce6801.002.patch
            JIRA Issue MAPREDUCE-6801
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
            uname Linux 9a5009f9ec63 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
            git revision trunk / b4f1971
            Default Java 1.8.0_111
            findbugs v3.0.0
            Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6818/testReport/
            modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
            Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6818/console
            Powered by Apache Yetus 0.3.0 http://yetus.apache.org

            This message was automatically generated.

            varun_saxena Varun Saxena added a comment -

            +1
            Will commit it later today.

            varun_saxena Varun Saxena added a comment -

            Committed to trunk, branch-2.
            Thanks haibochen for your contribution.

            haibochen Haibo Chen added a comment -

            Thanks varun_saxena for your kind review!

            hudson Hudson added a comment -

            SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10863 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10863/)
            MAPREDUCE-6801. Fix flaky TestKill.testKillJob (Haibo Chen via Varun (varunsaxena: rev 7584fbf4cbafd34fac4b362cefe4e06cec16a2af)

            • (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestKill.java
            epayne Eric Payne added a comment -

            Thanks haibochen for the fix. I have backported this to branch-2.8.


            People

              Assignee: haibochen Haibo Chen
              Reporter: haibochen Haibo Chen
              Votes: 1
              Watchers: 7
