Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: mrv2
    • Labels:
      None

      Description

      TestKill.testKillJob often fails for the same reason with the following error message:

      1 tests failed.
      FAILED:  org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob
      
      Error Message:
      Task state not correct expected:<KILLED> but was:<NEW/SCHEDULED/RUNNING>
      
      Stack Trace:
      java.lang.AssertionError: Task state not correct expected:<KILLED> but was:<NEW/SCHEDULED/RUNNING>
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.failNotEquals(Assert.java:743)
      	at org.junit.Assert.assertEquals(Assert.java:118)
      	at org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob(TestKill.java:84)
      

      The root cause is that when the job is in KILLED state from an external view, TaskKillEvents and TaskAttemptKillEvents placed on the event loop queue may not have been processed by the dispatcher thread.
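The ordering problem described above can be illustrated with a small single-threaded toy model. This is only a sketch, not Hadoop code: `ToyJob`, `kill()`, and `drain()` are hypothetical names, and `drain()` stands in for the dispatcher thread working through the event queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class KillRaceSketch {
  enum State { RUNNING, KILLED }

  // Hypothetical stand-in for a job with one task and an event queue.
  static final class ToyJob {
    State jobState = State.RUNNING;
    State taskState = State.RUNNING;
    private final Queue<Runnable> eventQueue = new ArrayDeque<>();

    // The job looks KILLED at once; the task kill is only queued.
    void kill() {
      jobState = State.KILLED;
      eventQueue.add(() -> taskState = State.KILLED);
    }

    // Stand-in for the dispatcher thread processing queued events.
    void drain() {
      Runnable event;
      while ((event = eventQueue.poll()) != null) {
        event.run();
      }
    }
  }

  public static void main(String[] args) {
    ToyJob job = new ToyJob();
    job.kill();
    // An assertion on the task state here would fail.
    System.out.println("job=" + job.jobState + " task=" + job.taskState);
    job.drain();
    System.out.println("job=" + job.jobState + " task=" + job.taskState);
  }
}
```

In the real test the dispatcher runs on its own thread, so whether the assertion sees the task before or after the queued event is processed depends on timing, which is consistent with the intermittent failures.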

      1. mapreduce6801.001.patch
        1 kB
        Haibo Chen
      2. mapreduce6801.002.patch
        2 kB
        Haibo Chen

          Activity

          yussufshaikh Yussuf Shaikh added a comment -

          Please check the comment on MAPREDUCE-6802 for ppc. I tried this on x86 multiple times and it did not fail, but it fails around 2 times out of 10 on a Power-RHEL machine.

          haibochen Haibo Chen added a comment -

          Uploading a patch to fix the flakiness. The MR AM is in stopped state after MRApp.stop() is called, which only happens when a JobFinishEvent is processed. The JobFinishEvent is generated after all tasks/taskattempts have been properly killed.
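The idea of waiting for a terminal signal before asserting can be sketched with a toy dispatcher. The patch itself is not reproduced here; `ToyApp`, `killJob()`, and `waitForStopped()` are hypothetical names, and the "finish" event is modeled as the last item on the queue, so observing it implies every earlier kill event has been processed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class WaitForStopSketch {
  // Hypothetical stand-in for an app with a dispatcher thread.
  static final class ToyApp {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final CountDownLatch stopped = new CountDownLatch(1);
    volatile boolean taskKilled = false;

    // Starts a daemon thread that processes events until the app is stopped.
    void start() {
      Thread dispatcher = new Thread(() -> {
        try {
          while (stopped.getCount() > 0) {
            queue.take().run();
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }, "toy-dispatcher");
      dispatcher.setDaemon(true);
      dispatcher.start();
    }

    // Queues the task kill first, then the finish event that marks "stopped".
    void killJob() {
      queue.add(() -> taskKilled = true);
      queue.add(stopped::countDown);
    }

    // Returns true once the finish event has been processed, false on timeout.
    boolean waitForStopped(long timeoutMillis) {
      try {
        return stopped.await(timeoutMillis, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    ToyApp app = new ToyApp();
    app.start();
    app.killJob();
    boolean stopped = app.waitForStopped(5000);
    System.out.println("stopped=" + stopped + " taskKilled=" + app.taskKilled);
  }
}
```

Because the queue is processed in order by a single thread, a test that first waits for the stopped signal can then assert on task state without racing the dispatcher.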

          hadoopqa Hadoop QA added a comment -
          +1 overall



          | Vote | Subsystem  | Runtime | Comment |
          |  0   | reexec     | 0m 15s  | Docker mode activated. |
          | +1   | @author    | 0m 0s   | The patch does not contain any @author tags. |
          | +1   | test4tests | 0m 0s   | The patch appears to include 1 new or modified test files. |
          | +1   | mvninstall | 7m 12s  | trunk passed |
          | +1   | compile    | 0m 25s  | trunk passed |
          | +1   | checkstyle | 0m 16s  | trunk passed |
          | +1   | mvnsite    | 0m 31s  | trunk passed |
          | +1   | mvneclipse | 0m 15s  | trunk passed |
          | +1   | findbugs   | 0m 35s  | trunk passed |
          | +1   | javadoc    | 0m 17s  | trunk passed |
          | +1   | mvninstall | 0m 23s  | the patch passed |
          | +1   | compile    | 0m 22s  | the patch passed |
          | +1   | javac      | 0m 22s  | the patch passed |
          | +1   | checkstyle | 0m 14s  | the patch passed |
          | +1   | mvnsite    | 0m 28s  | the patch passed |
          | +1   | mvneclipse | 0m 13s  | the patch passed |
          | +1   | whitespace | 0m 0s   | The patch has no whitespace issues. |
          | +1   | findbugs   | 0m 45s  | the patch passed |
          | +1   | javadoc    | 0m 13s  | the patch passed |
          | +1   | unit       | 8m 55s  | hadoop-mapreduce-client-app in the patch passed. |
          | +1   | asflicense | 0m 17s  | The patch does not generate ASF License warnings. |
          |      |            | 22m 13s | |

          | Subsystem      | Report/Notes |
          | Docker         | Image:yetus/hadoop:a9ad5d6 |
          | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12839259/mapreduce6801.001.patch |
          | JIRA Issue     | MAPREDUCE-6801 |
          | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
          | uname          | Linux be224dfd8fa6 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
          | Build tool     | maven |
          | Personality    | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
          | git revision   | trunk / 04a024b |
          | Default Java   | 1.8.0_111 |
          | findbugs       | v3.0.0 |
          | Test Results   | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6815/testReport/ |
          | modules        | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app |
          | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6815/console |
          | Powered by     | Apache Yetus 0.3.0 http://yetus.apache.org |

          This message was automatically generated.

          varun_saxena Varun Saxena added a comment - - edited

          Thanks Haibo Chen for the patch. This should handle all the cases except one, which should occur only rarely. If the job is stuck in the internal state SETUP (due to slow processing), tasks won't be scheduled, so they won't reach the KILLED state that the test asserts on. An internal state of SETUP corresponds to an external state of RUNNING. Therefore app.waitForState(job, JobState.RUNNING) should be replaced by app.waitForInternalState((JobImpl) job, JobStateInternal.RUNNING).

          I was able to simulate this case by putting a sleep in the dispatcher.
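The difference between the two waits can be sketched with a toy state mapping. This is an illustration only: the enums and helper methods below are hypothetical, and the only mapping taken from the comment above is that internal SETUP is reported externally as RUNNING (the NEW entries are assumed purely for the example).

```java
class InternalStateSketch {
  enum External { NEW, RUNNING }

  // Toy internal states, each with the external state it is reported as.
  enum Internal {
    NEW(External.NEW),          // assumed for illustration
    SETUP(External.RUNNING),    // from the comment: SETUP looks like RUNNING
    RUNNING(External.RUNNING);

    final External external;

    Internal(External external) {
      this.external = external;
    }
  }

  // A wait on the external state is satisfied as soon as it reads RUNNING.
  static boolean externalWaitSatisfied(Internal current) {
    return current.external == External.RUNNING;
  }

  // A wait on the internal state is satisfied only by internal RUNNING.
  static boolean internalWaitSatisfied(Internal current) {
    return current == Internal.RUNNING;
  }

  public static void main(String[] args) {
    Internal stuck = Internal.SETUP;
    System.out.println("external wait returns: " + externalWaitSatisfied(stuck));
    System.out.println("internal wait returns: " + internalWaitSatisfied(stuck));
  }
}
```

With the job held in SETUP, the external wait would return even though no tasks have been scheduled yet, whereas the internal wait keeps blocking until the job is actually running.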

          haibochen Haibo Chen added a comment -

          Thanks Varun Saxena for your review and for pointing out another case we need a fix for! I'll incorporate your comments in the new patch shortly.

          hadoopqa Hadoop QA added a comment -
          +1 overall



          | Vote | Subsystem  | Runtime | Comment |
          |  0   | reexec     | 0m 13s  | Docker mode activated. |
          | +1   | @author    | 0m 0s   | The patch does not contain any @author tags. |
          | +1   | test4tests | 0m 0s   | The patch appears to include 1 new or modified test files. |
          | +1   | mvninstall | 6m 54s  | trunk passed |
          | +1   | compile    | 0m 23s  | trunk passed |
          | +1   | checkstyle | 0m 16s  | trunk passed |
          | +1   | mvnsite    | 0m 29s  | trunk passed |
          | +1   | mvneclipse | 0m 16s  | trunk passed |
          | +1   | findbugs   | 0m 37s  | trunk passed |
          | +1   | javadoc    | 0m 16s  | trunk passed |
          | +1   | mvninstall | 0m 22s  | the patch passed |
          | +1   | compile    | 0m 20s  | the patch passed |
          | +1   | javac      | 0m 20s  | the patch passed |
          | +1   | checkstyle | 0m 13s  | the patch passed |
          | +1   | mvnsite    | 0m 26s  | the patch passed |
          | +1   | mvneclipse | 0m 13s  | the patch passed |
          | +1   | whitespace | 0m 0s   | The patch has no whitespace issues. |
          | +1   | findbugs   | 0m 40s  | the patch passed |
          | +1   | javadoc    | 0m 13s  | the patch passed |
          | +1   | unit       | 8m 52s  | hadoop-mapreduce-client-app in the patch passed. |
          | +1   | asflicense | 0m 17s  | The patch does not generate ASF License warnings. |
          |      |            | 21m 37s | |

          | Subsystem      | Report/Notes |
          | Docker         | Image:yetus/hadoop:a9ad5d6 |
          | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12839461/mapreduce6801.002.patch |
          | JIRA Issue     | MAPREDUCE-6801 |
          | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
          | uname          | Linux 9a5009f9ec63 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
          | Build tool     | maven |
          | Personality    | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
          | git revision   | trunk / b4f1971 |
          | Default Java   | 1.8.0_111 |
          | findbugs       | v3.0.0 |
          | Test Results   | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6818/testReport/ |
          | modules        | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app |
          | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6818/console |
          | Powered by     | Apache Yetus 0.3.0 http://yetus.apache.org |

          This message was automatically generated.

          varun_saxena Varun Saxena added a comment -

          +1
          Will commit it later today.

          varun_saxena Varun Saxena added a comment -

          Committed to trunk, branch-2.
          Thanks Haibo Chen for your contribution.

          haibochen Haibo Chen added a comment -

          Thanks Varun Saxena for your kind review!

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10863 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10863/)
          MAPREDUCE-6801. Fix flaky TestKill.testKillJob (Haibo Chen via Varun (varunsaxena: rev 7584fbf4cbafd34fac4b362cefe4e06cec16a2af)

          • (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestKill.java
          eepayne Eric Payne added a comment -

          Thanks Haibo Chen for the fix. I have backported this to branch-2.8.


            People

            • Assignee:
              haibochen Haibo Chen
              Reporter:
              haibochen Haibo Chen
            • Votes:
              1
              Watchers:
              8
