  1. Hadoop YARN
  2. YARN-5195

RM intermittently crashed with NPE while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      While running gridmix experiments, I once came across an incident where the RM went down with the following exception:

      2016-05-28 15:45:24,459 [ResourceManager Event Processor] FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1282)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1469)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:497)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:860)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1319)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:127)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:704)
              at java.lang.Thread.run(Thread.java:745)
      2016-05-28 15:45:24,460 [ApplicationMasterLauncher #49] INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning master appattempt_1464449118385_0006_000001
      2016-05-28 15:45:24,460 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
      
      1. YARN-5195.01.patch
        1 kB
        sandflee
      2. YARN-5195.02.patch
        4 kB
        sandflee
      3. YARN-5195.03.patch
        4 kB
        sandflee
      4. YARN-5195-branch-2.7.001.patch
        5 kB
        Jonathan Hung
      5. YARN-5195-branch-2.8.001.patch
        4 kB
        Jonathan Hung
      6. YARN-5195-branch-2.8.001.patch
        4 kB
        Jason Lowe

        Issue Links

          Activity

          jlowe Jason Lowe added a comment -

          Thanks, sandflee and Jonathan Hung! I committed this to branch-2.8, branch-2.8.2, and branch-2.7.

          jhung Jonathan Hung added a comment -

          Thanks Jason Lowe for verifying! (Also for the review/commit.)

          jlowe Jason Lowe added a comment -

          The unit test failures are similar to the branch-2.7 case – known issues with the Jenkins test environment on those branches. The unit tests pass locally for me with the patch applied.

          Committing.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 12m 14s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
                branch-2.8 Compile Tests
          +1 mvninstall 9m 24s branch-2.8 passed
          +1 compile 0m 32s branch-2.8 passed
          +1 checkstyle 0m 19s branch-2.8 passed
          +1 mvnsite 0m 37s branch-2.8 passed
          +1 findbugs 1m 10s branch-2.8 passed
          +1 javadoc 0m 25s branch-2.8 passed
                Patch Compile Tests
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 29s the patch passed
          +1 javac 0m 29s the patch passed
          +1 checkstyle 0m 17s the patch passed
          +1 mvnsite 0m 34s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 19s the patch passed
          +1 javadoc 0m 21s the patch passed
                Other Tests
          -1 unit 75m 39s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          105m 33s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.TestClientRMTokens



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:c2d96dd
          JIRA Issue YARN-5195
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12888323/YARN-5195-branch-2.8.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 488fc670ea27 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.8 / c0bb242
          Default Java 1.7.0_151
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/17570/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/17570/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/17570/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          jlowe Jason Lowe added a comment -

          Attaching the branch-2.8 patch again so the QA bot can comment on that as well.

          The unit test failures for the branch-2.7 run are known issues with the Jenkins setup on that branch.

          +1 for both patches. I'll commit these later today if the Jenkins results for 2.8 are OK as well.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 10m 20s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
                branch-2.7 Compile Tests
          +1 mvninstall 7m 47s branch-2.7 passed
          +1 compile 0m 24s branch-2.7 passed with JDK v1.8.0_144
          +1 compile 0m 28s branch-2.7 passed with JDK v1.7.0_151
          +1 checkstyle 0m 23s branch-2.7 passed
          +1 mvnsite 0m 36s branch-2.7 passed
          +1 findbugs 1m 4s branch-2.7 passed
          +1 javadoc 0m 18s branch-2.7 passed with JDK v1.8.0_144
          +1 javadoc 0m 23s branch-2.7 passed with JDK v1.7.0_151
                Patch Compile Tests
          +1 mvninstall 0m 27s the patch passed
          +1 compile 0m 23s the patch passed with JDK v1.8.0_144
          +1 javac 0m 23s the patch passed
          +1 compile 0m 26s the patch passed with JDK v1.7.0_151
          +1 javac 0m 26s the patch passed
          -0 checkstyle 0m 20s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 4 new + 441 unchanged - 2 fixed = 445 total (was 443)
          +1 mvnsite 0m 32s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 11s the patch passed
          +1 javadoc 0m 16s the patch passed with JDK v1.8.0_144
          +1 javadoc 0m 20s the patch passed with JDK v1.7.0_151
                Other Tests
          -1 unit 50m 51s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_151.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          129m 15s



          Reason Tests
          JDK v1.8.0_144 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestRMRestart
            hadoop.yarn.server.resourcemanager.TestAMAuthorization
          JDK v1.7.0_151 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:67e87c9
          JIRA Issue YARN-5195
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12888104/YARN-5195-branch-2.7.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux ad5c06098d86 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.7 / ee7a94e
          Default Java 1.7.0_151
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_144 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_151
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/17546/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/17546/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_151.txt
          JDK v1.7.0_151 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/17546/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/17546/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          jlowe Jason Lowe added a comment -

          Moving this to Patch Available so the QA bot can comment.

          jhung Jonathan Hung added a comment -

          Hi Sunil G/Wangda Tan, I attached branch-2.8 and branch-2.7 patches (implementation is slightly different due to YARN-4719). Can we commit this to these two branches? Thanks!

          FWIW, we encountered this on our cluster with async scheduling disabled (as the test case in the original patch suggests).

          sandflee sandflee added a comment -

          Thanks Wangda Tan and Sunil G for reviewing and committing.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #10161 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10161/)
          YARN-5195. RM intermittently crashed with NPE while handling (wangda: rev d62e121ffc0239e7feccc1e23ece92c5fac685f6)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
          leftnoteasy Wangda Tan added a comment -

          Committed to trunk and branch-2. Thanks sandflee, and thanks Sunil G for the reviews.

          leftnoteasy Wangda Tan added a comment -

          +1, thanks sandflee, will commit soon if no objections.

          sunilg Sunil G added a comment -

          Patch looks fine to me. Thanks sandflee.

          sunilg Sunil G added a comment -

          Yes sandflee. That makes sense.

          sandflee sandflee added a comment -

          Thanks Sunil G. nodeTracker#remove is invoked in Scheduler#removeNode and Scheduler#updateNodeResource, both of which are synchronized with scheduler#allocateContainersToNode, so it is safe for now.
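
The argument above is that node removal and allocation run under the same lock, so a node cannot disappear in the middle of an allocation. A minimal Java sketch of that idea follows. `SharedLockSketch` and all of its methods are hypothetical names chosen for illustration; this is not the CapacityScheduler code or the committed patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; names are illustrative, not the CapacityScheduler API. */
class SharedLockSketch {
    // Node id -> available memory in MB, guarded by this object's monitor.
    private final Map<String, Integer> nodes = new HashMap<>();

    synchronized void addNode(String nodeId, int availableMb) {
        nodes.put(nodeId, availableMb);
    }

    // Removal takes the same monitor as allocation, so the two never interleave.
    synchronized void removeNode(String nodeId) {
        nodes.remove(nodeId);
    }

    // While this method holds the monitor, removeNode cannot run, so a node
    // found by the lookup below stays present until the method returns.
    synchronized boolean allocateContainersToNode(String nodeId, int requestMb) {
        Integer available = nodes.get(nodeId);
        if (available == null || available < requestMb) {
            return false; // node already removed, or not enough room
        }
        nodes.put(nodeId, available - requestMb);
        return true;
    }

    // Returns -1 when the node is not tracked.
    synchronized int availableMb(String nodeId) {
        Integer available = nodes.get(nodeId);
        return available == null ? -1 : available;
    }
}
```

Under this assumption a single lookup at the start of the allocation method is enough, because nothing can remove the node before the method returns.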

          sunilg Sunil G added a comment -

          Hi sandflee
          Thanks for the patch. I have a doubt here.

          1. All nodes are copied from nodeTracker

          Since we copy all nodes from nodeTracker, we could lose a node at any time during the allocation process. Currently the null check is added only at the start of allocateContainersToNode, so is it possible that we lose the node after this step too? Are we looking at a lock here to avoid the problem, such as an operating lock on the node? Please feel free to correct me if I have misunderstood the problem.
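
The pattern being questioned here can be sketched in a few lines of Java: take a snapshot of the tracked nodes, then re-resolve each entry against the live tracker and skip it if it has been removed in the meantime. `NodeSnapshotSketch`, `tracker`, `snapshotNodeIds`, and `allocateOnNode` are hypothetical names for illustration only; the real scheduler classes and the committed patch may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the scheduler's node bookkeeping; not the real YARN classes. */
class NodeSnapshotSketch {
    // Hypothetical tracker: node id -> available memory in MB.
    static final Map<String, Integer> tracker = new ConcurrentHashMap<>();

    /** Copies the node ids, as a scheduling pass might do before iterating. */
    static List<String> snapshotNodeIds() {
        return new ArrayList<>(tracker.keySet());
    }

    /** Returns false if the node was removed after the snapshot was taken. */
    static boolean allocateOnNode(String nodeId) {
        Integer available = tracker.get(nodeId); // re-resolve against the live tracker
        if (available == null) {
            return false; // stale snapshot entry: skip instead of dereferencing null
        }
        return available > 0;
    }
}
```

The check protects only the entry point. Whether a node can still vanish later in the allocation depends on what else holds the lock, which is the question raised in this comment.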

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 25s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 50s trunk passed
          +1 compile 0m 32s trunk passed
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 38s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 0m 57s trunk passed
          +1 javadoc 0m 21s trunk passed
          +1 mvninstall 0m 33s the patch passed
          +1 compile 0m 29s the patch passed
          +1 javac 0m 29s the patch passed
          +1 checkstyle 0m 21s the patch passed
          +1 mvnsite 0m 37s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 2s the patch passed
          +1 javadoc 0m 19s the patch passed
          -1 unit 36m 9s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          52m 4s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12819522/YARN-5195.03.patch
          JIRA Issue YARN-5195
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux ea77b5fb38ab 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 132deb4
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/12453/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/12453/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12453/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/12453/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          sandflee sandflee added a comment -

          Updated the patch to fix the checkstyle warning. The failed test passes locally and seems unrelated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 8m 15s trunk passed
          +1 compile 0m 36s trunk passed
          +1 checkstyle 0m 26s trunk passed
          +1 mvnsite 0m 46s trunk passed
          +1 mvneclipse 0m 19s trunk passed
          +1 findbugs 1m 3s trunk passed
          +1 javadoc 0m 26s trunk passed
          +1 mvninstall 0m 37s the patch passed
          +1 compile 0m 34s the patch passed
          +1 javac 0m 34s the patch passed
          -1 checkstyle 0m 25s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 291 unchanged - 0 fixed = 292 total (was 291)
          +1 mvnsite 0m 42s the patch passed
          +1 mvneclipse 0m 17s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 11s the patch passed
          +1 javadoc 0m 22s the patch passed
          -1 unit 33m 58s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          51m 19s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12819430/YARN-5195.02.patch
          JIRA Issue YARN-5195
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 998c78924c17 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / ecff7d0
          Default Java 1.8.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/12446/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/12446/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/12446/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12446/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/12446/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 19s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 9m 18s trunk passed
          +1 compile 0m 39s trunk passed
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 44s trunk passed
          +1 mvneclipse 0m 19s trunk passed
          +1 findbugs 1m 7s trunk passed
          +1 javadoc 0m 23s trunk passed
          +1 mvninstall 0m 36s the patch passed
          +1 compile 0m 36s the patch passed
          +1 javac 0m 36s the patch passed
          +1 checkstyle 0m 21s the patch passed
          +1 mvnsite 0m 41s the patch passed
          +1 mvneclipse 0m 16s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 14s the patch passed
          +1 javadoc 0m 21s the patch passed
          -1 unit 37m 31s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          55m 51s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12819231/YARN-5195.01.patch
          JIRA Issue YARN-5195
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux bced71b4d78d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 521f343
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/12431/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/12431/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12431/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/12431/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          sandflee sandflee added a comment -

          AsyncSchedulerThread copies all nodes from nodeTracker before calling attemptScheduling on each node, which leaves a race condition:
          1. All nodes are copied from nodeTracker.
          2. nodeA is lost and removed from the scheduler, and all of its launched containers are cleaned up.
          3. The app attempt completes, and a container allocated (or reserved) on nodeA refers to a node that no longer exists.
          This was fixed for FairScheduler in YARN-3675. I have added an initial patch and will add a test later.
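
          The interleaving above can be replayed deterministically in a small sketch. This is not CapacityScheduler code: the class, the nodeTracker map, and completeContainer are hypothetical stand-ins, and the null check only illustrates one way to tolerate a removed node, not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class StaleNodeSnapshotSketch {
    // Hypothetical stand-in for the scheduler's node tracker:
    // node id -> ids of containers currently placed on that node.
    static final Map<String, List<String>> nodeTracker = new ConcurrentHashMap<>();

    // Returns true if the container was released from a live node,
    // false if the node is already gone or does not hold the container.
    static boolean completeContainer(String nodeId, String containerId) {
        List<String> containers = nodeTracker.get(nodeId);
        if (containers == null) {
            // Node was removed after the snapshot was taken: skip it
            // instead of dereferencing null.
            return false;
        }
        return containers.remove(containerId);
    }

    public static void main(String[] args) {
        nodeTracker.put("nodeA", new ArrayList<>());

        // Step 1: the async thread copies the node ids.
        List<String> snapshot = new ArrayList<>(nodeTracker.keySet());

        // Step 2: nodeA is lost and removed from the scheduler.
        nodeTracker.remove("nodeA");

        // The async thread still works from the stale copy, so the
        // app attempt ends up holding a container that points at nodeA.
        String staleNode = snapshot.get(0);
        String containerId = "container_1";

        // Step 3: the attempt completes and tries to release that container.
        boolean released = completeContainer(staleNode, containerId);
        System.out.println("released=" + released);
    }
}
```

          Running main prints released=false: the lookup for nodeA returns null after step 2, which is the value that an unguarded completedContainer path would dereference.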

          leftnoteasy Wangda Tan added a comment -

          I don't have bandwidth to do this now, please feel free to pick it up if you have time.

          leftnoteasy Wangda Tan added a comment -

          I investigated this issue. It only happens when async scheduling is enabled: a container gets allocated to a node after that node has been removed from the scheduler.

          Logs look like:

          2016-05-28 15:45:18,502 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_1464449118385_0006_01_000324 of capacity <memory:2048, vCores:1> on host cn042-10.l42scl.hortonworks.com:49161, which currently has 0 containers, <memory:0, vCores:0> used and <memory:49152, vCores:12> available, release resources=true
          2016-05-28 15:45:18,503 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Removed node node-1:49161 clusterResource: <memory:442368, vCores:108>
          2016-05-28 15:45:18,526 [Thread-12] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_1464449118385_0006_01_000382 of capacity <memory:2048, vCores:1> on host node-1:49161, which has 1 containers, <memory:2048, vCores:1> used and <memory:47104, vCores:11> available after allocation
          

          Adding additional lock protection to the async scheduling thread could prevent this from happening.
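
          A minimal sketch of that idea, assuming node removal and allocation share one lock and the allocation path re-checks that the node is still registered. All names here (LockedAllocationSketch, schedulerLock, tryAllocate, and so on) are hypothetical and are not taken from CapacityScheduler or from the patch.

```java
import java.util.HashMap;
import java.util.Map;

class LockedAllocationSketch {
    // One lock shared by the event-processing path (removeNode)
    // and the async scheduling path (tryAllocate).
    private final Object schedulerLock = new Object();
    // node id -> number of containers assigned to that node
    private final Map<String, Integer> liveNodes = new HashMap<>();

    void addNode(String nodeId) {
        synchronized (schedulerLock) {
            liveNodes.put(nodeId, 0);
        }
    }

    void removeNode(String nodeId) {
        synchronized (schedulerLock) {
            liveNodes.remove(nodeId);
        }
    }

    // Allocates only if the node is still registered when the lock is held,
    // so an allocation can never land on a node removed in the meantime.
    boolean tryAllocate(String nodeId) {
        synchronized (schedulerLock) {
            Integer count = liveNodes.get(nodeId);
            if (count == null) {
                return false; // node already removed: refuse the allocation
            }
            liveNodes.put(nodeId, count + 1);
            return true;
        }
    }

    // Returns the container count, or -1 if the node is not registered.
    int containersOn(String nodeId) {
        synchronized (schedulerLock) {
            return liveNodes.getOrDefault(nodeId, -1);
        }
    }

    public static void main(String[] args) {
        LockedAllocationSketch scheduler = new LockedAllocationSketch();
        scheduler.addNode("node-1");
        System.out.println("before removal: " + scheduler.tryAllocate("node-1"));
        scheduler.removeNode("node-1");
        System.out.println("after removal: " + scheduler.tryAllocate("node-1"));
    }
}
```

          The trade-off is that the async thread now contends with the event processor for the same lock, so the critical section should stay as short as the membership check and the bookkeeping update.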


            People

            • Assignee:
              sandflee sandflee
              Reporter:
              karams Karam Singh
            • Votes:
              0
              Watchers:
              15

              Dates

              • Created:
                Updated:
                Resolved:

                Development