Hadoop YARN / YARN-3675

FairScheduler: RM quits when node removal races with continuous scheduling on the same node

    Details

    • Hadoop Flags:
      Reviewed

      Description

      With continuous scheduling, scheduling can be done on a node that has just been removed, causing errors like the one below.

      12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
      
      Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
      java.lang.NullPointerException
      	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
      	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
      	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
      	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
      	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
      	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
      	at java.lang.Thread.run(Thread.java:745)
      12:28:53.783 AM	 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
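A simplified, hypothetical model of how the stack trace above can arise: once a node has been removed, looking it up by id returns null, and the reservation cleanup performed while handling APP_ATTEMPT_REMOVED then dereferences that null node. `ModelNode`, `ModelAppAttempt`, `ModelScheduler` and `NpeDemo` are illustrative stand-ins, not the real `FSSchedulerNode`, `FSAppAttempt` and `FairScheduler` classes, and the shape of `unreserve` is assumed rather than quoted from FSAppAttempt.java:469.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a scheduler node; not the real FSSchedulerNode.
class ModelNode {
  final String nodeId;
  String reservedContainer; // id of the container reserved on this node, if any

  ModelNode(String nodeId) { this.nodeId = nodeId; }
}

// Hypothetical stand-in for an app attempt; unreserve() dereferences the node it is given.
class ModelAppAttempt {
  void unreserve(ModelNode node) {
    node.reservedContainer = null; // NullPointerException if node is null
  }
}

// Hypothetical stand-in for the scheduler's node bookkeeping.
class ModelScheduler {
  final Map<String, ModelNode> nodes = new HashMap<>();

  void removeNode(String nodeId) { nodes.remove(nodeId); }

  // Cleanup of a reserved container during app removal: the node is looked up
  // by id, which yields null once the node has been removed.
  void completedContainer(ModelAppAttempt app, String reservedOnNodeId) {
    ModelNode node = nodes.get(reservedOnNodeId);
    app.unreserve(node);
  }
}

class NpeDemo {
  // Returns true if removing the app after its node was removed throws an NPE.
  static boolean removalAfterNodeLossThrows() {
    ModelScheduler scheduler = new ModelScheduler();
    ModelNode node = new ModelNode("host:8041");
    scheduler.nodes.put(node.nodeId, node);
    ModelAppAttempt app = new ModelAppAttempt();

    scheduler.removeNode(node.nodeId);      // the node is removed...
    node.reservedContainer = "container_1"; // ...then a stale scheduling pass reserves on it
    try {
      scheduler.completedContainer(app, node.nodeId); // APP_ATTEMPT_REMOVED cleanup
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("NPE on app removal: " + removalAfterNodeLossThrows());
  }
}
```

In the real ResourceManager this exception escapes the scheduler event handler, which is why the log ends with the RM exiting.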
      
      Attachments

      1. YARN-3675.001.patch (6 kB, Anubhav Dhoot)
      2. YARN-3675.002.patch (6 kB, Anubhav Dhoot)
      3. YARN-3675.003.patch (4 kB, Anubhav Dhoot)


          Activity

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2151 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2151/)
          YARN-3675. FairScheduler: RM quits when node removal races with continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 4513761869c732cf2f462763043067ebf8749df7)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #203 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/203/)
          YARN-3675. FairScheduler: RM quits when node removal races with continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 4513761869c732cf2f462763043067ebf8749df7)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #193 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/193/)
          YARN-3675. FairScheduler: RM quits when node removal races with continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 4513761869c732cf2f462763043067ebf8749df7)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2133 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2133/)
          YARN-3675. FairScheduler: RM quits when node removal races with continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 4513761869c732cf2f462763043067ebf8749df7)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #204 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/204/)
          YARN-3675. FairScheduler: RM quits when node removal races with continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 4513761869c732cf2f462763043067ebf8749df7)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #935 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/935/)
          YARN-3675. FairScheduler: RM quits when node removal races with continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 4513761869c732cf2f462763043067ebf8749df7)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7885 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7885/)
          YARN-3675. FairScheduler: RM quits when node removal races with continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 4513761869c732cf2f462763043067ebf8749df7)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/CHANGES.txt
          kasha Karthik Kambatla added a comment -

          Just committed this to trunk, branch-2, and branch-2.7.

          Thanks, Anubhav Dhoot, for reporting and fixing this, and Arun for the review.

          kasha Karthik Kambatla added a comment -

          +1. Checking this in.

          Just want to note that we rely on the lock on FairScheduler to ensure a node doesn't get removed during attemptScheduling. I feel we are getting to the point where we should invest time in making these locks finer-grained; otherwise, we might end up in the MR1 world.
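The reliance on the scheduler lock can be illustrated with a small sketch, assuming (as the comment implies) that node removal and `attemptScheduling` both synchronize on the same scheduler object. `CoarseLockScheduler` and `CoarseLockDemo` are hypothetical names. A check made inside the synchronized method cannot be invalidated before the method returns, because the removal thread has to wait for the same monitor.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical scheduler where add/remove and scheduling share one monitor.
class CoarseLockScheduler {
  private final ConcurrentHashMap<Integer, Boolean> nodes = new ConcurrentHashMap<>();
  final AtomicInteger violations = new AtomicInteger();

  synchronized void addNode(int id) { nodes.put(id, Boolean.TRUE); }
  synchronized void removeNode(int id) { nodes.remove(id); }

  synchronized void attemptScheduling(int id) {
    if (!nodes.containsKey(id)) {
      return; // node already gone: skip it
    }
    Thread.yield(); // invite a context switch; the monitor is still held
    if (!nodes.containsKey(id)) {
      violations.incrementAndGet(); // would mean the node vanished mid-scheduling
    }
  }
}

class CoarseLockDemo {
  // Churns nodes on one thread while another schedules; returns the violation count.
  static int countViolations(int iterations) {
    CoarseLockScheduler s = new CoarseLockScheduler();
    Thread churn = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        s.addNode(i % 4);
        s.removeNode(i % 4);
      }
    });
    Thread scheduling = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        s.attemptScheduling(i % 4);
      }
    });
    churn.start();
    scheduling.start();
    try {
      churn.join();
      scheduling.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for threads", e);
    }
    return s.violations.get();
  }

  public static void main(String[] args) {
    System.out.println("violations: " + countViolations(100_000));
  }
}
```

The cost is the one the comment raises: every node addition or removal waits behind any in-progress scheduling pass, which is what finer-grained locking would try to avoid.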

          asuresh Arun Suresh added a comment -

          Thanks for the patch, Anubhav Dhoot.
          +1, LGTM

          adhoot Anubhav Dhoot added a comment -

          The test failure does not reproduce locally for me and seems unrelated.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 34s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 31s There were no new javac warning messages.
          +1 javadoc 9m 35s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 46s There were no new checkstyle issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 32s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 16s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          -1 yarn tests 50m 4s Tests failed in hadoop-yarn-server-resourcemanager.
              Total 86m 17s



          Reason Tests
          Failed unit tests hadoop.yarn.server.resourcemanager.TestRMRestart



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12734207/YARN-3675.003.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 4aa730c
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8030/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8030/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8030/console

          This message was automatically generated.

          adhoot Anubhav Dhoot added a comment -

          Removed spurious changes and changed visibility of attemptScheduling

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 44s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 41s There were no new javac warning messages.
          +1 javadoc 9m 38s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 47s There were no new checkstyle issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 33s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 16s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 yarn tests 50m 51s Tests passed in hadoop-yarn-server-resourcemanager.
              Total 87m 29s



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12734156/YARN-3675.002.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 4aa730c
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8025/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8025/testReport/
          Java 1.7.0_55
          uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8025/console

          This message was automatically generated.

          adhoot Anubhav Dhoot added a comment -

          Fixed checkstyle issue

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 42s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 36s There were no new javac warning messages.
          +1 javadoc 9m 33s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 47s The applied patch generated 1 new checkstyle issues (total was 74, now 75).
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 31s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 14s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 yarn tests 50m 8s Tests passed in hadoop-yarn-server-resourcemanager.
              Total 86m 30s



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12734074/YARN-3675.001.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / ce53c8e
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/8018/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8018/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8018/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8018/console

          This message was automatically generated.

          adhoot Anubhav Dhoot added a comment -

          This fixes the issue where scheduling can happen on a node after it has been removed. When that happens, removing the application later makes it clean up its reserved and completed containers, and at that point it tries to call a method on an FSSchedulerNode that is null. Below is a log excerpt from the same incident as the stack trace above; it shows scheduling happening just after the node is removed. Looking at continuousSchedulingAttempt, we can obtain the reference to the node before taking the scheduler lock when calling attemptScheduling.

          hadoop-YARN-1-RESOURCEMANAGER-hostname.log.out:2015-05-11 00:27:42,793 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Removed node <nmhostname>:8041 
          hadoop-YARN-1-RESOURCEMANAGER-hostname.log.out:2015-05-11 00:27:42,793 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e25_1431107530707_159950_01_000021 of capacity <memory:2048, vCores:1> on host <nmhostname>:8041, which has 1 containers, <memory:2048, vCores:1> used an
          hadoop-YARN-1-RESOURCEMANAGER-hostname.log.out:2015-05-11 00:27:42,796 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Making reservation: node=<nmhostname> app_id=application_1431107530707_159852
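The interleaving in this log can be replayed deterministically with a hypothetical model, assuming the continuous-scheduling loop fetches the node reference without the scheduler lock and only then calls a synchronized `attemptScheduling`, as described above. `RaceNode`, `UnguardedScheduler` and `RaceDemo` are illustrative names, not the real classes, and the steps are run on a single thread to force the unlucky ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical node with a single reservation slot.
class RaceNode {
  final String id;
  String reservation; // app holding a reservation on this node, if any

  RaceNode(String id) { this.id = id; }
}

// Hypothetical scheduler without the fix: it trusts the caller's node reference.
class UnguardedScheduler {
  final ConcurrentHashMap<String, RaceNode> nodes = new ConcurrentHashMap<>();

  synchronized void addNode(RaceNode n) { nodes.put(n.id, n); }
  synchronized void removeNode(String id) { nodes.remove(id); }

  synchronized void attemptScheduling(RaceNode node) {
    node.reservation = "app_1"; // no check that the node is still registered
  }
}

class RaceDemo {
  // Returns true if a reservation lands on a node that is no longer registered.
  static boolean staleNodeGetsReservation() {
    UnguardedScheduler s = new UnguardedScheduler();
    s.addNode(new RaceNode("host:8041"));
    List<String> ids = new ArrayList<>(s.nodes.keySet()); // snapshot of node ids
    RaceNode stale = s.nodes.get(ids.get(0)); // 1. reference fetched without the lock
    s.removeNode(stale.id);                   // 2. node removal takes the lock first
    s.attemptScheduling(stale);               // 3. scheduling then runs on the stale node
    return !s.nodes.containsKey(stale.id) && "app_1".equals(stale.reservation);
  }

  public static void main(String[] args) {
    System.out.println("reservation on removed node: " + staleNodeGetsReservation());
  }
}
```

This matches the order of the three log lines: "Removed node", then a container assigned on the same host, then "Making reservation" for another application.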
          
          adhoot Anubhav Dhoot added a comment -

          Attached a fix that skips scheduling on nodes that have just been removed.
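A minimal sketch of the approach described here, not the patch itself: once `attemptScheduling` holds the scheduler lock, it re-checks that the node is still registered and returns early if it is not. The names `GuardedNode`, `GuardedScheduler`, `lookup` and `FixDemo` are hypothetical, and the boolean return value exists only to make the skip observable.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical node with a single reservation slot.
class GuardedNode {
  final String id;
  String reservation; // app holding a reservation on this node, if any

  GuardedNode(String id) { this.id = id; }
}

class GuardedScheduler {
  private final ConcurrentHashMap<String, GuardedNode> nodes = new ConcurrentHashMap<>();

  synchronized void addNode(GuardedNode n) { nodes.put(n.id, n); }
  synchronized void removeNode(String id) { nodes.remove(id); }

  // Lock-free read, like the continuous-scheduling loop fetching a node.
  GuardedNode lookup(String id) { return nodes.get(id); }

  // Returns true if scheduling ran, false if the node was skipped.
  synchronized boolean attemptScheduling(GuardedNode node) {
    // The caller may have fetched this reference before taking the lock,
    // so confirm the node is still registered now that the lock is held.
    if (!nodes.containsKey(node.id)) {
      return false; // node was removed in the meantime: skip it
    }
    node.reservation = "app_1";
    return true;
  }
}

class FixDemo {
  static boolean removedNodeIsSkipped() {
    GuardedScheduler s = new GuardedScheduler();
    s.addNode(new GuardedNode("host:8041"));
    GuardedNode stale = s.lookup("host:8041"); // reference taken outside the lock
    s.removeNode("host:8041");                 // node removal wins the race
    boolean ran = s.attemptScheduling(stale);
    return !ran && stale.reservation == null;
  }

  static boolean liveNodeIsScheduled() {
    GuardedScheduler s = new GuardedScheduler();
    GuardedNode live = new GuardedNode("host:8042");
    s.addNode(live);
    return s.attemptScheduling(live) && "app_1".equals(live.reservation);
  }

  public static void main(String[] args) {
    System.out.println("removed node skipped: " + removedNodeIsSkipped());
    System.out.println("live node scheduled: " + liveNodeIsScheduled());
  }
}
```

The check is only sufficient because node removal takes the same lock; without that, the node could still disappear between the check and the reservation.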


            People

            • Assignee:
              adhoot Anubhav Dhoot
              Reporter:
              adhoot Anubhav Dhoot
            • Votes:
              0
              Watchers:
              10
