YARN-3973

Recent changes to application priority management break reservation system from YARN-1051

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: resourcemanager
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Recent changes in trunk (I think YARN-2003) produce an NPE in the reservation system when an application is submitted to a ReservationQueue.

      1. YARN-3973.1.patch
        1 kB
        Carlo Curino
      2. YARN-3973.patch
        1 kB
        Carlo Curino

        Activity

        curino Carlo Curino added a comment -

        Ishai Menache has just reported to me an NPE when running the reservation system in current trunk (part of testing of YARN-3656). Thanks Ishai for reporting this.

        Wangda Tan, I think this is due to the YARN-2003 changes. In particular, we get an NPE when running CapacityScheduler.getDefaultPriorityForQueue(),
        because ReservationQueue (which inherits from LeafQueue) inherits getDefaultApplicationPriority(), which returns null for queues that are not in the config file,
        such as the dynamic queues of the reservation system.

        A possible fix is to override getDefaultApplicationPriority() in ReservationQueue to return some default priority value:

          @Override
          public Priority getDefaultApplicationPriority() {
            return Priority.newInstance(0);
          }
        

        But I wonder whether there is anything more principled. Wangda Tan, any advice?
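
To make the failure mode described above concrete, here is a minimal, self-contained sketch. All of the types in it (`Priority`, `LeafQueue`, `ReservationQueue`, and the `defaultPriorityFor` helper) are simplified, hypothetical stand-ins written for this illustration; they are not the real Hadoop classes, and the real lookup code may differ. The sketch only assumes what the comment states: a queue with no config entry returns null, and the caller dereferences the result.

```java
// Illustrative stand-ins only; these are NOT the real Hadoop YARN classes.
class NullPriorityDemo {
    // Simplified stand-in for a priority record.
    static final class Priority {
        private final int value;
        private Priority(int value) { this.value = value; }
        static Priority newInstance(int value) { return new Priority(value); }
        int getPriority() { return value; }
    }

    // Stand-in for a leaf queue: the priority is null unless one was configured.
    static class LeafQueue {
        private final Priority configured;
        LeafQueue(Priority configured) { this.configured = configured; }
        Priority getDefaultApplicationPriority() { return configured; }
    }

    // Stand-in for a dynamic reservation queue: it never appears in the config.
    static class ReservationQueue extends LeafQueue {
        ReservationQueue() { super(null); }
    }

    // Hypothetical unguarded lookup: dereferences whatever the queue returns.
    static int defaultPriorityFor(LeafQueue queue) {
        return queue.getDefaultApplicationPriority().getPriority();
    }

    // Reports whether the unguarded lookup fails with a NullPointerException.
    static boolean throwsNpe(LeafQueue queue) {
        try {
            defaultPriorityFor(queue);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Configured queue: lookup succeeds.
        System.out.println(throwsNpe(new LeafQueue(Priority.newInstance(3)))); // prints false
        // Dynamic queue: lookup hits the null and fails.
        System.out.println(throwsNpe(new ReservationQueue())); // prints true
    }
}
```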

        leftnoteasy Wangda Tan added a comment -

        Thanks Carlo Curino for reporting and analyzing this issue. I think it should be safe to add a default implementation that returns a non-null priority in AbstractCSQueue.

        Sunil G, thoughts?

        curino Carlo Curino added a comment -

        Wangda Tan, what you propose makes more sense. From a quick test on an actual cluster, it looks like it works.

        The problem during application submission is in fact not addressed by overriding just in ReservationQueue:
        ParentQueue must also provide an implementation (due to the reservation mechanics for app submission),
        so doing this in AbstractCSQueue seems the right way.

        I uploaded a simple patch for this, but please double check it carefully (I am not familiar with the priority mechanics).
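
The idea of placing the default in a shared base class can be sketched as follows. This is not the uploaded patch: the classes are simplified, hypothetical stand-ins named after the ones in the discussion, and the default value 0 is an assumption carried over from the earlier snippet. The point is only that a single base-class implementation covers both parent and leaf queues.

```java
// Illustrative stand-ins only; NOT the real Hadoop YARN classes or the actual patch.
class BaseQueueDefaultDemo {
    // Simplified stand-in for a priority record.
    static final class Priority {
        private final int value;
        private Priority(int value) { this.value = value; }
        static Priority newInstance(int value) { return new Priority(value); }
        int getPriority() { return value; }
    }

    // Stand-in for the shared base class: the non-null default lives here once.
    abstract static class AbstractQueue {
        Priority getDefaultApplicationPriority() {
            return Priority.newInstance(0); // assumed default value
        }
    }

    // Parent queues inherit the default without any code of their own.
    static class ParentQueue extends AbstractQueue { }

    // Leaf queues prefer their configured value and fall back to the base default.
    static class LeafQueue extends AbstractQueue {
        private final Priority configured;
        LeafQueue(Priority configured) { this.configured = configured; }

        @Override
        Priority getDefaultApplicationPriority() {
            return configured != null ? configured : super.getDefaultApplicationPriority();
        }
    }

    // Dynamic reservation queue: nothing configured, so the base default applies.
    static class ReservationQueue extends LeafQueue {
        ReservationQueue() { super(null); }
    }

    public static void main(String[] args) {
        System.out.println(new ParentQueue().getDefaultApplicationPriority().getPriority());      // prints 0
        System.out.println(new ReservationQueue().getDefaultApplicationPriority().getPriority()); // prints 0
        System.out.println(new LeafQueue(Priority.newInstance(4))
            .getDefaultApplicationPriority().getPriority());                                      // prints 4
    }
}
```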

        hadoopqa Hadoop QA added a comment -



        -1 overall

        Vote  Subsystem        Runtime  Comment
        0     pre-patch        16m 5s   Pre-patch trunk compilation is healthy.
        +1    @author          0m 0s    The patch does not contain any @author tags.
        -1    tests included   0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1    javac            7m 41s   There were no new javac warning messages.
        +1    javadoc          9m 35s   There were no new javadoc warning messages.
        +1    release audit    0m 23s   The applied patch does not increase the total number of release audit warnings.
        +1    checkstyle       0m 47s   There were no new checkstyle issues.
        +1    whitespace       0m 0s    The patch has no lines that end in whitespace.
        +1    install          1m 20s   mvn install still works.
        +1    eclipse:eclipse  0m 34s   The patch built with eclipse:eclipse.
        +1    findbugs         1m 27s   The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1    yarn tests       51m 58s  Tests passed in hadoop-yarn-server-resourcemanager.
                               89m 53s

        Subsystem                                    Report/Notes
        Patch URL                                    http://issues.apache.org/jira/secure/attachment/12747091/YARN-3973.patch
        Optional Tests                               javadoc javac unit findbugs checkstyle
        git revision                                 trunk / d19d187
        hadoop-yarn-server-resourcemanager test log  https://builds.apache.org/job/PreCommit-YARN-Build/8655/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
        Test Results                                 https://builds.apache.org/job/PreCommit-YARN-Build/8655/testReport/
        Java                                         1.7.0_55
        uname                                        Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output                               https://builds.apache.org/job/PreCommit-YARN-Build/8655/console

        This message was automatically generated.

        leftnoteasy Wangda Tan added a comment -

        Carlo Curino, I looked at the code again. I think that to solve the NPE issue, it's better to check in CapacityScheduler#getDefaultPriorityForQueue: when queue#getDefaultApplicationPriority is null, it should return CapacitySchedulerConfiguration.DEFAULT_CONFIGURATION_APPLICATION_PRIORITY instead.

        Adding a default implementation to AbstractCSQueue cannot leverage the defined DEFAULT_APPLICATION_PRIORITY.
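
The null check proposed here can be sketched roughly as below. This is an illustration under stated assumptions, not the committed change: `Priority`, `Queue`, and the surrounding class are hypothetical stand-ins, the queue registry is a plain map, and the constant's value of 0 is assumed for the example. Only the method name and the constant name come from the comment above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-ins only; NOT the real CapacityScheduler or the committed patch.
class SchedulerFallbackDemo {
    // Named after the constant mentioned in the thread; the value 0 is an assumption.
    static final int DEFAULT_CONFIGURATION_APPLICATION_PRIORITY = 0;

    // Simplified stand-in for a priority record.
    static final class Priority {
        private final int value;
        private Priority(int value) { this.value = value; }
        static Priority newInstance(int value) { return new Priority(value); }
        int getPriority() { return value; }
    }

    // Simplified stand-in for a queue; may return null when nothing is configured.
    interface Queue {
        Priority getDefaultApplicationPriority();
    }

    private final Map<String, Queue> queues = new HashMap<>();

    void addQueue(String name, Queue queue) {
        queues.put(name, queue);
    }

    // Guarded lookup: an unknown queue or a null priority falls back to the default.
    Priority getDefaultPriorityForQueue(String queueName) {
        Queue queue = queues.get(queueName);
        if (queue == null || queue.getDefaultApplicationPriority() == null) {
            return Priority.newInstance(DEFAULT_CONFIGURATION_APPLICATION_PRIORITY);
        }
        return Priority.newInstance(queue.getDefaultApplicationPriority().getPriority());
    }

    public static void main(String[] args) {
        SchedulerFallbackDemo scheduler = new SchedulerFallbackDemo();
        scheduler.addQueue("configured", () -> Priority.newInstance(5));
        scheduler.addQueue("reservation", () -> null); // dynamic queue, nothing configured

        System.out.println(scheduler.getDefaultPriorityForQueue("configured").getPriority());  // prints 5
        System.out.println(scheduler.getDefaultPriorityForQueue("reservation").getPriority()); // prints 0
        System.out.println(scheduler.getDefaultPriorityForQueue("missing").getPriority());     // prints 0
    }
}
```

Keeping the check in the scheduler means every queue type is covered in one place, and the fallback value comes from the configuration constant rather than being hard-coded in a queue class.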

        curino Carlo Curino added a comment -

        Wangda Tan, this makes sense. I am testing in a cluster now.

        curino Carlo Curino added a comment -

        Wangda Tan, it seems to work fine on a test cluster; however, please double check this before committing it.

        leftnoteasy Wangda Tan added a comment -

        Thanks Carlo Curino.
        +1 to latest patch. Sunil G, could you also take a look at it?

        sunilg Sunil G added a comment -

        Thanks Carlo Curino for reporting this, and thanks Wangda Tan for the analysis.
        The latest patch looks good to me. It will solve the problem.

        hadoopqa Hadoop QA added a comment -



        -1 overall

        Vote  Subsystem        Runtime  Comment
        0     pre-patch        16m 57s  Pre-patch trunk compilation is healthy.
        +1    @author          0m 0s    The patch does not contain any @author tags.
        -1    tests included   0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1    javac            8m 46s   There were no new javac warning messages.
        +1    javadoc          9m 47s   There were no new javadoc warning messages.
        +1    release audit    0m 23s   The applied patch does not increase the total number of release audit warnings.
        +1    checkstyle       0m 57s   There were no new checkstyle issues.
        +1    whitespace       0m 0s    The patch has no lines that end in whitespace.
        +1    install          1m 23s   mvn install still works.
        +1    eclipse:eclipse  0m 32s   The patch built with eclipse:eclipse.
        +1    findbugs         1m 26s   The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        -1    yarn tests       53m 20s  Tests failed in hadoop-yarn-server-resourcemanager.
                               93m 36s

        Reason             Tests
        Failed unit tests  hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority

        Subsystem                                    Report/Notes
        Patch URL                                    http://issues.apache.org/jira/secure/attachment/12747112/YARN-3973.1.patch
        Optional Tests                               javadoc javac unit findbugs checkstyle
        git revision                                 trunk / 83fe34a
        hadoop-yarn-server-resourcemanager test log  https://builds.apache.org/job/PreCommit-YARN-Build/8659/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
        Test Results                                 https://builds.apache.org/job/PreCommit-YARN-Build/8659/testReport/
        Java                                         1.7.0_55
        uname                                        Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output                               https://builds.apache.org/job/PreCommit-YARN-Build/8659/console

        This message was automatically generated.

        leftnoteasy Wangda Tan added a comment -

        Committed to trunk/branch-2. Thanks Carlo Curino, and thanks Sunil G for the review!

        leftnoteasy Wangda Tan added a comment -

        I also verified that the tests pass locally; the failure is not caused by this patch.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #8220 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8220/)
        YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
        curino Carlo Curino added a comment -

        Thanks Wangda Tan and Sunil G for fast and insightful reviewing and commit.

        sunilg Sunil G added a comment -

        Thank you Wangda Tan. The test cases are passing locally. I will look at this random failure separately.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/)
        YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/997/)
        YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/)
        YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #256 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/256/)
        YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/)
        YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/)
        YARN-3973. Recent changes to application priority management break reservation system from YARN-1051 (Carlo Curino via wangda) (wangda: rev a3bd7b4a59b3664273dc424f240356838213d4e7)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java

          People

          • Assignee: Carlo Curino
          • Reporter: Carlo Curino
          • Votes: 0
          • Watchers: 10
