Hadoop YARN / YARN-3655

FairScheduler: potential livelock due to maxAMShare limitation and container reservation

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: fairscheduler
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      FairScheduler: potential livelock due to maxAMShare limitation and container reservation.
      If a node is reserved by an application, no other application has a chance to allocate a new container on that node until the application holding the reservation either allocates a new container on the node or releases the reserved container.
      The problem is that if an application calls assignReservedContainer and fails to get a new container due to the maxAMShare limitation, it blocks all other applications from using the nodes it has reserved. If all of the other running applications cannot release their AM containers because they are blocked by these reserved containers, a livelock can happen.
      The following code in FSAppAttempt#assignContainer is what can cause this potential livelock:

          // Check the AM resource usage for the leaf queue
          if (!isAmRunning() && !getUnmanagedAM()) {
            List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
            if (ask.isEmpty() || !getQueue().canRunAppAM(
                ask.get(0).getCapability())) {
              if (LOG.isDebugEnabled()) {
                LOG.debug("Skipping allocation because maxAMShare limit would " +
                    "be exceeded");
              }
              return Resources.none();
            }
          }
      

      To fix this issue, we can unreserve the node when the AM container cannot be allocated on it due to the maxAMShare limitation and the node is reserved by that same application.
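
      A hedged sketch of that idea, written against the check quoted above, is shown below. It is only an illustration of the described approach, not the committed patch (per the review comments below, the committed change instead introduces an isValidReservation check and also touches FSQueue and FairScheduler); it assumes the standard SchedulerNode/RMContainer accessors (node.getReservedContainer(), getReservedPriority()) and FSAppAttempt#unreserve are available at this point in assignContainer.

          // Sketch only: when the AM container is skipped because of the
          // maxAMShare limit, also drop any reservation this attempt holds
          // on the node so other applications can be scheduled there again.
          if (!isAmRunning() && !getUnmanagedAM()) {
            List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
            if (ask.isEmpty() || !getQueue().canRunAppAM(
                ask.get(0).getCapability())) {
              if (LOG.isDebugEnabled()) {
                LOG.debug("Skipping allocation because maxAMShare limit would " +
                    "be exceeded");
              }
              // Release our own reservation instead of returning while still
              // holding it, which is what starves the other applications.
              RMContainer reserved = node.getReservedContainer();
              if (reserved != null && reserved.getApplicationAttemptId().equals(
                  getApplicationAttemptId())) {
                unreserve(reserved.getReservedPriority(), node);
              }
              return Resources.none();
            }
          }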

      1. YARN-3655.004.patch
        27 kB
        zhihai xu
      2. YARN-3655.003.patch
        28 kB
        zhihai xu
      3. YARN-3655.002.patch
        16 kB
        zhihai xu
      4. YARN-3655.001.patch
        9 kB
        zhihai xu
      5. YARN-3655.000.patch
        1 kB
        zhihai xu

        Issue Links

          Activity

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #220 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/220/)
          YARN-3655. FairScheduler: potential livelock due to maxAMShare limitation and container reservation. (Zhihai Xu via kasha) (kasha: rev bd69ea408f8fdd8293836ce1089fe9b01616f2f7)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2168 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2168/)
          YARN-3655. FairScheduler: potential livelock due to maxAMShare limitation and container reservation. (Zhihai Xu via kasha) (kasha: rev bd69ea408f8fdd8293836ce1089fe9b01616f2f7)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #211 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/211/)
          YARN-3655. FairScheduler: potential livelock due to maxAMShare limitation and container reservation. (Zhihai Xu via kasha) (kasha: rev bd69ea408f8fdd8293836ce1089fe9b01616f2f7)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #2150 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2150/)
          YARN-3655. FairScheduler: potential livelock due to maxAMShare limitation and container reservation. (Zhihai Xu via kasha) (kasha: rev bd69ea408f8fdd8293836ce1089fe9b01616f2f7)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #952 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/952/)
          YARN-3655. FairScheduler: potential livelock due to maxAMShare limitation and container reservation. (Zhihai Xu via kasha) (kasha: rev bd69ea408f8fdd8293836ce1089fe9b01616f2f7)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #222 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/222/)
          YARN-3655. FairScheduler: potential livelock due to maxAMShare limitation and container reservation. (Zhihai Xu via kasha) (kasha: rev bd69ea408f8fdd8293836ce1089fe9b01616f2f7)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          zxu zhihai xu added a comment -

          thanks Arun Suresh for the review! thanks Karthik Kambatla for reviewing and committing the patch.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7984 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7984/)
          YARN-3655. FairScheduler: potential livelock due to maxAMShare limitation and container reservation. (Zhihai Xu via kasha) (kasha: rev bd69ea408f8fdd8293836ce1089fe9b01616f2f7)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          kasha Karthik Kambatla added a comment -

          Zhihai - thanks for fixing this critical issue and patience through the reviews.

          Just committed this to trunk and branch-2.

          kasha Karthik Kambatla added a comment -

          +1, checking this in.

          zxu zhihai xu added a comment -

          Updated the patch to fix the checkstyle issue. The latest patch YARN-3655.004.patch passed the Jenkins test.

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 15m 57s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 36s There were no new javac warning messages.
          +1 javadoc 9m 36s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 45s There were no new checkstyle issues.
          +1 whitespace 0m 3s The patch has no lines that end in whitespace.
          +1 install 1m 34s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 26s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 yarn tests 50m 13s Tests passed in hadoop-yarn-server-resourcemanager.
              88m 11s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12738203/YARN-3655.004.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 71de367
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8209/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8209/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8209/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 15m 2s Findbugs (version ) appears to be broken on trunk.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 35s There were no new javac warning messages.
          +1 javadoc 9m 36s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 23s There were no new checkstyle issues.
          +1 whitespace 0m 4s The patch has no lines that end in whitespace.
          +1 install 1m 34s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 26s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          -1 yarn tests 50m 17s Tests failed in hadoop-yarn-server-resourcemanager.
              86m 55s  



          Reason Tests
          Failed unit tests hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12738190/YARN-3655.004.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 71de367
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8208/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8208/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8208/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 15m 51s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 30s There were no new javac warning messages.
          +1 javadoc 9m 35s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 46s The applied patch generated 1 new checkstyle issues (total was 122, now 119).
          +1 whitespace 0m 5s The patch has no lines that end in whitespace.
          +1 install 1m 32s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 24s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 yarn tests 50m 12s Tests passed in hadoop-yarn-server-resourcemanager.
              87m 55s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12738145/YARN-3655.004.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / b3ffa87
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/8202/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8202/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8202/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8202/console

          This message was automatically generated.

          zxu zhihai xu added a comment -

          Karthik Kambatla, thanks for the thorough review. I uploaded a new patch, YARN-3655.004.patch, which addresses your first comment.
          And I created two follow-up JIRAs, YARN-3776 and YARN-3777, which address your second and third comments. Please review them. Many thanks.

          kasha Karthik Kambatla added a comment -

          Thanks for the clarifications, Zhihai. The latest patch looks mostly good, nice test. Few nit picks before we get this in:

          1. In hasContainerForNode, the patch has some spurious changes. Also, would be nice to add a comment for the newly added check.
          2. File a follow-up JIRA to separate out the code paths for assigning a reserved container and a non-reserved container.
          3. File a follow-up JIRA to move all reservation-related tests from TestFairScheduler to TestFairSchedulerReservations
          zxu zhihai xu added a comment -

          Hi Karthik Kambatla, thanks for the review.

          Is it possible to avoid the checks before the call, and do all the checks in the call. The reasoning behind this is to have all reservation-related code in as few places as possible. If this is not possible, we can leave it as the patch has it now.

          IMHO, it is not possible, because FSAppAttempt#reserve will only be called from assignContainer(node). If we move all the condition checks into FSAppAttempt#reserve, it may return early, which would cause us to fail to reserve or allocate containers for other priorities. Also, since assignReservedContainer won't call FSAppAttempt#reserve, we still need to keep the isValidReservation check in assignReservedContainer.

          Instead of adding the check to assignContainer(node) can we add it to assignContainer(node, request, nodeType, reserved)?

          IMHO, that would be a problem: if we add it to assignContainer(node, request, nodeType, reserved), then getAllowedLocalityLevelByTime/getAllowedLocalityLevel will be called before the check instead of after it, which will change the scheduling behavior. It will also affect performance (the late check will increase CPU usage).

          kasha Karthik Kambatla added a comment -

          IMHO, It is not good to add if (isValidReservation) check in FSAppAttempt#reserve because all the conditions checked in isValidReservation are already checked before we call FSAppAttempt#reserve, it will be duplicate code which will affect the performance.

          Is it possible to avoid the checks before the call, and do all the checks in the call. The reasoning behind this is to have all reservation-related code in as few places as possible. If this is not possible, we can leave it as the patch has it now.

          While adding this check in FSAppAttempt#assignContainer(node) might work in practice, it somehow feels out of place.

          Instead of adding the check to assignContainer(node) can we add it to assignContainer(node, request, nodeType, reserved)?

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 35s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 31s There were no new javac warning messages.
          +1 javadoc 9m 30s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 54s There were no new checkstyle issues.
          +1 whitespace 0m 5s The patch has no lines that end in whitespace.
          +1 install 1m 34s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 16s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 yarn tests 50m 14s Tests passed in hadoop-yarn-server-resourcemanager.
              86m 38s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12735229/YARN-3655.003.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / ada233b
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8077/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8077/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8077/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 37s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 34s There were no new javac warning messages.
          +1 javadoc 9m 31s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 46s The applied patch generated 1 new checkstyle issues (total was 123, now 120).
          +1 whitespace 0m 3s The patch has no lines that end in whitespace.
          +1 install 1m 33s mvn install still works.
          +1 eclipse:eclipse 0m 36s The patch built with eclipse:eclipse.
          +1 findbugs 1m 15s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 yarn tests 50m 10s Tests passed in hadoop-yarn-server-resourcemanager.
              86m 31s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12735189/YARN-3655.003.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / ada233b
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/8072/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8072/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8072/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8072/console

          This message was automatically generated.

          zxu zhihai xu added a comment -

          Hi Karthik Kambatla, thanks for the review.

          1. okToUnreserve

          fixed in the new patch YARN-3655.003.patch

          2. Add an if (isValidReservation) check in FSAppAttempt#reserve so all the reservation logic stays in one place?

          IMHO, It is not good to add if (isValidReservation) check in FSAppAttempt#reserve because all the conditions checked in isValidReservation are already checked before we call FSAppAttempt#reserve, it will be duplicate code which will affect the performance.

          3.In FSAppAttempt#assignContainer(node, request, nodeType, reserved)...

          fixed in the new patch YARN-3655.003.patch. In order to remove the fitsInMaxShare check, I merged it into hasContainerForNode, which also makes the code cleaner.

          4. While adding this check in FSAppAttempt#assignContainer(node) might work in practice, it somehow feels out of place. Also, assignReservedContainer could also lead to a reservation?

          It looks like assignReservedContainer won't lead to a reservation(FSAppAttempt#reserve), assignReservedContainer won't call FSAppAttempt#reserve because FSAppAttempt#reserve will only be called when the node Available Resource is smaller than the requested/reserved resource. assignReservedContainer will only call assignContainer when the node Available Resource is no less than the reserved resource. So only FSAppAttempt#assignContainer(node) can lead to a reservation when the node Available Resource is smaller than the requested resource.

          5. Instead of calling okToUnreserve/!isValidReservation in FairScheduler#attemptScheduling...

          fixed in the new patch YARN-3655.003.patch

          6. Looks like assign-multiple is broken with reserved-containers. The while-loop for assign-multiple should look at both reserved and un-reserved containers assigned. Can we file a follow-up JIRA to fix this?

          I suppose you mean assign-multiple is broken after assignReservedContainer turns the reservation into an allocation.
          Yes, I created YARN-3710 to fix this issue.

          Oh, and I found it hard to understand the test....

          fixed in the new patch YARN-3655.003.patch, please review it.

          kasha Karthik Kambatla added a comment -

          Oh, and I found it hard to understand the test. Can we add some documentation to clarify what the test is doing? We should essentially test the following:

          1. Container gets reserved when not over maxAMShare
          2. Container doesn't get reserved when over maxAMShare
          3. If the maxAMShare were to go down due to fairshare going down, container gets unreserved.
          kasha Karthik Kambatla added a comment -

          Comments on the patch:

          1. okToUnreserve
            1. It was a little hard to wrap my head around. Can we negate it and call it isValidReservation(FSSchedulerNode)?
            2. Can we get rid of the if-else and have a simple return hasContainerForNode && fitsInMaxShare && !isOverAMShareLimit?
          2. Add an if (isValidReservation) check in FSAppAttempt#reserve so all the reservation logic stays in one place?
          3. In FSAppAttempt#assignContainer(node, request, nodeType, reserved),
            1. We can get rid of the fitsInMaxShare check immediately preceding the call to reserve.
            2. Given if (fitsIn(capability, available))-block ends in return, we don't need to put the continuation in else.
          4. While adding this check in FSAppAttempt#assignContainer(node) might work in practice, it somehow feels out of place. Also, assignReservedContainer could also lead to a reservation?
          5. Instead of calling okToUnreserve/!isValidReservation in FairScheduler#attemptScheduling, we should likely add it as the first check in FSAppAttempt#assignReservedContainer.
          6. Looks like assign-multiple is broken with reserved-containers. The while-loop for assign-multiple should look at both reserved and un-reserved containers assigned. Can we file a follow-up JIRA to fix this?
          kasha Karthik Kambatla added a comment -

          I would like to take a look at the patch as well.

          asuresh Arun Suresh added a comment -

          makes sense...
          +1 from me.. will commit, unless Karthik Kambatla has any comments

          zxu zhihai xu added a comment -

          Arun Suresh, thanks for the review. As Karthik Kambatla suggested, I did some refactoring to combine all of the unreserve condition checks in okToUnreserve. The maxAMShare check is only used for AM container reservations, and there are non-AM container reservations which need the !hasContainerForNode check.

          asuresh Arun Suresh added a comment -

          Thanks for the update zhihai xu. I have only one nit :
          In the okToUnreserve method, do we also need to do the !hasContainerForNode check ?? I assume the patch was meant to unreserve those containers which can fit in a node, but we put a limit on this if it exceeds maxAMShare..

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 58s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 52s There were no new javac warning messages.
          +1 javadoc 9m 47s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 48s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 35s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 16s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 yarn tests 50m 22s Tests passed in hadoop-yarn-server-resourcemanager.
              87m 39s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12734328/YARN-3655.002.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / fb6b38d
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8037/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8037/testReport/
          Java 1.7.0_55
          uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8037/console

          This message was automatically generated.

          zxu zhihai xu added a comment -

Thanks Arun Suresh for the review. I don't think the flip-flop will happen.

At some time T2, the next allocation event (after all nodes have sent heartbeat.. or after a continuousScheduling attempt) happens, a reservation of 2GB is made on each node for appX.

The above reservation won't succeed in the first place because of the maxAMShare limitation; and if it did succeed, the reservation for appX wouldn't be removed.

Thanks Karthik Kambatla for your review; these are great suggestions.
I made the changes based on your suggestions, and I also fixed the fitsInMaxShare issue in this JIRA instead of creating a follow-up JIRA.
I also did some optimizations to remove duplicate logic:
hasContainerForNode already covers what getTotalRequiredResources checks, so if we call hasContainerForNode we don't need to check getTotalRequiredResources.
I therefore removed the getTotalRequiredResources check from assignReservedContainer and assignContainer, as sketched below.
Also, because okToUnreserve already checks hasContainerForNode, we don't need to check it again for the reserved container in assignContainer.
I uploaded a new patch, YARN-3655.002.patch, with the above changes.
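As a self-contained toy model of the redundancy mentioned above (this is plain Java for illustration, not YARN code; hasContainerForNode and totalRequired here only stand in for FSAppAttempt#hasContainerForNode and getTotalRequiredResources, and the fields are assumed):

    // Toy model only: a true hasContainerForNode() already implies
    // totalRequired() > 0, so a separate totalRequired() check is redundant.
    class OutstandingAsk {
      final int requestedContainers; // containers still needed at this priority
      final int containerMemoryMB;   // size of each requested container

      OutstandingAsk(int requestedContainers, int containerMemoryMB) {
        this.requestedContainers = requestedContainers;
        this.containerMemoryMB = containerMemoryMB;
      }

      int totalRequired() {
        return requestedContainers;
      }

      boolean hasContainerForNode(int nodeAvailableMB) {
        // Something must still be requested AND the node must have room for it.
        return requestedContainers > 0 && containerMemoryMB <= nodeAvailableMB;
      }
    }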

          kasha Karthik Kambatla added a comment -

          If allocating a container is going to take the amShare over the maxAMShare, not allocating and hence unreserving resources seems reasonable. That said, we should also add the same check before making such a reservation in FSAppAttempt#assignContainer.

          There is already a check to ensure we won't go over maxShare. In terms of code organization, I would like for us to create a helper method (okayToReserveResources) that would check the maxShare for all containers and maxAMShare for AM containers.

Also, looking at the code, I see the fitsInMaxShare method is a static in FairScheduler. We should just make it a non-static method in FSQueue so it can call parent.fitsInMaxShare. Can we file a follow-up JIRA for it?
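A minimal sketch of what these two suggestions could look like (class and method shapes here are assumptions for illustration, not the actual FSQueue/FairScheduler API):

    // Sketch only: a non-static, recursive fitsInMaxShare on the queue plus a
    // single helper that gates allocations/reservations on both maxShare and
    // maxAMShare.
    class Queue {
      final Queue parent;      // null for the root queue
      final long maxShareMB;   // queue-level maximum resource
      final long maxAMShareMB; // portion of the queue usable by AM containers
      long usageMB;            // current usage
      long amUsageMB;          // current AM usage

      Queue(Queue parent, long maxShareMB, long maxAMShareMB) {
        this.parent = parent;
        this.maxShareMB = maxShareMB;
        this.maxAMShareMB = maxAMShareMB;
      }

      // Each queue checks itself, then delegates to its parent, so one call
      // covers the whole ancestor chain.
      boolean fitsInMaxShare(long requestMB) {
        if (usageMB + requestMB > maxShareMB) {
          return false;
        }
        return parent == null || parent.fitsInMaxShare(requestMB);
      }

      // One place that decides whether reserving makes sense: maxShare applies
      // to every container, maxAMShare additionally applies to AM containers.
      boolean okayToReserveResources(long requestMB, boolean isAMContainer) {
        if (!fitsInMaxShare(requestMB)) {
          return false;
        }
        return !isAMContainer || amUsageMB + requestMB <= maxAMShareMB;
      }
    }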

          asuresh Arun Suresh added a comment -

          Thanks for the patch zhihai xu,

I was just wondering, though: with your approach, assume the following situation (please correct me if I am wrong):

          • We have 3 nodes with say 4GB capacity.
          • Currently, applications are using up 3GB on each node (assume they are all fairly long running tasks..).
• At time T1, a new app (appX) is added, which requires 2 GB.
          • At some time T2, the next allocation event (after all nodes have sent heartbeat.. or after a continuousScheduling attempt) happens, a reservation of 2GB is made on each node for appX.
• At some time T3, during the next allocation event, as per your patch, the reservation for appX will be removed from ALL nodes.
• Thus reservations for appX will flip-flop on all nodes. It is possible that, during the period when there is no reservation for appX, other apps with a requirement of 1 GB or less might come in and be scheduled on the cluster, thereby starving appX (see the toy arithmetic sketch below).
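The arithmetic behind the scenario, as a toy illustration (plain Java, not scheduler code; all numbers are the assumed values from the bullets above):

    // Toy arithmetic only: why appX's 2 GB request triggers a reservation and
    // why a 1 GB app could slip in whenever that reservation is removed.
    public class ReservationScenario {
      public static void main(String[] args) {
        int nodeCapacityGB = 4;    // each of the 3 nodes
        int usedGB = 3;            // long-running tasks already placed
        int appXRequestGB = 2;     // appX's request
        int smallAppRequestGB = 1; // a later, smaller app

        int freeGB = nodeCapacityGB - usedGB;                // 1 GB free per node
        boolean appXFits = appXRequestGB <= freeGB;          // false -> reserve
        boolean smallAppFits = smallAppRequestGB <= freeGB;  // true -> could be
                                                             // scheduled while the
                                                             // reservation is gone
        System.out.println("free per node: " + freeGB + " GB");
        System.out.println("appX (2 GB) fits: " + appXFits);
        System.out.println("small app (1 GB) fits: " + smallAppFits);
      }
    }
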
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 32s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 29s There were no new javac warning messages.
          +1 javadoc 9m 36s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 48s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 34s mvn install still works.
          +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
          -1 findbugs 1m 18s The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.
          -1 yarn tests 60m 18s Tests failed in hadoop-yarn-server-resourcemanager.
              96m 33s  



          Reason Tests
          FindBugs module:hadoop-yarn-server-resourcemanager
  Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time. Unsynchronized access at FileSystemRMStateStore.java:[line 156]
          Timed out tests org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12733478/YARN-3655.001.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / a46506d
          Findbugs warnings https://builds.apache.org/job/PreCommit-YARN-Build/7964/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7964/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7964/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7964/console

          This message was automatically generated.

          zxu zhihai xu added a comment -

I uploaded a new patch, YARN-3655.001.patch, which adds a test case to verify this fix. Without the fix, the test fails.

          zxu zhihai xu added a comment -

The findbugs warning is not related to the attached patch; I created YARN-3667 to fix it.
The TestNodeLabelContainerAllocation failure is also unrelated to the attached patch. It looks like a spurious failure, because the test report
( https://builds.apache.org/job/PreCommit-YARN-Build/7956/testReport/ ) doesn't contain it.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 39s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 javac 7m 35s There were no new javac warning messages.
          +1 javadoc 9m 31s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 53s There were no new checkstyle issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 35s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          -1 findbugs 1m 19s The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.
          -1 yarn tests 60m 16s Tests failed in hadoop-yarn-server-resourcemanager.
              96m 47s  



          Reason Tests
          FindBugs module:hadoop-yarn-server-resourcemanager
  Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time. Unsynchronized access at FileSystemRMStateStore.java:[line 156]
          Timed out tests org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12733282/YARN-3655.000.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 8f37873
          Findbugs warnings https://builds.apache.org/job/PreCommit-YARN-Build/7956/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7956/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7956/testReport/
          Java 1.7.0_55
          uname Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7956/console

          This message was automatically generated.

          zxu zhihai xu added a comment -

          I uploaded a patch YARN-3655.000.patch for review.


            People

            • Assignee:
              zxu zhihai xu
              Reporter:
              zxu zhihai xu
            • Votes:
      0
              Watchers:
      7
