Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-2868

FairScheduler: Metric for latency to allocate first container for an application

    Details

    • Target Version/s:

      Description

      Add a metric to measure the latency between "starting container allocation" and "first container actually allocated".

      1. YARN-2868.002.patch
        8 kB
        Ray Chiang
      2. YARN-2868.003.patch
        8 kB
        Ray Chiang
      3. YARN-2868.004.patch
        8 kB
        Ray Chiang
      4. YARN-2868.005.patch
        7 kB
        Ray Chiang
      5. YARN-2868.006.patch
        8 kB
        Ray Chiang
      6. YARN-2868.007.patch
        7 kB
        Ray Chiang
      7. YARN-2868.008.patch
        7 kB
        Ray Chiang
      8. YARN-2868.009.patch
        9 kB
        Ray Chiang
      9. YARN-2868.010.patch
        8 kB
        Ray Chiang
      10. YARN-2868.011.patch
        6 kB
        Ray Chiang
      11. YARN-2868.012.patch
        6 kB
        Ray Chiang
      12. YARN-2868-01.patch
        8 kB
        Ray Chiang

        Issue Links

          Activity

          Hide
          rchiang Ray Chiang added a comment -

          I'll answer these in reverse order:

          2) The first AM container is the "easy" one to measure. Subsequent measurements can be tricky since the "request" time will need to be recorded somewhere until the request is actually fulfilled. Tracking all the requests and corresponding fulfillments would be a lot more work and may want more sophisticated measurements. I haven't filed a JIRA for doing the later containers.

          1) Breaking this answer into several parts. I'm not going to remember all the iterations I went through but I'll answer as best as I can.

          1A) YARN-3105 covers the enhancements to StateMachine to record state transitions generically for metrics. Jian He made the original suggestion.

          1B) There were several factors for this. I think it was a combination of wanting queue-specific metrics, wanting to separate first allocation from later allocations, working with managed and unmanaged AMs, and a desire to get a more exact measurement with less overhead. I've deleted all my earliest attempts at this (i.e. those prior to the first patch on this JIRA), so I can't provide more specific information offhand.

          Let me know if that satisfactorily answers your questions.

          Show
          rchiang Ray Chiang added a comment - I'll answer these in reverse order: 2) The first AM container is the "easy" one to measure. Subsequent measurements can be tricky since the "request" time will need to be recorded somewhere until the request is actually fulfilled. Tracking all the requests and corresponding fulfillments would be a lot more work and may want more sophisticated measurements. I haven't filed a JIRA for doing the later containers. 1) Breaking this answer into several parts. I'm not going to remember all the iterations I went through but I'll answer as best as I can. 1A) YARN-3105 covers the enhancements to StateMachine to record state transitions generically for metrics. Jian He made the original suggestion. 1B) There were several factors for this. I think it was a combination of wanting queue-specific metrics, wanting to separate first allocation from later allocations, working with managed and unmanaged AMs, and a desire to get a more exact measurement with less overhead. I've deleted all my earliest attempts at this (i.e. those prior to the first patch on this JIRA), so I can't provide more specific information offhand. Let me know if that satisfactorily answers your questions.
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Going through old tickets. I have two questions

          1. Why was this done in a scheduler specific way? RMAppAttempt clearly knows when it requests and when it gets the allocation.
          2. Seems like the patch only looks at the first AM container. What happens if the we have a 2nd AM container?

          I accidentally closed this ticket, so doesn't look like I can reopen it. If folks agree, I will open a new ticket.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Going through old tickets. I have two questions Why was this done in a scheduler specific way? RMAppAttempt clearly knows when it requests and when it gets the allocation. Seems like the patch only looks at the first AM container. What happens if the we have a 2nd AM container? I accidentally closed this ticket, so doesn't look like I can reopen it. If folks agree, I will open a new ticket.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #142 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/142/)
          YARN-2868. FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #142 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/142/ ) YARN-2868 . FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442) hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #133 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/133/)
          YARN-2868. FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #133 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/133/ ) YARN-2868 . FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442) hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/)
          YARN-2868. FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
          • hadoop-yarn-project/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #2074 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2074/ ) YARN-2868 . FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java hadoop-yarn-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/)
          YARN-2868. FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #2092 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2092/ ) YARN-2868 . FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java hadoop-yarn-project/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/876/)
          YARN-2868. FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #876 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/876/ ) YARN-2868 . FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/)
          YARN-2868. FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #142 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/142/ ) YARN-2868 . FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7407 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7407/)
          YARN-2868. FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
          • hadoop-yarn-project/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #7407 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7407/ ) YARN-2868 . FairScheduler: Metric for latency to allocate first container for an application. (Ray Chiang via kasha) (kasha: rev 972f1f1ab94a26ec446a272ad030fe13f03ed442) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java hadoop-yarn-project/CHANGES.txt
          Hide
          kasha Karthik Kambatla added a comment -

          Thanks for adding this metric, Ray. Just committed this to trunk and branch-2.

          Show
          kasha Karthik Kambatla added a comment - Thanks for adding this metric, Ray. Just committed this to trunk and branch-2.
          Hide
          kasha Karthik Kambatla added a comment -

          +1, checking this in.

          Show
          kasha Karthik Kambatla added a comment - +1, checking this in.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12706128/YARN-2868.012.patch
          against trunk revision e1feb4e.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7061//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7061//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706128/YARN-2868.012.patch against trunk revision e1feb4e. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7061//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7061//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -

          Fix patching conflict introduced by YARN-3356.

          Show
          rchiang Ray Chiang added a comment - Fix patching conflict introduced by YARN-3356 .
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12706083/YARN-2868.011.patch
          against trunk revision fe5c23b.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7050//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706083/YARN-2868.011.patch against trunk revision fe5c23b. -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7050//console This message is automatically generated.
          Hide
          kasha Karthik Kambatla added a comment -

          +1, pending Jenkins.

          Show
          kasha Karthik Kambatla added a comment - +1, pending Jenkins.
          Hide
          rchiang Ray Chiang added a comment -

          Updated again

          Show
          rchiang Ray Chiang added a comment - Updated again
          Hide
          rchiang Ray Chiang added a comment -

          RE: findbugs

          None of the flagged issues are in files changed here.

          RE: failing unit tests

          These tests are passing in my tree.

          Show
          rchiang Ray Chiang added a comment - RE: findbugs None of the flagged issues are in files changed here. RE: failing unit tests These tests are passing in my tree.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12702707/YARN-2868.010.patch
          against trunk revision 8d88691.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
          org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler
          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6858//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6858//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6858//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702707/YARN-2868.010.patch against trunk revision 8d88691. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6858//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6858//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6858//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -
          • Clean up some spacing issues in the last patch.
          • Clean up the logic for setting the metric. Moved more into SchedulerApplicationAttempt method.
          Show
          rchiang Ray Chiang added a comment - Clean up some spacing issues in the last patch. Clean up the logic for setting the metric. Moved more into SchedulerApplicationAttempt method.
          Hide
          rchiang Ray Chiang added a comment -

          Update with latest feedback.

          Show
          rchiang Ray Chiang added a comment - Update with latest feedback.
          Hide
          kasha Karthik Kambatla added a comment -

          Sorry for coming in late. Few comments on the patch:

          1. FairScheduler
            1. I think we should move the logic of computing the delay for the launch of first container (i.e., allocation Time - request Time) to a method outside allocate. We would want to keep allocate() as simple and easy to read as possible. May be, we could compute that in SchedulerApplicationAttempt#trySetFirstContainerAllocatedTime?
            2. Nit: I don't see the need for storing current time in start. Same goes for firstContainerAllocatedTime
            3. Comment: I feel we should avoid looking up current time after allocation, since we care about it only once for the first container. Again, adding a check there would be leaking TMI into allocate. So, I guess we shouldn't.
            4. If possible, we should calculate delay outside of the synchronized block.
          Show
          kasha Karthik Kambatla added a comment - Sorry for coming in late. Few comments on the patch: FairScheduler I think we should move the logic of computing the delay for the launch of first container (i.e., allocation Time - request Time) to a method outside allocate. We would want to keep allocate() as simple and easy to read as possible. May be, we could compute that in SchedulerApplicationAttempt#trySetFirstContainerAllocatedTime ? Nit: I don't see the need for storing current time in start . Same goes for firstContainerAllocatedTime Comment: I feel we should avoid looking up current time after allocation, since we care about it only once for the first container. Again, adding a check there would be leaking TMI into allocate . So, I guess we shouldn't. If possible, we should calculate delay outside of the synchronized block.
          Hide
          rchiang Ray Chiang added a comment -

          Wangda Tan just following up to be sure. Are you okay with the latest patch uploaded?

          Show
          rchiang Ray Chiang added a comment - Wangda Tan just following up to be sure. Are you okay with the latest patch uploaded?
          Hide
          rchiang Ray Chiang added a comment -

          RE: Failing unit tests

          The first four tests pass fine in my tree. The TestFSRMStateStore test is related to YARN-1778.

          Show
          rchiang Ray Chiang added a comment - RE: Failing unit tests The first four tests pass fine in my tree. The TestFSRMStateStore test is related to YARN-1778 .
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12696021/YARN-2868.008.patch
          against trunk revision 8acc5e9.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
          org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6483//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6483//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12696021/YARN-2868.008.patch against trunk revision 8acc5e9. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6483//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6483//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -

          Renamed variable to appAttemptFirstContainerAllocationDelay

          Show
          rchiang Ray Chiang added a comment - Renamed variable to appAttemptFirstContainerAllocationDelay
          Hide
          rchiang Ray Chiang added a comment -

          I will do the rename.

          I'm still doing some writeup for YARN-3105. I'm almost positive it's going to fit somewhere in there, but it will definitely require further discussion concerning architecture.

          Thanks for the detailed reviewing!

          Show
          rchiang Ray Chiang added a comment - I will do the rename. I'm still doing some writeup for YARN-3105 . I'm almost positive it's going to fit somewhere in there, but it will definitely require further discussion concerning architecture. Thanks for the detailed reviewing!
          Hide
          leftnoteasy Wangda Tan added a comment -

          That makes sense, but I think we still need to address this problem in the future, since the statistic itself is not correct.

          Now my only comment is, I suggest to rename "Container First Attempt Allocation Delay" to "App Attempt First Container Allocation Delay".

          Thanks,
          Wangda

          Show
          leftnoteasy Wangda Tan added a comment - That makes sense, but I think we still need to address this problem in the future, since the statistic itself is not correct. Now my only comment is, I suggest to rename "Container First Attempt Allocation Delay" to "App Attempt First Container Allocation Delay". Thanks, Wangda
          Hide
          rchiang Ray Chiang added a comment -

          My current thoughts:

          D1) Metrics are bound to be overlapping. If we measure items that have potentially long or highly-variable runtimes (e.g. state transitions), then some of them are going to need finer grained measurements.

          D2) I'd like to get in this JIRA and follow up with some other container measurements in the same parts of the code.

          D3) YARN-3105 will be the next thing I work on after the parts in D2)

          D4) Your suggestion is a broader measurement in that it captures more of the user-perceieved delay for first container allocation. I think both are valid measurements, but serve different purposes. I'd like to have both, but that could be a new JIRA (or maybe part of YARN-3105, depending on how it's finally scoped).

          Show
          rchiang Ray Chiang added a comment - My current thoughts: D1) Metrics are bound to be overlapping. If we measure items that have potentially long or highly-variable runtimes (e.g. state transitions), then some of them are going to need finer grained measurements. D2) I'd like to get in this JIRA and follow up with some other container measurements in the same parts of the code. D3) YARN-3105 will be the next thing I work on after the parts in D2) D4) Your suggestion is a broader measurement in that it captures more of the user-perceieved delay for first container allocation. I think both are valid measurements, but serve different purposes. I'd like to have both, but that could be a new JIRA (or maybe part of YARN-3105 , depending on how it's finally scoped).
          Hide
          leftnoteasy Wangda Tan added a comment -

          Our scenario is debugging queue related issues for which we need queue related metrics because scheduling decisions are made based on the queue. What would be a good place to add metrics for all those queue related metrics?

          It makes sense to me since it's use-case driven.
          However, I'm wondering maybe the "first container allocation delay" is not correctly calculated in this patch. Thinking about a queue with some pending applications, but no queue gets resource allocated from RM (maybe there's any issue of the cluster). In this case, the "first container allocation delay" will be 0. I think we should consider the time of an app waiting for RM allocating container. So even if there's no container allocated in a queue, "first container allocation delay" will still be consistently increasing, which can help trouble shooting cluster issues.

          Does this make sense? Jian He.

          Show
          leftnoteasy Wangda Tan added a comment - Our scenario is debugging queue related issues for which we need queue related metrics because scheduling decisions are made based on the queue. What would be a good place to add metrics for all those queue related metrics? It makes sense to me since it's use-case driven. However, I'm wondering maybe the "first container allocation delay" is not correctly calculated in this patch. Thinking about a queue with some pending applications, but no queue gets resource allocated from RM (maybe there's any issue of the cluster). In this case, the "first container allocation delay" will be 0. I think we should consider the time of an app waiting for RM allocating container. So even if there's no container allocated in a queue, "first container allocation delay" will still be consistently increasing, which can help trouble shooting cluster issues. Does this make sense? Jian He .
          Hide
          adhoot Anubhav Dhoot added a comment -

          Hi Wangda Tan
          RMAppMetrics will be interesting for us to add to metrics. We will get even more granular data than just queue related metrics. But unless we also add it to a queue metrics it will again not serve the purpose of this jira.
          Our scenario is debugging queue related issues for which we need queue related metrics because scheduling decisions are made based on the queue. What would be a good place to add metrics for all those queue related metrics?

          Show
          adhoot Anubhav Dhoot added a comment - Hi Wangda Tan RMAppMetrics will be interesting for us to add to metrics. We will get even more granular data than just queue related metrics. But unless we also add it to a queue metrics it will again not serve the purpose of this jira. Our scenario is debugging queue related issues for which we need queue related metrics because scheduling decisions are made based on the queue. What would be a good place to add metrics for all those queue related metrics?
          Hide
          leftnoteasy Wangda Tan added a comment -

          Thanks for your summary, Ray Chiang.

          What I'm worry about to put it to QueueMetrics is: all individual metrics contained in QueueMetrics are queue-specific. If we starting to add such debugging metrics to QueueMetrics, we will end up with lots of such debugging metrics contained in QueueMetrics, which is not the original purpose of QueueMetrics.

          Is it possible to add a RMAppMetrics (existing one is not a part of MetricsSystem, but can we make it to be a part of metrics system?) and move all the debugging information to it?

          Show
          leftnoteasy Wangda Tan added a comment - Thanks for your summary, Ray Chiang . What I'm worry about to put it to QueueMetrics is: all individual metrics contained in QueueMetrics are queue-specific. If we starting to add such debugging metrics to QueueMetrics, we will end up with lots of such debugging metrics contained in QueueMetrics, which is not the original purpose of QueueMetrics. Is it possible to add a RMAppMetrics (existing one is not a part of MetricsSystem, but can we make it to be a part of metrics system?) and move all the debugging information to it?
          Hide
          rchiang Ray Chiang added a comment -

          Linked YARN-3105.

          Show
          rchiang Ray Chiang added a comment - Linked YARN-3105 .
          Hide
          rchiang Ray Chiang added a comment -

          That makes sense, but I think that's opening up another discussion topic. I'm going to open a separate JIRA for this discussion.

          Thanks Jian.

          Show
          rchiang Ray Chiang added a comment - That makes sense, but I think that's opening up another discussion topic. I'm going to open a separate JIRA for this discussion. Thanks Jian.
          Hide
          jianhe Jian He added a comment -

          From what I understand, both YARN-2802 and this jira are trying to capture the time interval among these states.

          SCHEDULED, 
          ALLOCATED, 
          LAUNCHED,
          RUNNING 
          

          YARN-2802 addresses last three states, and this jira is trying to capture the time interval between first two. If possible, I think we should make both implementations consistent.
          Even, we may consider a generic solution to capture the time interval between state-transitions.

          Show
          jianhe Jian He added a comment - From what I understand, both YARN-2802 and this jira are trying to capture the time interval among these states. SCHEDULED, ALLOCATED, LAUNCHED, RUNNING YARN-2802 addresses last three states, and this jira is trying to capture the time interval between first two. If possible, I think we should make both implementations consistent. Even, we may consider a generic solution to capture the time interval between state-transitions.
          Hide
          rchiang Ray Chiang added a comment -

          I would like to make this metrics discussion a bit more clear for my own sanity. The current situation:

          A1) ClusterMetrics, prior to YARN-2802, only had NM metrics. AM metrics were added in YARN-2802, partly because storing in each node isn't useful for debugging. Review from Vinod pushed the metric from the RM (since it really isn't RM related) to ClusterMetrics.

          A2) QueueMetrics (and derived classes) currently has metrics for App counts and MB/VCore/Container statistics.

          This JIRA is the first of many, to start placing the metrics to get some sort of YARN profiling in place, at least at some basic level.

          B1) If it's put into ClusterMetrics, it is as Anubhav mentioned, a good global metric/warning system, but won't necessarily help with debugging other than at the cluster level.

          B2) If it's put into the QueueMetrics, then there is the additional ability to be able to debug queue vs. network/cluster issues with respect to container allocation.

          My feedback on the discussion so far:

          C1) I do believe container allocation has a chance of being queue dependent. Now, whether it's only useful for FairScheduler vs. other schedulers could be debated (which is why it was originally in FSQueueMetrics).

          C2) QueueMetrics has the advantage of being able to have a customer take a metrics snapshot and use it for debugging application delays (at least for this first metric so far). My goal for the near-future is to continue adding to this area in order to get a clear snapshot of any RM related application runtime metrics for each queue.

          Any thoughts?

          PS: I appreciate all the great feedback so far. It's definitely giving me places to look at the code and get a better overall understanding. Thanks.

          Show
          rchiang Ray Chiang added a comment - I would like to make this metrics discussion a bit more clear for my own sanity. The current situation: A1) ClusterMetrics, prior to YARN-2802 , only had NM metrics. AM metrics were added in YARN-2802 , partly because storing in each node isn't useful for debugging. Review from Vinod pushed the metric from the RM (since it really isn't RM related) to ClusterMetrics. A2) QueueMetrics (and derived classes) currently has metrics for App counts and MB/VCore/Container statistics. This JIRA is the first of many, to start placing the metrics to get some sort of YARN profiling in place, at least at some basic level. B1) If it's put into ClusterMetrics, it is as Anubhav mentioned, a good global metric/warning system, but won't necessarily help with debugging other than at the cluster level. B2) If it's put into the QueueMetrics, then there is the additional ability to be able to debug queue vs. network/cluster issues with respect to container allocation. My feedback on the discussion so far: C1) I do believe container allocation has a chance of being queue dependent. Now, whether it's only useful for FairScheduler vs. other schedulers could be debated (which is why it was originally in FSQueueMetrics). C2) QueueMetrics has the advantage of being able to have a customer take a metrics snapshot and use it for debugging application delays (at least for this first metric so far). My goal for the near-future is to continue adding to this area in order to get a clear snapshot of any RM related application runtime metrics for each queue. Any thoughts? PS: I appreciate all the great feedback so far. It's definitely giving me places to look at the code and get a better overall understanding. Thanks.
          Hide
          rohithsharma Rohith Sharma K S added a comment -

          All clustermetrics that are related to a queue should be moved to per queue metrics. I will open other jiras for moving those

          IIUC, be inform that this would breaks compatibility. ClusterMetrics are exposed to users. It would be better to keep current metrics as it is and only work on new metrics addition.

          Show
          rohithsharma Rohith Sharma K S added a comment - All clustermetrics that are related to a queue should be moved to per queue metrics. I will open other jiras for moving those IIUC, be inform that this would breaks compatibility. ClusterMetrics are exposed to users. It would be better to keep current metrics as it is and only work on new metrics addition.
          Hide
          adhoot Anubhav Dhoot added a comment -

          Adding it to ClusterMetrics will only give you a single value for the entire cluster which is pretty much useless if you want to investigate queue related issues. Adding it to a per queue metrics will give you more granular data. If you only care about the cluster wide metrics you still get that by looking at the root queue metrics. Hence we need to keep it per queue. All clustermetrics that are related to a queue should be moved to per queue metrics. I will open other jiras for moving those

          Show
          adhoot Anubhav Dhoot added a comment - Adding it to ClusterMetrics will only give you a single value for the entire cluster which is pretty much useless if you want to investigate queue related issues. Adding it to a per queue metrics will give you more granular data. If you only care about the cluster wide metrics you still get that by looking at the root queue metrics. Hence we need to keep it per queue. All clustermetrics that are related to a queue should be moved to per queue metrics. I will open other jiras for moving those
          Hide
          rohithsharma Rohith Sharma K S added a comment -

          I had a thought that metric can be common to all schedulers. If there is any complexities now, can be added later.
          Even I had specific doubt that this metic is for application level but not for scheduler level. I was mentioned in previous comment 2nd point. I was in dilemma that where to place exactly!! Now I see clusterMetrics has already some metrics related to AM.
          +1 for keeping in ClusterMetrics and for metric name.

          Show
          rohithsharma Rohith Sharma K S added a comment - I had a thought that metric can be common to all schedulers. If there is any complexities now, can be added later. Even I had specific doubt that this metic is for application level but not for scheduler level. I was mentioned in previous comment 2nd point. I was in dilemma that where to place exactly!! Now I see clusterMetrics has already some metrics related to AM. +1 for keeping in ClusterMetrics and for metric name.
          Hide
          leftnoteasy Wangda Tan added a comment -

          Just checked code, is it good to put in ClusterMetrics? aMRegisterDelay/amLaunchDelay seems more related to such initial container allocation time. And name of the metric could be "App first container allocation delay".

          Sounds good? Rohith Sharma K S.

          Show
          leftnoteasy Wangda Tan added a comment - Just checked code, is it good to put in ClusterMetrics? aMRegisterDelay/amLaunchDelay seems more related to such initial container allocation time. And name of the metric could be "App first container allocation delay". Sounds good? Rohith Sharma K S .
          Hide
          rchiang Ray Chiang added a comment -

          I had it previously in FSQueueMetrics, then moved it to QueueMetrics based on Rohith's feedback, and then determined that I updating CapacityScheduler with all the matching queue stuff right now would be potentially in conflict with YARN-2986. I could push the metric back to FSQueueMetrics.

          Since this is a MutableRate, the metric shouldn't get clobbered with each app, but will get averaged in (unless I'm misunderstanding something).

          I'm going to wait on further feedback before I do more editing.

          Show
          rchiang Ray Chiang added a comment - I had it previously in FSQueueMetrics, then moved it to QueueMetrics based on Rohith's feedback, and then determined that I updating CapacityScheduler with all the matching queue stuff right now would be potentially in conflict with YARN-2986 . I could push the metric back to FSQueueMetrics. Since this is a MutableRate, the metric shouldn't get clobbered with each app, but will get averaged in (unless I'm misunderstanding something). I'm going to wait on further feedback before I do more editing.
          Hide
          leftnoteasy Wangda Tan added a comment -

          Hmm, I think it may not good enough to put in QueueMetrics (I just noticed this). Every new app will overwrite this value, which is confusing to me, also to end users. When you look at metrics fields in QueueMetrics, all of them are generic metrics of a queue, but this field seems not so generic to me.

          Is there any must-have reason or use cases to add it to QueueMetrics or alternatively you can add an "application-metrics" so you can add it there?

          Show
          leftnoteasy Wangda Tan added a comment - Hmm, I think it may not good enough to put in QueueMetrics (I just noticed this). Every new app will overwrite this value, which is confusing to me, also to end users. When you look at metrics fields in QueueMetrics, all of them are generic metrics of a queue, but this field seems not so generic to me. Is there any must-have reason or use cases to add it to QueueMetrics or alternatively you can add an "application-metrics" so you can add it there?
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12694276/YARN-2868.007.patch
          against trunk revision 8f26d5a.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6400//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6400//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694276/YARN-2868.007.patch against trunk revision 8f26d5a. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6400//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6400//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -
          • Move metric to QueueMetrics parent class (to be compatible with CapacityScheduler later)
          • Remove initialized boolean variable
          • Restore AtomicLong to SchedulerApplicationAttempt
          Show
          rchiang Ray Chiang added a comment - Move metric to QueueMetrics parent class (to be compatible with CapacityScheduler later) Remove initialized boolean variable Restore AtomicLong to SchedulerApplicationAttempt
          Hide
          rchiang Ray Chiang added a comment -

          Rohith Sharma K S, it looks like CapacityScheduler/AbstractYarnScheduler is missing a couple of things needed to record container launch time. The Clock stuff is easy to add, but the queue related stuff looks like it could get complicated. I think I'd rather wait for YARN-2986 before this JIRA is implemented in CapacityScheduler. I can open a new JIRA for that once this one is done.

          Does that sound reasonable?

          Show
          rchiang Ray Chiang added a comment - Rohith Sharma K S , it looks like CapacityScheduler/AbstractYarnScheduler is missing a couple of things needed to record container launch time. The Clock stuff is easy to add, but the queue related stuff looks like it could get complicated. I think I'd rather wait for YARN-2986 before this JIRA is implemented in CapacityScheduler. I can open a new JIRA for that once this one is done. Does that sound reasonable?
          Hide
          rchiang Ray Chiang added a comment -

          Okay, my bad. I'll put it back the way it was.

          Show
          rchiang Ray Chiang added a comment - Okay, my bad. I'll put it back the way it was.
          Hide
          adhoot Anubhav Dhoot added a comment -

          volatile cannot be used for thread synchronization instead of AtomicLong. Lets revert those changes back to previous patch.

          Show
          adhoot Anubhav Dhoot added a comment - volatile cannot be used for thread synchronization instead of AtomicLong. Lets revert those changes back to previous patch.
          Hide
          leftnoteasy Wangda Tan added a comment -

          Ray Chiang,
          Just reviewed patch, I'm not sure if you misunderstood what I and Karthik meant. I agree with what you mentioned in https://issues.apache.org/jira/browse/YARN-2868?focusedCommentId=14274308&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14274308 (comment#1) and also Karthik's comment: https://issues.apache.org/jira/browse/YARN-2868?focusedCommentId=14274317&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14274317. It's better to keep AtomicLong as what you originally done. Lock application from caller is not clear to me.

          Thanks,

          Show
          leftnoteasy Wangda Tan added a comment - Ray Chiang , Just reviewed patch, I'm not sure if you misunderstood what I and Karthik meant. I agree with what you mentioned in https://issues.apache.org/jira/browse/YARN-2868?focusedCommentId=14274308&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14274308 (comment#1) and also Karthik's comment: https://issues.apache.org/jira/browse/YARN-2868?focusedCommentId=14274317&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14274317 . It's better to keep AtomicLong as what you originally done. Lock application from caller is not clear to me. Thanks,
          Hide
          rchiang Ray Chiang added a comment -

          Addressing the doubts:

          1) It could be added to both. I'm not as familiar with CS, but I can certainly look into it.
          2) It's somewhat handy for debugging and there was a much earlier version that was recording from within different methods, so I had to store it somewhere. I was keeping it around in case the metric start/stop points needed to be changed again.

          And addressing the comments:

          1) Yeah, I could remove initialized. My next step was to measure all the other container allocations (i.e. after the first) as a separate metric and I was thinking I would use it somehow.
          2) Argh. That's probably left over from when I was changing from measuring all queues to measuring per-queue. I'll fix that.

          Thanks for the sharp eyes.

          Show
          rchiang Ray Chiang added a comment - Addressing the doubts: 1) It could be added to both. I'm not as familiar with CS, but I can certainly look into it. 2) It's somewhat handy for debugging and there was a much earlier version that was recording from within different methods, so I had to store it somewhere. I was keeping it around in case the metric start/stop points needed to be changed again. And addressing the comments: 1) Yeah, I could remove initialized. My next step was to measure all the other container allocations (i.e. after the first) as a separate metric and I was thinking I would use it somehow. 2) Argh. That's probably left over from when I was changing from measuring all queues to measuring per-queue. I'll fix that. Thanks for the sharp eyes.
          Hide
          rohithsharma Rohith Sharma K S added a comment -

          Thanks Ray Chiang for working on this jira.
          Some commets and doubts
          Doubts :

          1. From the patch I see this metric added only for FairScheduler. I believe this metric can be added for all the schedulers. Is there any specific reason for not adding for CS?
          2. I believe this metric is specific to application makes more value add. Adding into scheduler level, what is the benifit users getting from this since this value wil be always shown only for last application?

          Comments :

          1. In the below code, variable initialized is unused. If not rquired, this can be removed.
                synchronized (application) {
                  boolean initialized = application.trySetFirstAllocationRequestSentTime(start);
                }
            
          2. timediff is added for both rootMetrics and queueMetrics. So both queueMetric contrains same value.By definition of rootMetrics, it cotaines aggregated value for schedulers and queuMetrics contains only specific to queue level.
            Any thought on this?
              
             if (timediff>0) {
                        rootMetrics.addContainerFirstAttemptAllocationDelay(timediff);
                      }
                      RMApp rmApp = rmContext.getRMApps().get(appAttemptId.getApplicationId());
                      if (rmApp!=null) {
                        String metricsQueue = rmApp.getQueue();
                        if (metricsQueue!=null) {
                          FSQueue queue = queueMgr.getQueue(metricsQueue);
                          queue.getMetrics().addContainerFirstAttemptAllocationDelay(timediff);
                        }
                      }
            
          Show
          rohithsharma Rohith Sharma K S added a comment - Thanks Ray Chiang for working on this jira. Some commets and doubts Doubts : From the patch I see this metric added only for FairScheduler. I believe this metric can be added for all the schedulers. Is there any specific reason for not adding for CS? I believe this metric is specific to application makes more value add. Adding into scheduler level, what is the benifit users getting from this since this value wil be always shown only for last application? Comments : In the below code, variable initialized is unused. If not rquired, this can be removed. synchronized (application) { boolean initialized = application.trySetFirstAllocationRequestSentTime(start); } timediff is added for both rootMetrics and queueMetrics. So both queueMetric contrains same value.By definition of rootMetrics , it cotaines aggregated value for schedulers and queuMetrics contains only specific to queue level. Any thought on this? if (timediff>0) { rootMetrics.addContainerFirstAttemptAllocationDelay(timediff); } RMApp rmApp = rmContext.getRMApps().get(appAttemptId.getApplicationId()); if (rmApp!= null ) { String metricsQueue = rmApp.getQueue(); if (metricsQueue!= null ) { FSQueue queue = queueMgr.getQueue(metricsQueue); queue.getMetrics().addContainerFirstAttemptAllocationDelay(timediff); } }
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12693520/YARN-2868.006.patch
          against trunk revision 6b17eb9.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6371//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6371//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12693520/YARN-2868.006.patch against trunk revision 6b17eb9. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6371//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6371//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -

          Implement changes based on feedback.

          Show
          rchiang Ray Chiang added a comment - Implement changes based on feedback.
          Hide
          rchiang Ray Chiang added a comment -

          Okay. Let me look at the changes and submit a new patch. Thanks for the comments Karthik Kambatla and Wangda Tan.

          Show
          rchiang Ray Chiang added a comment - Okay. Let me look at the changes and submit a new patch. Thanks for the comments Karthik Kambatla and Wangda Tan .
          Hide
          leftnoteasy Wangda Tan added a comment -

          Thanks for explanation, #3 and Karthik Kambatla's comment make sense to me.

          I would prefer to rename following fields for better accuracy:
          1) Rename allocationRequestStart to firstAllocationRequestSentTime (I misunderstood in my last comment)
          2) Rename firstContainerAllocation to firstContainerAllocatedTime

          And rename setAllocationRequestStart to "trySet.."(or other better name), what it tries to do is just trying to do the set.
          Return boolean for setAllocationRequestStart make two methods consistent. Future caller may need it.

          Thoughts?

          Show
          leftnoteasy Wangda Tan added a comment - Thanks for explanation, #3 and Karthik Kambatla 's comment make sense to me. I would prefer to rename following fields for better accuracy: 1) Rename allocationRequestStart to firstAllocationRequestSentTime (I misunderstood in my last comment) 2) Rename firstContainerAllocation to firstContainerAllocatedTime And rename setAllocationRequestStart to "trySet.."(or other better name), what it tries to do is just trying to do the set. Return boolean for setAllocationRequestStart make two methods consistent. Future caller may need it. Thoughts?
          Hide
          kasha Karthik Kambatla added a comment -

          implementation of scheduler should make sure that

          I would prefer thread-safety be handled at the callee, and not be deferred to the caller.

          Show
          kasha Karthik Kambatla added a comment - implementation of scheduler should make sure that I would prefer thread-safety be handled at the callee, and not be deferred to the caller.
          Hide
          rchiang Ray Chiang added a comment -

          RE: Comment 1)

          Using AtomicLong seemed like the least error prone approach. Since we only expect this to change once, are you asking to implement this as a lock-free comparison and only do the locking when updating the value?

          RE: Comment 2)

          I'm fine with changing names for accuracy.

          RE: Comment 3)

          The goal for this metric is to measure the functionality plus overhead allocation time (for some definition of overhead). I believe your version will differ in the following ways from the current patch:

          A) Some overhead in FairScheduler#allocate() will be missed (not necessarily a bad thing or big deal)
          B) Your suggestion will not work for unmanaged applications. Were you making your suggestion just for managed applications?

          Let me know your thoughts on this.

          Show
          rchiang Ray Chiang added a comment - RE: Comment 1) Using AtomicLong seemed like the least error prone approach. Since we only expect this to change once, are you asking to implement this as a lock-free comparison and only do the locking when updating the value? RE: Comment 2) I'm fine with changing names for accuracy. RE: Comment 3) The goal for this metric is to measure the functionality plus overhead allocation time (for some definition of overhead). I believe your version will differ in the following ways from the current patch: A) Some overhead in FairScheduler#allocate() will be missed (not necessarily a bad thing or big deal) B) Your suggestion will not work for unmanaged applications. Were you making your suggestion just for managed applications? Let me know your thoughts on this.
          Hide
          leftnoteasy Wangda Tan added a comment -

          Some comments:
          1. It seems overkill to me to use AtomicLong to ensure allocation request sent time will only be set once – implementation of scheduler should make sure that. If lock-free access is required, use volatile is enough.

          2. Rename allocationRequestStart to amContainerAllocationRequestSentTime?

          3. It's not necessary to track first AM-container allocation in SchedulerApplicationAttempt. When scheduler knows AM container allocated, it should use AM-Container.getCreationTime() and amAllocationRequestSent time to get the delay.

          Show
          leftnoteasy Wangda Tan added a comment - Some comments: 1. It seems overkill to me to use AtomicLong to ensure allocation request sent time will only be set once – implementation of scheduler should make sure that. If lock-free access is required, use volatile is enough. 2. Rename allocationRequestStart to amContainerAllocationRequestSentTime ? 3. It's not necessary to track first AM-container allocation in SchedulerApplicationAttempt. When scheduler knows AM container allocated, it should use AM-Container.getCreationTime() and amAllocationRequestSent time to get the delay.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12688084/YARN-2868.005.patch
          against trunk revision ae7bf31.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestRM
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6313//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6313//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688084/YARN-2868.005.patch against trunk revision ae7bf31. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6313//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6313//console This message is automatically generated.
          Hide
          leftnoteasy Wangda Tan added a comment -

          Ray Chiang, Thanks for working on this.
          Robert Kanter, could you wait for a moment before committing it?

          Thanks,

          Show
          leftnoteasy Wangda Tan added a comment - Ray Chiang , Thanks for working on this. Robert Kanter , could you wait for a moment before committing it? Thanks,
          Hide
          rchiang Ray Chiang added a comment -

          Great. Thanks!

          Show
          rchiang Ray Chiang added a comment - Great. Thanks!
          Hide
          rkanter Robert Kanter added a comment -

          Looks good to me. The last Jenkins run was a while ago and had some failures, so I've just kicked off another Jenkins run.

          Show
          rkanter Robert Kanter added a comment - Looks good to me. The last Jenkins run was a while ago and had some failures, so I've just kicked off another Jenkins run.
          Hide
          rchiang Ray Chiang added a comment -

          RE: findbugs. None affecting this code.

          RE: unit test failures. None of these 5 tests failed in my tree.

          Show
          rchiang Ray Chiang added a comment - RE: findbugs. None affecting this code. RE: unit test failures. None of these 5 tests failed in my tree.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12688084/YARN-2868.005.patch
          against trunk revision 389f881.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestRM
          org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher
          org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
          org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6148//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6148//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6148//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12688084/YARN-2868.005.patch against trunk revision 389f881. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6148//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6148//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6148//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -

          Submit for testing

          Show
          rchiang Ray Chiang added a comment - Submit for testing
          Hide
          rchiang Ray Chiang added a comment -
          • Cleaned up unused imports
          • Didn't move @metric to a single line--that violates the 80 column width
          Show
          rchiang Ray Chiang added a comment - Cleaned up unused imports Didn't move @metric to a single line--that violates the 80 column width
          Hide
          rchiang Ray Chiang added a comment -

          Found one bug

          Show
          rchiang Ray Chiang added a comment - Found one bug
          Hide
          adhoot Anubhav Dhoot added a comment -

          Minor nits:
          FSQueueMetrics has some unused imports that were added
          The formatting for @Metric can be made in one line similar to others

          Otherwise LGTM.

          Show
          adhoot Anubhav Dhoot added a comment - Minor nits: FSQueueMetrics has some unused imports that were added The formatting for @Metric can be made in one line similar to others Otherwise LGTM.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12687545/YARN-2868.004.patch
          against trunk revision a97a1e7.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6124//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6124//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6124//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687545/YARN-2868.004.patch against trunk revision a97a1e7. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6124//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6124//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6124//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -

          Submitting again due to failure.

          Show
          rchiang Ray Chiang added a comment - Submitting again due to failure.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12687531/YARN-2868.003.patch
          against trunk revision b7923a3.

          -1 patch. Trunk compilation may be broken.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6122//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687531/YARN-2868.003.patch against trunk revision b7923a3. -1 patch . Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6122//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -

          Add null checks for unit testing purposes.

          Show
          rchiang Ray Chiang added a comment - Add null checks for unit testing purposes.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12687360/YARN-2868.002.patch
          against trunk revision a095622.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6117//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6117//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6117//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687360/YARN-2868.002.patch against trunk revision a095622. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6117//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6117//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6117//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -
          • Removed protected keyword from QueueMetrics
          • Add metric to specific queue as well
          Show
          rchiang Ray Chiang added a comment - Removed protected keyword from QueueMetrics Add metric to specific queue as well
          Hide
          rchiang Ray Chiang added a comment -

          Thanks. I'll update the patch and test it out.

          Show
          rchiang Ray Chiang added a comment - Thanks. I'll update the patch and test it out.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12681627/YARN-2868-01.patch
          against trunk revision 6783d17.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5862//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5862//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12681627/YARN-2868-01.patch against trunk revision 6783d17. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5862//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5862//console This message is automatically generated.
          Hide
          adhoot Anubhav Dhoot added a comment -

          Seems like the QueueMetrics change can be reverted.
          Otherwise LGTM.

          Show
          adhoot Anubhav Dhoot added a comment - Seems like the QueueMetrics change can be reverted. Otherwise LGTM.
          Hide
          rchiang Ray Chiang added a comment -

          Oops, forgot to submit patch.

          Show
          rchiang Ray Chiang added a comment - Oops, forgot to submit patch.
          Hide
          rchiang Ray Chiang added a comment -

          First attempt at implementation. Second upload.

          Show
          rchiang Ray Chiang added a comment - First attempt at implementation. Second upload.
          Hide
          rchiang Ray Chiang added a comment -

          First attempt at implementation.

          Show
          rchiang Ray Chiang added a comment - First attempt at implementation.

            People

            • Assignee:
              rchiang Ray Chiang
              Reporter:
              rchiang Ray Chiang
            • Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development