Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      While examining RM stack traces on a busy cluster I noticed a pattern of AMs stuck waiting for the scheduler lock trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, and it would be nice if we could find a way to eliminate the need to grab this lock during this call. We've already done similar work during AM allocate calls to make sure they don't needlessly grab the scheduler lock, and it would be good to do so here as well, if possible.

      1. YARN-3136.branch-2.7.patch
        8 kB
        Wangda Tan
      2. 00013-YARN-3136.patch
        7 kB
        Jian He
      3. 00012-YARN-3136.patch
        7 kB
        Sunil G
      4. 00011-YARN-3136.patch
        7 kB
        Sunil G
      5. 00010-YARN-3136.patch
        7 kB
        Sunil G
      6. 0009-YARN-3136.patch
        7 kB
        Sunil G
      7. 0008-YARN-3136.patch
        9 kB
        Sunil G
      8. 0007-YARN-3136.patch
        8 kB
        Sunil G
      9. 0006-YARN-3136.patch
        7 kB
        Sunil G
      10. 0005-YARN-3136.patch
        12 kB
        Sunil G
      11. 0004-YARN-3136.patch
        7 kB
        Sunil G
      12. 0003-YARN-3136.patch
        7 kB
        Sunil G
      13. 0002-YARN-3136.patch
        6 kB
        Sunil G
      14. 0001-YARN-3136.patch
        5 kB
        Sunil G

        Activity

        Hide
        leftnoteasy Wangda Tan added a comment -

        Backport and commit to branch-2.7. Ran all tests of resourcemanager before pushing. Attaching patch which is committed to branch-2.7.

        Show
        leftnoteasy Wangda Tan added a comment - Backport and commit to branch-2.7. Ran all tests of resourcemanager before pushing. Attaching patch which is committed to branch-2.7.
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2118 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2118/)
        YARN-3136. Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
        • hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
        • hadoop-yarn-project/CHANGES.txt
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2118 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2118/ ) YARN-3136 . Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml hadoop-yarn-project/CHANGES.txt
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #169 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/169/)
        YARN-3136. Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3)

        • hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #169 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/169/ ) YARN-3136 . Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3) hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #159 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/159/)
        YARN-3136. Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
        • hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #159 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/159/ ) YARN-3136 . Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2100 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2100/)
        YARN-3136. Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
        • hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
        • hadoop-yarn-project/CHANGES.txt
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #2100 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2100/ ) YARN-3136 . Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml hadoop-yarn-project/CHANGES.txt
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #902 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/902/)
        YARN-3136. Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3)

        • hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
        • hadoop-yarn-project/CHANGES.txt
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #902 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/902/ ) YARN-3136 . Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3) hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java hadoop-yarn-project/CHANGES.txt
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #168 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/168/)
        YARN-3136. Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
        • hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #168 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/168/ ) YARN-3136 . Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
        Hide
        sunilg Sunil G added a comment -

        Thank you very much Jian He for committing the patch and thank you Jason Lowe for the reviews.

        Show
        sunilg Sunil G added a comment - Thank you very much Jian He for committing the patch and thank you Jason Lowe for the reviews.
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #7611 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7611/)
        YARN-3136. Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
        • hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
        • hadoop-yarn-project/CHANGES.txt
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #7611 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7611/ ) YARN-3136 . Fixed a synchronization problem of AbstractYarnScheduler#getTransferredContainers. Contributed by Sunil G (jianhe: rev 497c86b485b1bb8a2eba52308646d8e1ee76bce3) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java hadoop-yarn-project/CHANGES.txt
        Hide
        jianhe Jian He added a comment -

        Committed to trunk and branch-2, thanks Sunil G !
        Thanks Jason for reviewing the patch !

        Show
        jianhe Jian He added a comment - Committed to trunk and branch-2, thanks Sunil G ! Thanks Jason for reviewing the patch !
        Hide
        jianhe Jian He added a comment -

        Sunil G, below code in latest patch can suppress the findbugs warnings.

        +    <Class name="org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler" />
        +    <Or>
        +      <Field name="rmContext" />
        +      <Field name="applications" />
        +    </Or>
        +    <Bug pattern="IS2_INCONSISTENT_SYNC" />
        

        btw. you can also run mvn findbugs:findbugs to test the findbugs locally, the output should be in a file called findbugs.xml

        committing latest patch.

        Show
        jianhe Jian He added a comment - Sunil G , below code in latest patch can suppress the findbugs warnings. + < Class name= "org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler" /> + <Or> + <Field name= "rmContext" /> + <Field name= "applications" /> + </Or> + <Bug pattern= "IS2_INCONSISTENT_SYNC" /> btw. you can also run mvn findbugs:findbugs to test the findbugs locally, the output should be in a file called findbugs.xml committing latest patch.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12726264/00013-YARN-3136.patch
        against trunk revision f47a576.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7391//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7391//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726264/00013-YARN-3136.patch against trunk revision f47a576. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7391//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7391//console This message is automatically generated.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12726264/00013-YARN-3136.patch
        against trunk revision f47a576.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
        org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
        org.apache.hadoop.yarn.server.resourcemanager.TestRM
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption
        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.TestDominantResourceFairnessPolicy
        org.apache.hadoop.yarn.server.resourcemanager.reservation.TestFairSchedulerPlanFollower
        org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher
        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior
        org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue
        org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions
        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
        org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
        org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacitySchedulerPlanFollower
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
        org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.TestEmptyQueues
        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication
        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestAppPage
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
        org.apache.hadoop.yarn.server.resourcemanager.TestAppManager
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations
        org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

        The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7389//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7389//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726264/00013-YARN-3136.patch against trunk revision f47a576. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.TestDominantResourceFairnessPolicy org.apache.hadoop.yarn.server.resourcemanager.reservation.TestFairSchedulerPlanFollower org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacitySchedulerPlanFollower org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.TestEmptyQueues org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestAppPage org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation org.apache.hadoop.yarn.server.resourcemanager.TestAppManager org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7389//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7389//console This message is automatically generated.
        Hide
        jianhe Jian He added a comment -

        Hi Sunil G, we can suppress at the filed level. Just uploaded a patch myself.

        Show
        jianhe Jian He added a comment - Hi Sunil G , we can suppress at the filed level. Just uploaded a patch myself.
        Hide
        sunilg Sunil G added a comment -

        Hi Jian He

        I used below suppression

          <Match>
            <Class name="org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler" />
            <Method name="getTransferredContainers" />
            <Bug pattern="IS2_INCONSISTENT_SYNC" />
          </Match>
        

        But still i get same problem. Cud i try to suppress with field level? Pls suggest.

        Show
        sunilg Sunil G added a comment - Hi Jian He I used below suppression <Match> <Class name="org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler" /> <Method name="getTransferredContainers" /> <Bug pattern="IS2_INCONSISTENT_SYNC" /> </Match> But still i get same problem. Cud i try to suppress with field level? Pls suggest.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12726204/00012-YARN-3136.patch
        against trunk revision c6b5203.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7379//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7379//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7379//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726204/00012-YARN-3136.patch against trunk revision c6b5203. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7379//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7379//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7379//console This message is automatically generated.
        Hide
        sunilg Sunil G added a comment -

        Rebased against trunk. Also changed the findbugs suppression for getTransferredContainers method.

        Show
        sunilg Sunil G added a comment - Rebased against trunk. Also changed the findbugs suppression for getTransferredContainers method.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12726183/00011-YARN-3136.patch
        against trunk revision 76e7264.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7377//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7377//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7377//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726183/00011-YARN-3136.patch against trunk revision 76e7264. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7377//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7377//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7377//console This message is automatically generated.
        Hide
        sunilg Sunil G added a comment -

        Checking jenkins again

        Show
        sunilg Sunil G added a comment - Checking jenkins again
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12725922/00011-YARN-3136.patch
        against trunk revision 1fa8075.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
        org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
        org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
        org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
        org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
        org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7360//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7360//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7360//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12725922/00011-YARN-3136.patch against trunk revision 1fa8075. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7360//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7360//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7360//console This message is automatically generated.
        Hide
        sunilg Sunil G added a comment -

        Thank you Jian He
        I have added the check in exclude file. Will kick in jenkins with this patch.

        Show
        sunilg Sunil G added a comment - Thank you Jian He I have added the check in exclude file. Will kick in jenkins with this patch.
        Hide
        jianhe Jian He added a comment -

        Hi Sunil G, for that I think we can add these two in the hadoop-yarn/dev-support/findbugs-exclude.xml to suppress them

        Show
        jianhe Jian He added a comment - Hi Sunil G , for that I think we can add these two in the hadoop-yarn/dev-support/findbugs-exclude.xml to suppress them
        Hide
        sunilg Sunil G added a comment -

        HI Jian He
        Specific tests are not needed for this.
        But we need to suppress few find bugs warnings for same. Kindly share your opinion

        1. 	Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.applications; locked 90% of time
        Synchronized 90% of the time
        Unsynchronized access at AbstractYarnScheduler.java:[line 138]
        Unsynchronized access at AbstractYarnScheduler.java:[line 162]
        Unsynchronized access at AbstractYarnScheduler.java:[line 230]
        2. Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 95% of time
        Synchronized 95% of the time
        Unsynchronized access at AbstractYarnScheduler.java:[line 140]
        Unsynchronized access at AbstractYarnScheduler.java:[line 149]
        Synchronized access at AbstractYarnScheduler.java:[line 314]
        
        Show
        sunilg Sunil G added a comment - HI Jian He Specific tests are not needed for this. But we need to suppress few find bugs warnings for same. Kindly share your opinion 1. Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.applications; locked 90% of time Synchronized 90% of the time Unsynchronized access at AbstractYarnScheduler.java:[line 138] Unsynchronized access at AbstractYarnScheduler.java:[line 162] Unsynchronized access at AbstractYarnScheduler.java:[line 230] 2. Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 95% of time Synchronized 95% of the time Unsynchronized access at AbstractYarnScheduler.java:[line 140] Unsynchronized access at AbstractYarnScheduler.java:[line 149] Synchronized access at AbstractYarnScheduler.java:[line 314]
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12724243/00010-YARN-3136.patch
        against trunk revision 6495940.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7275//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7275//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7275//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724243/00010-YARN-3136.patch against trunk revision 6495940. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7275//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7275//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7275//console This message is automatically generated.
        Hide
        sunilg Sunil G added a comment -

        Yes Jian He
        I added that to fix findbugs which is not needed. I updated patch as per initial understanding. Kindly check.

        Show
        sunilg Sunil G added a comment - Yes Jian He I added that to fix findbugs which is not needed. I updated patch as per initial understanding. Kindly check.
        Hide
        jianhe Jian He added a comment -

        Sunil G, sorry for the late response.
        we can suppress the find bug warning, given it's a no issue.
        I found below synchronization is added in the newest patch, I think it's not necessary ?

            synchronized (this) {
              appImpl = this.rmContext.getRMApps().get(appId);
              amContainerId = rmContext.getRMApps().get(appId)
                  .getCurrentAppAttempt().getMasterContainer().getId();
            }
        
        Show
        jianhe Jian He added a comment - Sunil G , sorry for the late response. we can suppress the find bug warning, given it's a no issue. I found below synchronization is added in the newest patch, I think it's not necessary ? synchronized ( this ) { appImpl = this .rmContext.getRMApps().get(appId); amContainerId = rmContext.getRMApps().get(appId) .getCurrentAppAttempt().getMasterContainer().getId(); }
        Hide
        sunilg Sunil G added a comment -

        Hi Jason Lowe and Jian He
        Cud u pls have a look on the comment above.

        Show
        sunilg Sunil G added a comment - Hi Jason Lowe and Jian He Cud u pls have a look on the comment above.
        Hide
        sunilg Sunil G added a comment -

        Hi Jason Lowe Jian He

        Bug type IS2_INCONSISTENT_SYNC (click for details) 
        In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler
        Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.applications
        Synchronized 90% of the time
        Unsynchronized access at AbstractYarnScheduler.java:[line 138]
        Unsynchronized access at AbstractYarnScheduler.java:[line 165]
        Unsynchronized access at AbstractYarnScheduler.java:[line 233]
        

        As "applications" is now a concurrent version, I feel we do not need a lock. Kindly share your opinion.

        test case failure is not related.

        Show
        sunilg Sunil G added a comment - Hi Jason Lowe Jian He Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.applications Synchronized 90% of the time Unsynchronized access at AbstractYarnScheduler.java:[line 138] Unsynchronized access at AbstractYarnScheduler.java:[line 165] Unsynchronized access at AbstractYarnScheduler.java:[line 233] As "applications" is now a concurrent version, I feel we do not need a lock. Kindly share your opinion. test case failure is not related.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12706936/0009-YARN-3136.patch
        against trunk revision 6413d34.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7091//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7091//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7091//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706936/0009-YARN-3136.patch against trunk revision 6413d34. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7091//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7091//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7091//console This message is automatically generated.
        Hide
        sunilg Sunil G added a comment -

        Hi Jason Lowe and Jian He

        I used ConcurrentMap for 'applications'. But findbugs warnings are coming for non-synchronized access on this map. Hope that is acceptable, pls share your opinion.

        Show
        sunilg Sunil G added a comment - Hi Jason Lowe and Jian He I used ConcurrentMap for 'applications'. But findbugs warnings are coming for non-synchronized access on this map. Hope that is acceptable, pls share your opinion.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12706581/0008-YARN-3136.patch
        against trunk revision 36af4a9.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7076//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7076//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7076//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7076//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706581/0008-YARN-3136.patch against trunk revision 36af4a9. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 14 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7076//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7076//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7076//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7076//console This message is automatically generated.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12706510/0007-YARN-3136.patch
        against trunk revision 0b9f12c.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javac. The patch appears to cause the build to fail.

        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7071//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12706510/0007-YARN-3136.patch against trunk revision 0b9f12c. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javac . The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7071//console This message is automatically generated.
        Hide
        sunilg Sunil G added a comment -

        Uploading patch to check findbugs warnings.

        Show
        sunilg Sunil G added a comment - Uploading patch to check findbugs warnings.
        Hide
        sunilg Sunil G added a comment -

        Thank you Jason Lowe for pointing out.
        I will fix and upload a new patch.

        Show
        sunilg Sunil G added a comment - Thank you Jason Lowe for pointing out. I will fix and upload a new patch.
        Hide
        jlowe Jason Lowe added a comment -

        Thanks for updating the patch, Sunil. Two of the findbug warnings appear to be linked to this patch, specifically these:

        <BugInstance type='IS2_INCONSISTENT_SYNC' priority='Normal' category='MT_CORRECTNESS' message='Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.applications; locked 90% of time' lineNumber='138'/>
        <BugInstance type='IS2_INCONSISTENT_SYNC' priority='Normal' category='MT_CORRECTNESS' message='Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 95% of time' lineNumber='140'/>
        
        Show
        jlowe Jason Lowe added a comment - Thanks for updating the patch, Sunil. Two of the findbug warnings appear to be linked to this patch, specifically these: <BugInstance type='IS2_INCONSISTENT_SYNC' priority='Normal' category='MT_CORRECTNESS' message='Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.applications; locked 90% of time' lineNumber='138'/> <BugInstance type='IS2_INCONSISTENT_SYNC' priority='Normal' category='MT_CORRECTNESS' message='Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 95% of time' lineNumber='140'/>
        Hide
        sunilg Sunil G added a comment -

        Thank u Jian.
        All failed cases are passing locally and these are unrelated.

        Show
        sunilg Sunil G added a comment - Thank u Jian. All failed cases are passing locally and these are unrelated.
        Hide
        jianhe Jian He added a comment -

        looks good to me. test failures not related?

        Show
        jianhe Jian He added a comment - looks good to me. test failures not related?
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12703438/0006-YARN-3136.patch
        against trunk revision 5578e22.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.TestRM
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6895//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6895//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6895//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703438/0006-YARN-3136.patch against trunk revision 5578e22. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6895//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6895//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6895//console This message is automatically generated.
        Hide
        sunilg Sunil G added a comment -

        createReleaseCache schedules a timer task that

        Sorry. I also missed that.

        Agreeing to make 'applications' as a concurrent map. As its private and unstable, its fine to make its concurrent. But any schedulers which uses this will have to make change. Do we need to document that?
        Also attaching a patch for same.

        Show
        sunilg Sunil G added a comment - createReleaseCache schedules a timer task that Sorry. I also missed that. Agreeing to make 'applications' as a concurrent map. As its private and unstable, its fine to make its concurrent. But any schedulers which uses this will have to make change. Do we need to document that? Also attaching a patch for same.
        Hide
        jianhe Jian He added a comment -

        But createReleaseCache schedules a timer task that

        Ah, yes, you are right. I missed the part that it's on a timer. thanks for pointing out !

        the simplest thing to do here is to just document (or require, by changing the type from Map to ConcurrentMap as I originally suggested) that the underlying map must support concurrent access.

        I agree.

        Show
        jianhe Jian He added a comment - But createReleaseCache schedules a timer task that Ah, yes, you are right. I missed the part that it's on a timer. thanks for pointing out ! the simplest thing to do here is to just document (or require, by changing the type from Map to ConcurrentMap as I originally suggested) that the underlying map must support concurrent access. I agree.
        Hide
        jlowe Jason Lowe added a comment -

        createReleaseCache is only called In serviceInit, so I think should be fine.

        But createReleaseCache schedules a timer task that, sometime much later, tries to walk the applications map without a lock. It may setup the timer during serviceInit, but is it guaranteed that there's no contention when this timer task finally runs? Maybe I'm missing something.

        I have a general question that, is AbstractYarnScheduler supposed to be public for external use ?

        I wondered the same. By far the simplest thing to do here is to just document (or require, by changing the type from Map to ConcurrentMap as I originally suggested) that the underlying map must support concurrent access. If we only expect AbstractYarnScheduler to be used by the Fifo, Fair, and Capacity schedulers then we don't need to bother with the overhead of an accessor method that can be overridden, etc. Technically AbstractYarnScheduler was not marked Public, so we should be able to update it without worrying about third-party use. Agree that we should mark it Private/Unstable going forward regardless of how we eventually fix this.

        Show
        jlowe Jason Lowe added a comment - createReleaseCache is only called In serviceInit, so I think should be fine. But createReleaseCache schedules a timer task that, sometime much later, tries to walk the applications map without a lock. It may setup the timer during serviceInit, but is it guaranteed that there's no contention when this timer task finally runs? Maybe I'm missing something. I have a general question that, is AbstractYarnScheduler supposed to be public for external use ? I wondered the same. By far the simplest thing to do here is to just document (or require, by changing the type from Map to ConcurrentMap as I originally suggested) that the underlying map must support concurrent access. If we only expect AbstractYarnScheduler to be used by the Fifo, Fair, and Capacity schedulers then we don't need to bother with the overhead of an accessor method that can be overridden, etc. Technically AbstractYarnScheduler was not marked Public, so we should be able to update it without worrying about third-party use. Agree that we should mark it Private/Unstable going forward regardless of how we eventually fix this.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12703027/0005-YARN-3136.patch
        against trunk revision 95bfd08.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
        See https://builds.apache.org/job/PreCommit-YARN-Build/6882//artifact/patchprocess/diffJavadocWarnings.txt for details.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6882//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6882//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6882//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703027/0005-YARN-3136.patch against trunk revision 95bfd08. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. -1 javadoc . The javadoc tool appears to have generated 1 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/6882//artifact/patchprocess/diffJavadocWarnings.txt for details. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6882//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6882//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6882//console This message is automatically generated.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12703027/0005-YARN-3136.patch
        against trunk revision 95bfd08.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
        See https://builds.apache.org/job/PreCommit-YARN-Build/6880//artifact/patchprocess/diffJavadocWarnings.txt for details.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6880//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6880//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6880//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703027/0005-YARN-3136.patch against trunk revision 95bfd08. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. -1 javadoc . The javadoc tool appears to have generated 1 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/6880//artifact/patchprocess/diffJavadocWarnings.txt for details. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6880//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6880//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6880//console This message is automatically generated.
        Hide
        sunilg Sunil G added a comment -

        Thank you Jason Lowe and Jian He

        Comments are added and also few other occurrences applications.get is replaced with the new api. Fair/Fifo schedulers also now overloaded with the new method.
        However as Jian mentioned, we need not have to protect createReleaseCache as its only used locally in serviceInit. I also feel the way as Jian, to make AbstractYarnScheduler as Private and Unstable. There are no much chances of compatibility issues

        Uploaded a patch addressing these points.

        Show
        sunilg Sunil G added a comment - Thank you Jason Lowe and Jian He Comments are added and also few other occurrences applications.get is replaced with the new api. Fair/Fifo schedulers also now overloaded with the new method. However as Jian mentioned, we need not have to protect createReleaseCache as its only used locally in serviceInit. I also feel the way as Jian, to make AbstractYarnScheduler as Private and Unstable. There are no much chances of compatibility issues Uploaded a patch addressing these points.
        Hide
        jianhe Jian He added a comment -

        Speaking of accessing the applications map without holding a proper lock, doesn't AbstractYarnScheduler.createReleaseCache() do exactly that?

        createReleaseCache is only called In serviceInit, so I think should be fine.

        I have a general question that, is AbstractYarnScheduler supposed to be public for external use ? I think AbstractYarnScheduler is just a common base class so as to avoid code duplication among CS/Fair/Fifo schedulers. we'll most likely add more changes in this class. IMHO, to not complicate things, we can just mark AbstractYarnScheduler as Private/Unstable. this method was added a year earlier and now will be invoked only if work-preserving-am-restart is enabled. Maybe I'm wrong, I doubt we'll see compatibility issues in reality.

        Show
        jianhe Jian He added a comment - Speaking of accessing the applications map without holding a proper lock, doesn't AbstractYarnScheduler.createReleaseCache() do exactly that? createReleaseCache is only called In serviceInit, so I think should be fine. I have a general question that, is AbstractYarnScheduler supposed to be public for external use ? I think AbstractYarnScheduler is just a common base class so as to avoid code duplication among CS/Fair/Fifo schedulers. we'll most likely add more changes in this class. IMHO, to not complicate things, we can just mark AbstractYarnScheduler as Private/Unstable. this method was added a year earlier and now will be invoked only if work-preserving-am-restart is enabled. Maybe I'm wrong, I doubt we'll see compatibility issues in reality.
        Hide
        jlowe Jason Lowe added a comment -

        Thanks for updating the patch, Sunil. Comments:

        I'd like to see a comment on the method stating why it's synchronizing and that it should be overridden for performance if the underlying map supports concurrent access. Also we're inconsistent about using the new method. There are plenty of places, both in AbstractYarnScheduler and otherwise, that still call applications.get rather than the new method to abstract it.

        The FairScheduler and FifoScheduler also use concurrent maps, shouldn't they override the method as well?

        Speaking of accessing the applications map without holding a proper lock, doesn't AbstractYarnScheduler.createReleaseCache() do exactly that? Seems like the underlying map needs to be concurrent for that code to iterate the map without holding a lock, and either it's safe to assume it is concurrent (and thus we don't need all this extra get method overhead and related changes), or createReleaseCache() is broken for third-party schedulers who didn't bother to use a concurrent map.

        Show
        jlowe Jason Lowe added a comment - Thanks for updating the patch, Sunil. Comments: I'd like to see a comment on the method stating why it's synchronizing and that it should be overridden for performance if the underlying map supports concurrent access. Also we're inconsistent about using the new method. There are plenty of places, both in AbstractYarnScheduler and otherwise, that still call applications.get rather than the new method to abstract it. The FairScheduler and FifoScheduler also use concurrent maps, shouldn't they override the method as well? Speaking of accessing the applications map without holding a proper lock, doesn't AbstractYarnScheduler.createReleaseCache() do exactly that? Seems like the underlying map needs to be concurrent for that code to iterate the map without holding a lock, and either it's safe to assume it is concurrent (and thus we don't need all this extra get method overhead and related changes), or createReleaseCache() is broken for third-party schedulers who didn't bother to use a concurrent map.
        Hide
        sunilg Sunil G added a comment -

        Errors seems unrelated.

        Show
        sunilg Sunil G added a comment - Errors seems unrelated.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12702491/0004-YARN-3136.patch
        against trunk revision 3560180.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6839//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6839//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6839//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702491/0004-YARN-3136.patch against trunk revision 3560180. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6839//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6839//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6839//console This message is automatically generated.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12699495/0003-YARN-3136.patch
        against trunk revision 3560180.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6838//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6838//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6838//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699495/0003-YARN-3136.patch against trunk revision 3560180. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6838//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6838//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6838//console This message is automatically generated.
        Hide
        sunilg Sunil G added a comment -

        HI Jason Lowe and Jian He
        Could u please take a look on the patch.

        Show
        sunilg Sunil G added a comment - HI Jason Lowe and Jian He Could u please take a look on the patch.
        Hide
        sunilg Sunil G added a comment -

        Yes Jason Lowe. Its good to keep the backward compatibility.

        can be overridden in derived schedulers

        A new method named getSchedulerApplication can be added in AbstractYarnScheduler and it can come with lock by default to access application object from applications map.
        Later in CS or other scheduler, we can override to remove the lock.
        I attached a patch on this. Please see whether its same as you mentioned.

        Show
        sunilg Sunil G added a comment - Yes Jason Lowe . Its good to keep the backward compatibility. can be overridden in derived schedulers A new method named getSchedulerApplication can be added in AbstractYarnScheduler and it can come with lock by default to access application object from applications map. Later in CS or other scheduler, we can override to remove the lock. I attached a patch on this. Please see whether its same as you mentioned.
        Hide
        jlowe Jason Lowe added a comment -

        Sorry for the delay in replying. It is true that we could break some schedulers by moving the definition from a Map to a ConcurrentMap. Alternatives I see to that approach are:

        1) use a read-write lock around accessing the map, which unfortunately can be wasteful for some map implementations (e.g.: concurrent maps).

        2) use methods to access the map whose default implementation use a lock by default (for backwards compatibility) but can be overridden in derived schedulers for performance when the underlying map does not require a lock

        3) use RTTI when accessing the map to see if it's concurrent and lock when it's not. Not a huge fan of this one.

        Show
        jlowe Jason Lowe added a comment - Sorry for the delay in replying. It is true that we could break some schedulers by moving the definition from a Map to a ConcurrentMap. Alternatives I see to that approach are: 1) use a read-write lock around accessing the map, which unfortunately can be wasteful for some map implementations (e.g.: concurrent maps). 2) use methods to access the map whose default implementation use a lock by default (for backwards compatibility) but can be overridden in derived schedulers for performance when the underlying map does not require a lock 3) use RTTI when accessing the map to see if it's concurrent and lock when it's not. Not a huge fan of this one.
        Hide
        sunilg Sunil G added a comment -

        Hi Jason Lowe and Jian He

        Could you please have a look on the proposed patch.

        Show
        sunilg Sunil G added a comment - Hi Jason Lowe and Jian He Could you please have a look on the proposed patch.
        Hide
        sunilg Sunil G added a comment -

        Hi Jason Lowe Jian He

        applications map is made and ConcurrentMap and can thus enforce concurrency. However as mentioned in previous comments, this can cause issues for existing custom schedulers which doesnt use ConcurrentMap.
        Pls share your comments.

        Show
        sunilg Sunil G added a comment - Hi Jason Lowe Jian He applications map is made and ConcurrentMap and can thus enforce concurrency. However as mentioned in previous comments, this can cause issues for existing custom schedulers which doesnt use ConcurrentMap. Pls share your comments.
        Hide
        sunilg Sunil G added a comment -

        Hi Jason Lowe
        Yes. I also had a concern on same, hence was planning for scheduler level lock to get object from applications map.

        FiFoScheduler is initializing with ConcurrentSkipListMap, CS and Fair Scheduler are using ConcurrentHashMap.
        So if we can have applications as ConcurrentMap, it looks fine. This does enforce other pluggable schedulers to keep applications as a concurrentmap. Also this will break any such schedulers which are present now once this change is made.

        Show
        sunilg Sunil G added a comment - Hi Jason Lowe Yes. I also had a concern on same, hence was planning for scheduler level lock to get object from applications map. FiFoScheduler is initializing with ConcurrentSkipListMap , CS and Fair Scheduler are using ConcurrentHashMap . So if we can have applications as ConcurrentMap , it looks fine. This does enforce other pluggable schedulers to keep applications as a concurrentmap. Also this will break any such schedulers which are present now once this change is made.
        Hide
        jlowe Jason Lowe added a comment -

        I have one main concern with the patch. AbstractYarnScheduler declares the applications map but does not actually instantiates the map. If we remove the lock on getTransferredContainers then we are making the assumption that derived schedulers will use a concurrent map when instantiating the applications map. Therefore I think AbstractYarnScheduler should declare applications as a ConcurrentMap rather than just a plain ol' Map to better enforce that assumption. Otherwise theoretically another derived scheduler could use a map that's not thread-safe and subtly break this.

        Show
        jlowe Jason Lowe added a comment - I have one main concern with the patch. AbstractYarnScheduler declares the applications map but does not actually instantiates the map. If we remove the lock on getTransferredContainers then we are making the assumption that derived schedulers will use a concurrent map when instantiating the applications map. Therefore I think AbstractYarnScheduler should declare applications as a ConcurrentMap rather than just a plain ol' Map to better enforce that assumption. Otherwise theoretically another derived scheduler could use a map that's not thread-safe and subtly break this.
        Hide
        sunilg Sunil G added a comment -

        Attaching a patch as discussed. Kindly check.

        Show
        sunilg Sunil G added a comment - Attaching a patch as discussed. Kindly check.
        Hide
        varun_saxena Varun Saxena added a comment -

        Sunil G, you had left it unassigned and not mentioned that you are looking to work on it. That's why took it up.

        Show
        varun_saxena Varun Saxena added a comment - Sunil G , you had left it unassigned and not mentioned that you are looking to work on it. That's why took it up.
        Hide
        sunilg Sunil G added a comment -

        Hi Rohith Sharma K S

        getLiveContainers() is giving a copy as you mentioned. Earlier I was looking for getter method to fetch FicaSchedulerApp from applications with schedule lock. But that also not needed.

        Map<ApplicationId, SchedulerApplication<T>> applications
        
        this.applications =
                 new ConcurrentHashMap<ApplicationId,
                     SchedulerApplication<FiCaSchedulerApp>>();
        

        applications is a concurrent hash map. Hence synchronized can be removed from getTransferredContainers without any change in the method itself. Also as mentioned by Jian He, a check will be added on registerApplicationMaster().

        Show
        sunilg Sunil G added a comment - Hi Rohith Sharma K S getLiveContainers() is giving a copy as you mentioned. Earlier I was looking for getter method to fetch FicaSchedulerApp from applications with schedule lock. But that also not needed. Map<ApplicationId, SchedulerApplication<T>> applications this .applications = new ConcurrentHashMap<ApplicationId, SchedulerApplication<FiCaSchedulerApp>>(); applications is a concurrent hash map. Hence synchronized can be removed from getTransferredContainers without any change in the method itself. Also as mentioned by Jian He , a check will be added on registerApplicationMaster() .
        Hide
        rohithsharma Rohith Sharma K S added a comment -

        I agree that synchronization is not required for getTransferredContainers methods.

        So a synchronized getter can be invoked from getTransferredContainers to get the application object for list of containers

        I think this is not required since app.getCurrentAppAttempt().getLiveContainers() always returns copy of livecontainers i.e new object of ArrayList.

        Show
        rohithsharma Rohith Sharma K S added a comment - I agree that synchronization is not required for getTransferredContainers methods. So a synchronized getter can be invoked from getTransferredContainers to get the application object for list of containers I think this is not required since app.getCurrentAppAttempt().getLiveContainers() always returns copy of livecontainers i.e new object of ArrayList.
        Hide
        sunilg Sunil G added a comment -

        Hi Jian He

        getTransferredContainers does not need to be invoked if ApplicationSubmissionContext#getKeepContainersAcrossApplicationAttempts is false.

        Yes. It could be skipped.

        synchronization on getTransferredContainers is not needed

        Yes. This is to be removed. But it internally invokes applications map. So a synchronized getter can be invoked from getTransferredContainers to get the application object for list of containers. Pls share your opinion.

        Show
        sunilg Sunil G added a comment - Hi Jian He getTransferredContainers does not need to be invoked if ApplicationSubmissionContext#getKeepContainersAcrossApplicationAttempts is false. Yes. It could be skipped. synchronization on getTransferredContainers is not needed Yes. This is to be removed. But it internally invokes applications map. So a synchronized getter can be invoked from getTransferredContainers to get the application object for list of containers. Pls share your opinion.
        Hide
        jianhe Jian He added a comment -

        ApplicationMasterService#allocate

        sorry, I meant register.

        Show
        jianhe Jian He added a comment - ApplicationMasterService#allocate sorry, I meant register.
        Hide
        jianhe Jian He added a comment -

        I think the synchronization on getTransferredContainers is not needed.
        Also, in ApplicationMasterService#allocate, getTransferredContainers does not need to be invoked if ApplicationSubmissionContext#getKeepContainersAcrossApplicationAttempts is false.

        Show
        jianhe Jian He added a comment - I think the synchronization on getTransferredContainers is not needed. Also, in ApplicationMasterService#allocate, getTransferredContainers does not need to be invoked if ApplicationSubmissionContext#getKeepContainersAcrossApplicationAttempts is false.
        Hide
        sunilg Sunil G added a comment -

        1. As you mentioned a full lock on Scheduler is grabbed for getting liveContainers (except am container) from FicaSchedulerApp. This comes from below map.

        protected Map<ApplicationId, SchedulerApplication<T>> applications;
        

        And I could see that, most of the methods are grabbing a Scheduler lock while accessing this map.

        2. rmContext.getRMApps(). Yes, this is a concurrent map and i feel this need not to be surrounded by a lock in scheduler level.

        I feel, we could improve getTransferredContainers like below:
        A getter can be used under scheduler lock to get the FicaSchedulerApp object within getTransferredContainers . Also we can remove the scheduler lock from getTransferredContainers. This must be safe enough.

        Varun Saxena As i have done some ground work, is it ok if i assign this ticket to myself .
        Jason Lowe pls share your thoughts on this approach. Also I would like to take this jira, pls feel free to takeover otherwise .

        Show
        sunilg Sunil G added a comment - 1. As you mentioned a full lock on Scheduler is grabbed for getting liveContainers (except am container) from FicaSchedulerApp. This comes from below map. protected Map<ApplicationId, SchedulerApplication<T>> applications; And I could see that, most of the methods are grabbing a Scheduler lock while accessing this map. 2. rmContext.getRMApps() . Yes, this is a concurrent map and i feel this need not to be surrounded by a lock in scheduler level. I feel, we could improve getTransferredContainers like below: A getter can be used under scheduler lock to get the FicaSchedulerApp object within getTransferredContainers . Also we can remove the scheduler lock from getTransferredContainers. This must be safe enough. Varun Saxena As i have done some ground work, is it ok if i assign this ticket to myself . Jason Lowe pls share your thoughts on this approach. Also I would like to take this jira, pls feel free to takeover otherwise .
        Hide
        jlowe Jason Lowe added a comment -

        It appears getTransferredContainers is grabbing the lock because it's not sure it's safe to lookup the SchedulerApplication from the applications map, yet in practice it's always a ConcurrentHashMap. Similarly the lookup of the RMApp is also from a concurrent hash map and does not require a lock. After that we're simply walking the containers of the SchedulerApplication which at best should only be locking the app and not the entire scheduler. Or am I missing a critical point where we really need the scheduler lock?

        Show
        jlowe Jason Lowe added a comment - It appears getTransferredContainers is grabbing the lock because it's not sure it's safe to lookup the SchedulerApplication from the applications map, yet in practice it's always a ConcurrentHashMap. Similarly the lookup of the RMApp is also from a concurrent hash map and does not require a lock. After that we're simply walking the containers of the SchedulerApplication which at best should only be locking the app and not the entire scheduler. Or am I missing a critical point where we really need the scheduler lock?
        Hide
        sunilg Sunil G added a comment -

        Yes Jason,

        Current lock is very big for allocate and registerApplicationMaster calls, and its good to have much smaller locks in that area.

        getTransferredContainers is already synchronized under AbstractYarnScheduler. To fill RegisterApplicationMasterResponse, still some methods from AbstractYarnScheduler are not locked in scheduler level. If that can be done, then it will be safer atleast to take RegisterApplicationMasterResponse filling part of code.

        Show
        sunilg Sunil G added a comment - Yes Jason, Current lock is very big for allocate and registerApplicationMaster calls, and its good to have much smaller locks in that area. getTransferredContainers is already synchronized under AbstractYarnScheduler. To fill RegisterApplicationMasterResponse, still some methods from AbstractYarnScheduler are not locked in scheduler level. If that can be done, then it will be safer atleast to take RegisterApplicationMasterResponse filling part of code.
        Hide
        jlowe Jason Lowe added a comment -

        Sample stacktrace:

           java.lang.Thread.State: BLOCKED (on object monitor)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getTransferredContainers(AbstractYarnScheduler.java:86)
                - waiting to lock <0x000000023ae075e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler)
                at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:297)
                - locked <0x00000004cda13f98> (a org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
                at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90)
                at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
                at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
                at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
        

        From a cursory glance, it looks like the applications map is always a ConcurrentHashMap in practice, and I think we might be able to find a way to either remove the lock entirely or at least lock something at a more granular level than the big scheduler lock.

        Show
        jlowe Jason Lowe added a comment - Sample stacktrace: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getTransferredContainers(AbstractYarnScheduler.java:86) - waiting to lock <0x000000023ae075e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:297) - locked <0x00000004cda13f98> (a org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90) at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047) From a cursory glance, it looks like the applications map is always a ConcurrentHashMap in practice, and I think we might be able to find a way to either remove the lock entirely or at least lock something at a more granular level than the big scheduler lock.

          People

          • Assignee:
            sunilg Sunil G
            Reporter:
            jlowe Jason Lowe
          • Votes:
            0 Vote for this issue
            Watchers:
            15 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development