Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha3
    • Fix Version/s: 3.0.0-beta1
    • Component/s: None
    • Labels:
      None
    1. YARN-6706.01.patch
      14 kB
      Haibo Chen
    2. YARN-6706-YARN-1011.00.patch
      13 kB
      Haibo Chen
    3. YARN-6706-YARN-1011.01.patch
      14 kB
      Haibo Chen

      Activity

      Hide
      haibochen Haibo Chen added a comment -

      This patch does the following:
      1) Changes the ContainerScheduler.scheduleContainer() logic
      if the container is guaranteed, we first put it into the guaranteed container queue, then do a pass-through guaranteed queue followed by opportunistic queue to start as many containers as possible. Finally, if the container remains in the guaranteed queue, we know we need to kill some opportunistic container

      if the container is opportunistic, we first do a pass through pass-through guaranteed queue followed by opportunistic queue to start as many containers as possible. Then we try to enqueue the opportunistic container, which may fail if the maxOppQueue length is reached. Lastly, we may be able to start it if there happen to be some resources remaining. This way, we slightly change the semantic of maxOppQueueLength in cases where oversubscription is enabled. We enqueue an opportunistic container when neither there is unallocated resource nor can it be started due to oversubscription.

      2) When maxOppQUeueLength is set to 0 or a negative value, containers are no longer unconditionally started.

      3) Adds a new unit test in TestContainerSchedulerQueuing to test the case where the OppQueue length limit is reached with too many opportunistic container requests.

      4) Add a new method resourceAvailableToStartContainer() which we could extend to include oversubscription logic.

      Show
      haibochen Haibo Chen added a comment - This patch does the following: 1) Changes the ContainerScheduler.scheduleContainer() logic if the container is guaranteed, we first put it into the guaranteed container queue, then do a pass-through guaranteed queue followed by opportunistic queue to start as many containers as possible. Finally, if the container remains in the guaranteed queue, we know we need to kill some opportunistic container if the container is opportunistic, we first do a pass through pass-through guaranteed queue followed by opportunistic queue to start as many containers as possible. Then we try to enqueue the opportunistic container, which may fail if the maxOppQueue length is reached. Lastly, we may be able to start it if there happen to be some resources remaining. This way, we slightly change the semantic of maxOppQueueLength in cases where oversubscription is enabled. We enqueue an opportunistic container when neither there is unallocated resource nor can it be started due to oversubscription. 2) When maxOppQUeueLength is set to 0 or a negative value, containers are no longer unconditionally started. 3) Adds a new unit test in TestContainerSchedulerQueuing to test the case where the OppQueue length limit is reached with too many opportunistic container requests. 4) Add a new method resourceAvailableToStartContainer() which we could extend to include oversubscription logic.
      Hide
      haibochen Haibo Chen added a comment -

      Arun Suresh Konstantinos Karanasos please kindly review

      Show
      haibochen Haibo Chen added a comment - Arun Suresh Konstantinos Karanasos please kindly review
      Hide
      hadoopqa Hadoop QA added a comment -
      -1 overall



      Vote Subsystem Runtime Comment
      0 reexec 0m 17s Docker mode activated.
      +1 @author 0m 0s The patch does not contain any @author tags.
      +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
      +1 mvninstall 13m 32s YARN-1011 passed
      +1 compile 0m 36s YARN-1011 passed
      +1 checkstyle 0m 18s YARN-1011 passed
      +1 mvnsite 0m 28s YARN-1011 passed
      -1 findbugs 0m 43s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in YARN-1011 has 5 extant Findbugs warnings.
      +1 javadoc 0m 18s YARN-1011 passed
      +1 mvninstall 0m 25s the patch passed
      +1 compile 0m 26s the patch passed
      +1 javac 0m 26s the patch passed
      -0 checkstyle 0m 14s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1)
      +1 mvnsite 0m 25s the patch passed
      +1 whitespace 0m 0s The patch has no whitespace issues.
      +1 findbugs 0m 49s the patch passed
      +1 javadoc 0m 16s the patch passed
      -1 unit 12m 59s hadoop-yarn-server-nodemanager in the patch failed.
      +1 asflicense 0m 17s The patch does not generate ASF License warnings.
      33m 21s



      Reason Tests
      Failed junit tests hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery



      Subsystem Report/Notes
      Docker Image:yetus/hadoop:14b5c93
      JIRA Issue YARN-6706
      JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872378/YARN-6706-YARN-1011.00.patch
      Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
      uname Linux 54d856200e0f 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
      Build tool maven
      Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
      git revision YARN-1011 / 153498b
      Default Java 1.8.0_131
      findbugs v3.1.0-RC1
      findbugs https://builds.apache.org/job/PreCommit-YARN-Build/16175/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html
      checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/16175/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
      unit https://builds.apache.org/job/PreCommit-YARN-Build/16175/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
      Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16175/testReport/
      modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
      Console output https://builds.apache.org/job/PreCommit-YARN-Build/16175/console
      Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

      This message was automatically generated.

      Show
      hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 17s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 13m 32s YARN-1011 passed +1 compile 0m 36s YARN-1011 passed +1 checkstyle 0m 18s YARN-1011 passed +1 mvnsite 0m 28s YARN-1011 passed -1 findbugs 0m 43s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in YARN-1011 has 5 extant Findbugs warnings. +1 javadoc 0m 18s YARN-1011 passed +1 mvninstall 0m 25s the patch passed +1 compile 0m 26s the patch passed +1 javac 0m 26s the patch passed -0 checkstyle 0m 14s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1) +1 mvnsite 0m 25s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 49s the patch passed +1 javadoc 0m 16s the patch passed -1 unit 12m 59s hadoop-yarn-server-nodemanager in the patch failed. +1 asflicense 0m 17s The patch does not generate ASF License warnings. 33m 21s Reason Tests Failed junit tests hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue YARN-6706 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872378/YARN-6706-YARN-1011.00.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 54d856200e0f 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision YARN-1011 / 153498b Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-YARN-Build/16175/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/16175/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt unit https://builds.apache.org/job/PreCommit-YARN-Build/16175/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16175/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Console output https://builds.apache.org/job/PreCommit-YARN-Build/16175/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
      Hide
      haibochen Haibo Chen added a comment -

      Patch updated to address checkstyle and unit test issues. The findbugs warnings are unrelated.

      Show
      haibochen Haibo Chen added a comment - Patch updated to address checkstyle and unit test issues. The findbugs warnings are unrelated.
      Hide
      leftnoteasy Wangda Tan added a comment -

      Haibo Chen, Arun Suresh, Konstantinos Karanasos, is there any design doc to understand what is the overall scope of the oversubscription feature? Is the design doc attached to YARN-1011 is up-to-dated?

      Thanks,

      Show
      leftnoteasy Wangda Tan added a comment - Haibo Chen , Arun Suresh , Konstantinos Karanasos , is there any design doc to understand what is the overall scope of the oversubscription feature? Is the design doc attached to YARN-1011 is up-to-dated? Thanks,
      Hide
      haibochen Haibo Chen added a comment -

      Wangda Tan I have shared the latest design doc with you.

      Show
      haibochen Haibo Chen added a comment - Wangda Tan I have shared the latest design doc with you.
      Hide
      hadoopqa Hadoop QA added a comment -
      -1 overall



      Vote Subsystem Runtime Comment
      0 reexec 0m 17s Docker mode activated.
      +1 @author 0m 0s The patch does not contain any @author tags.
      +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
      +1 mvninstall 18m 18s YARN-1011 passed
      +1 compile 0m 32s YARN-1011 passed
      +1 checkstyle 0m 20s YARN-1011 passed
      +1 mvnsite 0m 34s YARN-1011 passed
      -1 findbugs 0m 51s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in YARN-1011 has 5 extant Findbugs warnings.
      +1 javadoc 0m 19s YARN-1011 passed
      +1 mvninstall 0m 32s the patch passed
      +1 compile 0m 31s the patch passed
      +1 javac 0m 31s the patch passed
      +1 checkstyle 0m 16s the patch passed
      +1 mvnsite 0m 31s the patch passed
      +1 whitespace 0m 0s The patch has no whitespace issues.
      +1 findbugs 0m 55s the patch passed
      +1 javadoc 0m 17s the patch passed
      +1 unit 13m 28s hadoop-yarn-server-nodemanager in the patch passed.
      +1 asflicense 0m 22s The patch does not generate ASF License warnings.
      39m 17s



      Subsystem Report/Notes
      Docker Image:yetus/hadoop:14b5c93
      JIRA Issue YARN-6706
      JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12873526/YARN-6706-YARN-1011.01.patch
      Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
      uname Linux 783c0dcc012b 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
      Build tool maven
      Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
      git revision YARN-1011 / 153498b
      Default Java 1.8.0_131
      findbugs v3.1.0-RC1
      findbugs https://builds.apache.org/job/PreCommit-YARN-Build/16201/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html
      Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16201/testReport/
      modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
      Console output https://builds.apache.org/job/PreCommit-YARN-Build/16201/console
      Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

      This message was automatically generated.

      Show
      hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 17s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 18m 18s YARN-1011 passed +1 compile 0m 32s YARN-1011 passed +1 checkstyle 0m 20s YARN-1011 passed +1 mvnsite 0m 34s YARN-1011 passed -1 findbugs 0m 51s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in YARN-1011 has 5 extant Findbugs warnings. +1 javadoc 0m 19s YARN-1011 passed +1 mvninstall 0m 32s the patch passed +1 compile 0m 31s the patch passed +1 javac 0m 31s the patch passed +1 checkstyle 0m 16s the patch passed +1 mvnsite 0m 31s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 55s the patch passed +1 javadoc 0m 17s the patch passed +1 unit 13m 28s hadoop-yarn-server-nodemanager in the patch passed. +1 asflicense 0m 22s The patch does not generate ASF License warnings. 39m 17s Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue YARN-6706 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12873526/YARN-6706-YARN-1011.01.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 783c0dcc012b 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision YARN-1011 / 153498b Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-YARN-Build/16201/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16201/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Console output https://builds.apache.org/job/PreCommit-YARN-Build/16201/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
      Hide
      kasha Karthik Kambatla added a comment -

      I feel we should get this into trunk directly instead of YARN-1011 branch.

      The current patch looks okay, Can we add a few more things:

      1. Make ResourceUtilizationTracker pluggable. That way, we could use a different tracker when oversubscription is enabled.
      2. ContainerScheduler
        1. Why do we need maxOppQueueLength given queuingLimit?
        2. Is there value in splitting runningContainers into runningGuaranteed and runningOpportunistic?
        3. getOpportunisticContainersStatus method implementation feels awkward. How about capturing the state in the field here, and have metrics etc. pull from here?
        4. startContainersFromQueue: Local variable resourcesAvailable is unnecessary
      3. OpportunisticContainersStatus
        1. Let us clearly differentiate between allocated, used and utilized. Maybe, we should rename current Used methods to Allocated?
        2. I prefer either full name Opportunistic (in method) or Opp (shortest name that makes sense). Opport is neither short nor fully descriptive.
        3. Have we considered folding ContainerQueuingLimit class into this?

      /cc Arun Suresh and Konstantinos Karanasos

      Show
      kasha Karthik Kambatla added a comment - I feel we should get this into trunk directly instead of YARN-1011 branch. The current patch looks okay, Can we add a few more things: Make ResourceUtilizationTracker pluggable. That way, we could use a different tracker when oversubscription is enabled. ContainerScheduler Why do we need maxOppQueueLength given queuingLimit? Is there value in splitting runningContainers into runningGuaranteed and runningOpportunistic? getOpportunisticContainersStatus method implementation feels awkward. How about capturing the state in the field here, and have metrics etc. pull from here? startContainersFromQueue: Local variable resourcesAvailable is unnecessary OpportunisticContainersStatus Let us clearly differentiate between allocated, used and utilized. Maybe, we should rename current Used methods to Allocated ? I prefer either full name Opportunistic (in method) or Opp (shortest name that makes sense). Opport is neither short nor fully descriptive. Have we considered folding ContainerQueuingLimit class into this? /cc Arun Suresh and Konstantinos Karanasos
      Hide
      haibochen Haibo Chen added a comment -

      Upload a patch against trunk, which is the same as YARN-6706-YARN-1011.01.patch since it applies cleanly.

      Show
      haibochen Haibo Chen added a comment - Upload a patch against trunk, which is the same as YARN-6706 - YARN-1011 .01.patch since it applies cleanly.
      Hide
      haibochen Haibo Chen added a comment -

      Would appreciate reviews from Arun Suresh and Konstantinos Karanasos to see if the changes look OK to you folks.

      Show
      haibochen Haibo Chen added a comment - Would appreciate reviews from Arun Suresh and Konstantinos Karanasos to see if the changes look OK to you folks.
      Hide
      hadoopqa Hadoop QA added a comment -
      -1 overall



      Vote Subsystem Runtime Comment
      0 reexec 0m 27s Docker mode activated.
            Prechecks
      +1 @author 0m 0s The patch does not contain any @author tags.
      +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
            trunk Compile Tests
      +1 mvninstall 15m 27s trunk passed
      +1 compile 0m 30s trunk passed
      +1 checkstyle 0m 18s trunk passed
      +1 mvnsite 0m 28s trunk passed
      -1 findbugs 0m 46s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 5 extant Findbugs warnings.
      +1 javadoc 0m 19s trunk passed
            Patch Compile Tests
      +1 mvninstall 0m 25s the patch passed
      +1 compile 0m 27s the patch passed
      +1 javac 0m 27s the patch passed
      +1 checkstyle 0m 15s the patch passed
      +1 mvnsite 0m 25s the patch passed
      +1 whitespace 0m 0s The patch has no whitespace issues.
      +1 findbugs 0m 50s the patch passed
      +1 javadoc 0m 16s the patch passed
            Other Tests
      +1 unit 15m 8s hadoop-yarn-server-nodemanager in the patch passed.
      +1 asflicense 0m 22s The patch does not generate ASF License warnings.
      37m 42s



      Subsystem Report/Notes
      Docker Image:yetus/hadoop:14b5c93
      JIRA Issue YARN-6706
      JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12876466/YARN-6706.01.patch
      Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
      uname Linux dc1855100dc5 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
      Build tool maven
      Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
      git revision trunk / 09653ea
      Default Java 1.8.0_131
      findbugs v3.1.0-RC1
      findbugs https://builds.apache.org/job/PreCommit-YARN-Build/16352/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html
      Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16352/testReport/
      modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
      Console output https://builds.apache.org/job/PreCommit-YARN-Build/16352/console
      Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

      This message was automatically generated.

      Show
      hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 27s Docker mode activated.       Prechecks +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.       trunk Compile Tests +1 mvninstall 15m 27s trunk passed +1 compile 0m 30s trunk passed +1 checkstyle 0m 18s trunk passed +1 mvnsite 0m 28s trunk passed -1 findbugs 0m 46s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 5 extant Findbugs warnings. +1 javadoc 0m 19s trunk passed       Patch Compile Tests +1 mvninstall 0m 25s the patch passed +1 compile 0m 27s the patch passed +1 javac 0m 27s the patch passed +1 checkstyle 0m 15s the patch passed +1 mvnsite 0m 25s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 50s the patch passed +1 javadoc 0m 16s the patch passed       Other Tests +1 unit 15m 8s hadoop-yarn-server-nodemanager in the patch passed. +1 asflicense 0m 22s The patch does not generate ASF License warnings. 37m 42s Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue YARN-6706 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12876466/YARN-6706.01.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux dc1855100dc5 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 09653ea Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-YARN-Build/16352/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16352/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Console output https://builds.apache.org/job/PreCommit-YARN-Build/16352/console Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
      Hide
      asuresh Arun Suresh added a comment -

      Yup. Agree this can go into trunk.

      The patch generally looks good. One comment:
      In the ContainerScheduler#scheduleContainer(), I see how startPendingContainers() will ensure that we do not have to wait for a completed container to start containers, but it maybe be better to just move all the startPendingContainers() invocations from inside the if..then..else to the last line of the method. Not sure if startPendingContainers() before and after en-queuing is usefull.

      Show
      asuresh Arun Suresh added a comment - Yup. Agree this can go into trunk. The patch generally looks good. One comment: In the ContainerScheduler#scheduleContainer() , I see how startPendingContainers() will ensure that we do not have to wait for a completed container to start containers, but it maybe be better to just move all the startPendingContainers() invocations from inside the if..then..else to the last line of the method. Not sure if startPendingContainers() before and after en-queuing is usefull.
      Hide
      haibochen Haibo Chen added a comment -

      Not sure if startPendingContainers() before and after en-queuing is usefull

      You are referring to the else statement in ContainerScheduler#scheduleContainer() right? The reason of doing startPendingContainers() both before and after en-queuing is so that we always respect the max queue limit for OPPORTUNISTIC containers.
      If we were to always do enqueueing first and then startPendingContainers(), we could end up going over the OPPR container queue length.

      For GUARANTEED containers, killOpportunisticContainers is needed if the GUARANTEED container stays in the queue after startPendingContainers(). If I misunderstood your comment, can you elaborate a little more.

      Show
      haibochen Haibo Chen added a comment - Not sure if startPendingContainers() before and after en-queuing is usefull You are referring to the else statement in ContainerScheduler#scheduleContainer() right? The reason of doing startPendingContainers() both before and after en-queuing is so that we always respect the max queue limit for OPPORTUNISTIC containers. If we were to always do enqueueing first and then startPendingContainers(), we could end up going over the OPPR container queue length. For GUARANTEED containers, killOpportunisticContainers is needed if the GUARANTEED container stays in the queue after startPendingContainers(). If I misunderstood your comment, can you elaborate a little more.
      Hide
      asuresh Arun Suresh added a comment -

      You are referring to the else statement in ContainerScheduler#scheduleContainer() right? The reason of doing startPendingContainers() both before and after en-queuing is so that we always respect the max queue limit for OPPORTUNISTIC containers.

      Yup.. makes sense..
      But was thinking about Karthik's comment and maybe we can get rid of maxOppQueueLength. I prefer just using the queuingLimit.. It also reduces configuration hassle, since queuingLimit is calculated centrally by the RM. The only reason we might need to have a locally configured maxOppQueueLength is in the case of heterogeneous clusters where max number of queued containers should differ from NM to NM - but to be honest, I am not sure it would be useful.

      Show
      asuresh Arun Suresh added a comment - You are referring to the else statement in ContainerScheduler#scheduleContainer() right? The reason of doing startPendingContainers() both before and after en-queuing is so that we always respect the max queue limit for OPPORTUNISTIC containers. Yup.. makes sense.. But was thinking about Karthik's comment and maybe we can get rid of maxOppQueueLength . I prefer just using the queuingLimit .. It also reduces configuration hassle, since queuingLimit is calculated centrally by the RM. The only reason we might need to have a locally configured maxOppQueueLength is in the case of heterogeneous clusters where max number of queued containers should differ from NM to NM - but to be honest, I am not sure it would be useful.
      Hide
      haibochen Haibo Chen added a comment -

      Thanks for your reviews! I'll update the patch to get rid of maxOppQueueLength.

      Show
      haibochen Haibo Chen added a comment - Thanks for your reviews! I'll update the patch to get rid of maxOppQueueLength.
      Hide
      haibochen Haibo Chen added a comment -

      hmm... Looks like the maxOppQueueLength is also used in MiniYarnCluster, which is used in many tests. Probably need more time to check whether there is some impact. Do you mind if I fix it in a follow up jira? Once this is committed, we can unblock YARN-6675 and others. In the meantime, I can upload another patch to fix maxOppQueueLength.

      Show
      haibochen Haibo Chen added a comment - hmm... Looks like the maxOppQueueLength is also used in MiniYarnCluster, which is used in many tests. Probably need more time to check whether there is some impact. Do you mind if I fix it in a follow up jira? Once this is committed, we can unblock YARN-6675 and others. In the meantime, I can upload another patch to fix maxOppQueueLength.
      Hide
      asuresh Arun Suresh added a comment -

      Agreed... lets discuss this in a follow up JIRA.

      Show
      asuresh Arun Suresh added a comment - Agreed... lets discuss this in a follow up JIRA.
      Hide
      haibochen Haibo Chen added a comment -

      Filed YARN-6812 to fix remove maxOppQueueLength. Any more comment?

      Show
      haibochen Haibo Chen added a comment - Filed YARN-6812 to fix remove maxOppQueueLength. Any more comment?
      Hide
      asuresh Arun Suresh added a comment -

      Thanks Haibo..
      +1 for the latest patch.. will check it in shortly

      Show
      asuresh Arun Suresh added a comment - Thanks Haibo.. +1 for the latest patch.. will check it in shortly
      Hide
      kkaranasos Konstantinos Karanasos added a comment -

      Hi guys, I am back. If it is possible, please wait one more day so that I
      can give it a look as well. Thanks!


      Konstantinos

      Show
      kkaranasos Konstantinos Karanasos added a comment - Hi guys, I am back. If it is possible, please wait one more day so that I can give it a look as well. Thanks! – Konstantinos
      Hide
      haibochen Haibo Chen added a comment -

      Thanks Konstantinos Karanasos for your coming review! FYI, I have created YARN-6831 to address Karthik's comments since it is not totally necessary to have them in this jira.

      Show
      haibochen Haibo Chen added a comment - Thanks Konstantinos Karanasos for your coming review! FYI, I have created YARN-6831 to address Karthik's comments since it is not totally necessary to have them in this jira.
      Hide
      asuresh Arun Suresh added a comment -

      Konstantinos Karanasos.. Apologies, but think you send the message just as I was committing !!
      As Haibo Chen mentioned, maybe you can post comments to YARN-6831 and we can incorporate the changes there..

      Show
      asuresh Arun Suresh added a comment - Konstantinos Karanasos .. Apologies, but think you send the message just as I was committing !! As Haibo Chen mentioned, maybe you can post comments to YARN-6831 and we can incorporate the changes there..
      Hide
      haibochen Haibo Chen added a comment -

      Thanks Arun Suresh for the reviews and commit.

      Show
      haibochen Haibo Chen added a comment - Thanks Arun Suresh for the reviews and commit.
      Hide
      hudson Hudson added a comment -

      SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12022 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12022/)
      YARN-6706. Refactor ContainerScheduler to make oversubscription change (arun suresh: rev 5b007921cdf01ecc8ed97c164b7d327b8304c529)

      • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/ContainerScheduler.java
      • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
      • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/TestContainerSchedulerQueuing.java
      Show
      hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12022 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12022/ ) YARN-6706 . Refactor ContainerScheduler to make oversubscription change (arun suresh: rev 5b007921cdf01ecc8ed97c164b7d327b8304c529) (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/ContainerScheduler.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/TestContainerSchedulerQueuing.java

        People

        • Assignee:
          haibochen Haibo Chen
          Reporter:
          haibochen Haibo Chen
        • Votes:
          0 Vote for this issue
          Watchers:
          8 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development