Hadoop YARN
  1. Hadoop YARN
  2. YARN-211

Allow definition of max-active-applications per queue

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: capacityscheduler
    • Labels:
      None

      Description

      In some cases, automatic max-active is not enough, especially if you need less active tasks in given queue

      1. capacity-maxactive.txt
        7 kB
        Radim Kolar
      2. max-running.txt
        30 kB
        Radim Kolar

        Activity

        Hide
        Thomas Graves added a comment -

        We shouldn't be changing the name to Running apps. Especially in like the web services. This breaks backwards compatibility.

        Can you clarify your comment "because maximum active applications is used in documentation for number of running + queued applications."? Yes there can time between when an application is made active vs when the AM gets started - waiting for a container. Is that what you are referring to? I don't think with your patch this has changed, correct?

        I believe the main reason it was a percent is because AM's could potentially use different amount of memory per AM (YARN-276 related). So I could have 2 AM's that if using the minimum allocation size would run fine but then if I have 2, one that say needs 1G and the other uses 20G, then your queue is full and everything gets deadlocked. Hence why in that case it would be better to use a % of resources.

        I can understand your scenario though. I would say if it was just one you could set the percent really low like (0.0001) because it always give you atleast one, but if you have cases where you want 2 or 3 that doesn't work so well.

        You also need to update the capacity scheduler docs about the config.

        Show
        Thomas Graves added a comment - We shouldn't be changing the name to Running apps. Especially in like the web services. This breaks backwards compatibility. Can you clarify your comment "because maximum active applications is used in documentation for number of running + queued applications."? Yes there can time between when an application is made active vs when the AM gets started - waiting for a container. Is that what you are referring to? I don't think with your patch this has changed, correct? I believe the main reason it was a percent is because AM's could potentially use different amount of memory per AM ( YARN-276 related). So I could have 2 AM's that if using the minimum allocation size would run fine but then if I have 2, one that say needs 1G and the other uses 20G, then your queue is full and everything gets deadlocked. Hence why in that case it would be better to use a % of resources. I can understand your scenario though. I would say if it was just one you could set the percent really low like (0.0001) because it always give you atleast one, but if you have cases where you want 2 or 3 that doesn't work so well. You also need to update the capacity scheduler docs about the config.
        Hide
        Radim Kolar added a comment -

        need review.

        Show
        Radim Kolar added a comment - need review.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12554083/max-running.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/174//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/174//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12554083/max-running.txt against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/174//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/174//console This message is automatically generated.
        Hide
        Radim Kolar added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12554679/max-running.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1801//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1801//console

        This message is automatically generated.

        Show
        Radim Kolar added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12554679/max-running.txt against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1801//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1801//console This message is automatically generated.
        Hide
        Radim Kolar added a comment -

        Max active applications renamed to max running applications. This makes its meaning more clear, because maximum active applications is used in documentation for number of running + queued applications.

        Show
        Radim Kolar added a comment - Max active applications renamed to max running applications. This makes its meaning more clear, because maximum active applications is used in documentation for number of running + queued applications.
        Hide
        Radim Kolar added a comment -

        what about to rename it to max-running-applications? It will be more clear about what it does.

        Show
        Radim Kolar added a comment - what about to rename it to max-running-applications? It will be more clear about what it does.
        Hide
        Radim Kolar added a comment -

        with this patch, you can still use percent to configure it. But it can be overridden by number.

        Show
        Radim Kolar added a comment - with this patch, you can still use percent to configure it. But it can be overridden by number.
        Hide
        Radim Kolar added a comment -

        I know that it is currently controlled by percent and it can be configured per queue. This is unsuitable if you need to limit number of running applications to certain low number to work around primitive cluster scheduling in hadoop.

        For example if you have nodes with 2 GPU units, then create queue with maximum-running-tasks 1 or 2 and submit applications to that queue, there is better chance that they will not run concurrently on one node.

        Using percentage for configuration is tricky, its difficult to predict target effect and it changes with number of nodes in cluster.

        Show
        Radim Kolar added a comment - I know that it is currently controlled by percent and it can be configured per queue. This is unsuitable if you need to limit number of running applications to certain low number to work around primitive cluster scheduling in hadoop. For example if you have nodes with 2 GPU units, then create queue with maximum-running-tasks 1 or 2 and submit applications to that queue, there is better chance that they will not run concurrently on one node. Using percentage for configuration is tricky, its difficult to predict target effect and it changes with number of nodes in cluster.
        Hide
        Thomas Graves added a comment -

        The number of active applications is controlled by maximum-am-resource-percent. (Maximum percent of resources in the cluster which can be used to run application masters) Which is settable per queue also. It is a percent and not a direct #.

        Show
        Thomas Graves added a comment - The number of active applications is controlled by maximum-am-resource-percent. (Maximum percent of resources in the cluster which can be used to run application masters) Which is settable per queue also. It is a percent and not a direct #.
        Hide
        Radim Kolar added a comment -

        From linked doc: Maximum number of applications in the system which can be concurrently active both running and pending.

        It seems that pending applications is considered by documentation (but not code) as active too. Rename this patch to limit number of running applications, its more clear.

        Show
        Radim Kolar added a comment - From linked doc: Maximum number of applications in the system which can be concurrently active both running and pending. It seems that pending applications is considered by documentation (but not code) as active too. Rename this patch to limit number of running applications, its more clear.
        Hide
        Radim Kolar added a comment -

        maximum-applications is number of submitted applications to queue. This patch limits number of concurrently running applications.

        Show
        Radim Kolar added a comment - maximum-applications is number of submitted applications to queue. This patch limits number of concurrently running applications.
        Hide
        Thomas Graves added a comment -
        Show
        Thomas Graves added a comment - dup of MAPREDUCE-3893
        Hide
        Thomas Graves added a comment -

        This is already supported. see http://hadoop.apache.org/docs/r0.23.3/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
        you can set both maximum-applications and maximum-am-resource-percent either globally or per queue.

        Show
        Thomas Graves added a comment - This is already supported. see http://hadoop.apache.org/docs/r0.23.3/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html you can set both maximum-applications and maximum-am-resource-percent either globally or per queue.

          People

          • Assignee:
            Radim Kolar
            Reporter:
            Radim Kolar
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development