Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0-beta
    • Fix Version/s: None
    • Component/s: api
    • Labels:
      None

      Description

      Based on discussion in YARN-1024, we will add YARN compute units as a resource for requesting and scheduling CPU processing power.

      1. YARN-1089.patch
        252 kB
        Sandy Ryza
      2. YARN-1089-1.patch
        253 kB
        Sandy Ryza

        Activity

        Arun C Murthy added a comment -

        +1 for this enhancement.

        Vinod Kumar Vavilapalli added a comment -

        I haven't been following YARN-1024; can you please summarize the proposal here for discussion? Thanks.

        Hitesh Shah added a comment -

        +1 to Vinod Kumar Vavilapalli's request. Sandy Ryza, could you ensure that the proposal also clearly explains how an application developer is meant to use the compute units and/or virtual cores when defining an allocation request, and how the allocation (based on these two parameters) will be enforced on a container?

        Sandy Ryza added a comment -

        Yeah, I'll write up a document and post it on YARN-1024. I'm hoping to keep the broader discussion there so we can use this (and perhaps additional JIRAs) for the actual implementation.

        Sandy Ryza added a comment -

        This turned out to be an enormous change because it basically required changing every place that a Resource is instantiated with vcores. I haven't figured out a good way to break it up.

        Some interesting parts to look at are Resource, which has the documentation and most of the user-facing changes, and FSSchedulerNode, FiCaSchedulerNode, LeafQueue, and AppSchedulable, which deal with truncating YARN compute units (YCUs) on assignment to nodes.

        The patch doesn't expose YCUs to MR users, but it still required some changes on the MR side to avoid breaking compilation.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12603755/YARN-1089.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 51 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site:

        org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

        The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site:

        org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1954//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1954//console

        This message is automatically generated.

        Sandy Ryza added a comment -

        Updated patch should fix TestFairScheduler and TestSchedulerUtils. The TestRMContainerAllocator failure looks like MAPREDUCE-5514.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12603998/YARN-1089-1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 51 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site:

        org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1970//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1970//console

        This message is automatically generated.

        Bikas Saha added a comment -

        I am afraid this is getting confusing.

        Arun C Murthy added a comment -

        I don't think we should put this in branch-2.1 or target this for hadoop-2.2.

        This is a major new feature which can be implemented in a compatible manner - let's target this for 2.3.0.

        Sandy Ryza added a comment -

        I'm OK with waiting until 2.3. In case it's not clear, the consequence is that until then it will be impossible to place more tasks on a node than its number of virtual cores, which is essentially its number of physical cores.

        I think we should make YARN-976, documenting the meaning of vcores, a blocker for 2.2.

        Bikas Saha added a comment -

        At this point, I am not seeing the benefit of creating yet another CPU-related configuration. While I am not against useful configurations, it's already hard to configure YARN. Like Vinod and others said, can a summary of the discussions held elsewhere be placed here?

        Sandy Ryza added a comment -

        As was requested, I posted a summary of the proposal on YARN-1024.

        In case it's not clear from the summary, here's the problem we're trying to solve:
        We want jobs to be portable between clusters. CPU is not a fluid resource in the way memory is. The number of cores on a machine is just as important as its total processing power when scheduling tasks.

        Imagine a cluster where every node has powerful CPUs with many cores. One type of task that will run on the cluster saturates a full CPU, but another type of task contains two threads, each of which can saturate only half a full CPU. If we have a single dimension for CPU requests, these tasks will request an equal amount of it. What happens if we then move those tasks to a cluster with CPUs whose cores are half as fast? The first task will run half as fast, and the second task will run in the same amount of time. It's in the first task's interest to request only half as many CPU resources on that cluster.

        I'm also afraid of things getting complicated, but I can't think of anything better that doesn't require having the meaning of a virtual core vary widely from cluster to cluster.
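
The two-cluster scenario above can be worked through with a little arithmetic. The following is only a minimal sketch under stated assumptions: the class and method names are hypothetical, one "fast" core is taken as the unit of compute, and a thread is assumed to be unable to use more than one core. It is not code from YARN or from the attached patches.

```java
// Illustrative sketch only: names and numbers are hypothetical and are not
// part of YARN or of the YARN-1089 patch.
final class CpuPortabilitySketch {

    /** Work a task can consume per unit time, measured in "fast core" units. */
    static double usableCompute(int threads, double demandPerThread, double coreSpeed) {
        // Assumption: a single thread cannot use more than one core,
        // so its rate is capped by the speed of that core.
        return threads * Math.min(demandPerThread, coreSpeed);
    }

    /** Runtime on the given cores relative to the runtime with the full demand met. */
    static double slowdown(int threads, double demandPerThread, double coreSpeed) {
        double fullDemand = threads * demandPerThread;
        return fullDemand / usableCompute(threads, demandPerThread, coreSpeed);
    }

    public static void main(String[] args) {
        double fast = 1.0; // core speed on the first cluster
        double slow = 0.5; // cores that are half as fast

        // Task A: one thread that saturates a full fast core.
        // Task B: two threads, each saturating half a fast core.
        System.out.println("fast cluster, A uses " + usableCompute(1, 1.0, fast));
        System.out.println("fast cluster, B uses " + usableCompute(2, 0.5, fast));
        System.out.println("slow cluster, A uses " + usableCompute(1, 1.0, slow)
                + ", slowdown " + slowdown(1, 1.0, slow));
        System.out.println("slow cluster, B uses " + usableCompute(2, 0.5, slow)
                + ", slowdown " + slowdown(2, 0.5, slow));
    }
}
```

Under these assumptions both tasks consume the same amount of compute on the fast cluster, so a single CPU dimension cannot tell them apart. On the slow cluster the single-threaded task can use only half as much and takes twice as long, while the two-threaded task is unaffected as long as it gets two cores.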

        Steve Loughran added a comment -

        My stance on the viability of any arbitrary "compute unit" is known; I'm not going to go there in this JIRA.

        The patch must not break all code that creates a new Resource. That is not only to avoid breaking all the code out there, but also because we need a consistent strategy for when the resource restrictions that I do think would be viable go in: network IO, GPU. It looks like you have left the old constructor, but moved the whole Hadoop codebase to an extended one. Not doing that move would keep the patch much smaller and reassure me that things are less likely to break.
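
One way to read the suggestion above is to keep the existing creation path and have it delegate to an extended one, so that existing callers stay untouched. The sketch below is only an illustration of that pattern: `SketchResource`, its fields, and the `DEFAULT_YCUS_PER_VCORE` default are all hypothetical and are not Hadoop's Resource class or anything taken from the patch.

```java
// Hypothetical stand-in for illustration; NOT Hadoop's Resource class.
final class SketchResource {

    // Assumed default, chosen arbitrarily for this sketch, that is applied
    // when a caller does not specify compute units.
    static final int DEFAULT_YCUS_PER_VCORE = 1000;

    private final int memoryMb;
    private final int virtualCores;
    private final int computeUnits;

    private SketchResource(int memoryMb, int virtualCores, int computeUnits) {
        this.memoryMb = memoryMb;
        this.virtualCores = virtualCores;
        this.computeUnits = computeUnits;
    }

    /** Existing two-argument factory, kept so current callers compile unchanged. */
    static SketchResource newInstance(int memoryMb, int virtualCores) {
        return newInstance(memoryMb, virtualCores, virtualCores * DEFAULT_YCUS_PER_VCORE);
    }

    /** Extended factory for callers that set the new dimension explicitly. */
    static SketchResource newInstance(int memoryMb, int virtualCores, int computeUnits) {
        return new SketchResource(memoryMb, virtualCores, computeUnits);
    }

    int getMemoryMb() { return memoryMb; }
    int getVirtualCores() { return virtualCores; }
    int getComputeUnits() { return computeUnits; }

    public static void main(String[] args) {
        SketchResource legacy = SketchResource.newInstance(1024, 2);
        SketchResource explicit = SketchResource.newInstance(1024, 2, 500);
        System.out.println(legacy.getComputeUnits() + " " + explicit.getComputeUnits());
    }
}
```

With this shape, only code that cares about the new dimension needs to change, and a later dimension (for example network IO or GPU) could be added the same way.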

        Arun C Murthy added a comment -

        Cleaning up old PA jiras.


          People

          • Assignee:
            Sandy Ryza
            Reporter:
            Sandy Ryza
          • Votes:
            0
            Watchers:
            17
