Hadoop YARN / YARN-3926

Extend the YARN resource model for easier resource-type management and profiles

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently, there are efforts to add support for various resource types such as disk (YARN-2139), network (YARN-2140), and HDFS bandwidth (YARN-2681). Each of these efforts adds support for a single new resource type, and each is fairly involved. In addition, once support is added, it becomes harder for users to specify the resources they need: all existing jobs have to be modified, or have to use the minimum allocation.

      This ticket is a proposal to extend the YARN resource model to a more flexible model that makes it easier to support additional resource types. It also considers the related aspect of “resource profiles”, which allow users to easily specify the various resources they need for any given container.
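As a rough illustration of the “more flexible model” (not code from the proposal or from YARN), the sketch below assumes a resource is simply a map from resource-type name to a long amount, with absent types treated as 0. The class `MapResource` and the type names used with it (`vcores`, `disk`) are hypothetical. With this shape, supporting a new resource type such as disk becomes a configuration change rather than a new code path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a resource as a map from resource-type name to amount.
// This is not the actual YARN Resource class; names are illustrative only.
final class MapResource {
    private final Map<String, Long> values = new HashMap<>();

    MapResource set(String type, long value) {
        values.put(type, value);
        return this;
    }

    // Types that are absent are treated as 0.
    long get(String type) {
        return values.getOrDefault(type, 0L);
    }

    // True if every type present in this (requested) resource fits in `available`.
    boolean fitsIn(MapResource available) {
        for (Map.Entry<String, Long> e : values.entrySet()) {
            if (e.getValue() > available.get(e.getKey())) {
                return false;
            }
        }
        return true;
    }

    // Component-wise sum over the union of both resources' types.
    MapResource plus(MapResource other) {
        MapResource sum = new MapResource();
        sum.values.putAll(values);
        other.values.forEach((type, v) -> sum.values.merge(type, v, Long::sum));
        return sum;
    }
}
```

No resource-specific code appears anywhere: `fitsIn` and `plus` work for whatever types happen to be present.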

      This feature has already been merged to trunk; please see YARN-7069 for all pending items.

        Issue Links

        1. Add support for multiple resource types in the Resource class (Sub-task, Resolved, Varun Vasudev)
        2. Extend DominantResourceCalculator to account for all resources (Sub-task, Resolved, Varun Vasudev)
        3. Add support to read resource types from a config file (Sub-task, Resolved, Varun Vasudev)
        4. Add support for binary units (Sub-task, Resolved, Varun Vasudev)
        5. Add support for resource types in the nodemanager (Sub-task, Resolved, Varun Vasudev)
        6. Update DominantResourceCalculator to consider all resource types in calculations (Sub-task, Resolved, Varun Vasudev)
        7. Update the Resources class to consider all resource types (Sub-task, Resolved, Varun Vasudev)
        8. Add support for resource profiles (Sub-task, Resolved, Varun Vasudev)
        9. Add support for resource profiles in distributed shell (Sub-task, Resolved, Varun Vasudev)
        10. Add manager class for resource profiles (Sub-task, Resolved, Varun Vasudev)
        11. Implement APIs to get resource profiles from the RM (Sub-task, Resolved, Varun Vasudev)
        12. Update resource usage and preempted resource calculations to take into account all resource types (Sub-task, Resolved, Varun Vasudev)
        13. [YARN-3926] Performance improvements in resource profile branch with respect to SLS (Sub-task, Resolved, Varun Vasudev)
        14. DominantResourceCalculator#getResourceAsValue dominant param is updated to handle multiple resources (Sub-task, Resolved, Daniel Templeton)
        15. Fix build for YARN-3926 branch (Sub-task, Resolved, Varun Vasudev)
        16. ResourceUtils#initializeResourcesMap takes an unnecessary Map parameter (Sub-task, Resolved, Yu-Tang Lin)
        17. ResourceUtils.getResourceTypes() reloads the resource configuration every time (Sub-task, Resolved, Unassigned)
        18. DominantResourceCalculator.resourceNames should be a constant (Sub-task, Resolved, Unassigned)
        19. Resource.compareTo() is implemented twice (Sub-task, Resolved, Daniel Templeton)
        20. Resource.SimpleResource does not implement the new Resource methods (Sub-task, Resolved, Unassigned)
        21. ResourcePBImpl imports cleanup (Sub-task, Resolved, Yeliang Cang)
        22. Improve performance of resource profile branch (Sub-task, Resolved, Sunil G)
        23. Add Client API to get all supported resource types from RM (Sub-task, Resolved, Sunil G)
        24. Improve API implementation in Resources and DominantResourceCalculator class (Sub-task, Resolved, Sunil G)
        25. ResourceProfilesManagerImpl is missing @Overrides on methods (Sub-task, Resolved, Sunil G)
        26. ResourceUtils.DISALLOWED_NAMES check is duplicated (Sub-task, Resolved, Manikandan R)
        27. ResourceUtils.checkMandatoryResources() should also ensure that no min or max is set for vcores or memory (Sub-task, Resolved, Unassigned)
        28. ResourceProfilesManagerImpl.parseResource() has no need of the key parameter (Sub-task, Resolved, Manikandan R)
        29. Remove last uses of Long from resource types code (Sub-task, Resolved, Daniel Templeton)
        30. merge related work for YARN-3926 branch (Sub-task, Patch Available, Daniel Templeton)
        31. Performance optimizations in Resource and ResourceUtils class (Sub-task, Resolved, Wangda Tan)
        32. Fix javac and javadoc errors in YARN-3926 branch (Sub-task, Resolved, Sunil G)
        33. Clean up unit tests after YARN-6610 (Sub-task, Resolved, Daniel Templeton)
        34. Cleanup ResourceProfileManager (Sub-task, Resolved, Wangda Tan)
        35. Document Resource Profiles feature (Sub-task, Resolved, Sunil G)
        36. Optimize ResourceType information display in UI (Sub-task, Resolved, Wangda Tan)
        37. Improve log message in ResourceUtils (Sub-task, Resolved, Sunil G)
        38. Additional Performance Improvement for Resource Profile Feature (Sub-task, Resolved, Wangda Tan)
        39. Move newly added APIs to unstable in YARN-3926 branch (Sub-task, Resolved, Wangda Tan)
        40. Address performance issue when 3+ resource types configured in the system (Sub-task, Resolved, Wangda Tan)

          Activity

          vvasudev Varun Vasudev added a comment -

          Attached the proposal.

          vvasudev Varun Vasudev added a comment -

          Thoughts on the proposal are welcome. The plan is to do the work in a branch.

          kasha Karthik Kambatla added a comment -

          Thanks a bunch for putting this proposal together, Varun. We are in dire need of improvements to our resource model, and the proposal goes a long way in addressing some of these issues. Huge +1 to this effort.

          Comments on the proposal itself:

          1. There is a significant overlap between resource-types.xml and node-resources.xml. It would be nice to consolidate at least these parts.
          2. Can we avoid the mismatch between the resource types on RM and NM altogether?
          3. Can we avoid different restart paths for adding and removing resources?
          4. Really like the concise configs proposed at the end of the document.

          What do you think of the following modifications to the proposal to address the above wishes? I have not thought these suggestions through very carefully, so please feel free to shoot them down.

          1. How about calling them yarn.resource-types, yarn.resource-types.memory.*, and yarn.resource-types.cpu.*? Further, the memory/cpu-specific configs could be made simpler per the suggestions later in the document.
          2. yarn.scheduler.resource-types is a subset of yarn.resource-types, and captures the resource-types the scheduler supports. This could be in yarn-site on RM.
          3. yarn.nodemanager.resource-types.monitored and yarn.nodemanager.resource-types.enforced are also subsets of yarn.resource-types and could define the resources the NM monitors and enforces, respectively. These could be in yarn-site on the NM. I understand isolation is out of scope here, but it would be nice to have configs that lend themselves to future work.
          4. yarn.nodemanager.[resources|resource-types].available could be a map where each key should be an entry in yarn.resource-types.
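The subset relationships in the naming scheme suggested above could be checked at startup along these lines. This is a minimal sketch under stated assumptions: configuration values are passed in as plain Java collections rather than read through Hadoop's `Configuration` class, the `[resources|resource-types]` choice is resolved to `resource-types`, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical validator for the proposed layout: the scheduler, monitored,
// enforced and available resource types must all be declared in yarn.resource-types.
final class ResourceTypeConfigValidator {

    // One error message per name in `subset` that is not present in `declared`.
    static List<String> checkSubset(String property, Set<String> subset, Set<String> declared) {
        List<String> errors = new ArrayList<>();
        for (String name : new TreeSet<>(subset)) { // sorted for stable messages
            if (!declared.contains(name)) {
                errors.add(property + " references undeclared resource type '" + name + "'");
            }
        }
        return errors;
    }

    static List<String> validate(Set<String> declared,
                                 Set<String> schedulerTypes,
                                 Set<String> monitored,
                                 Set<String> enforced,
                                 Map<String, Long> available) {
        List<String> errors = new ArrayList<>();
        errors.addAll(checkSubset("yarn.scheduler.resource-types", schedulerTypes, declared));
        errors.addAll(checkSubset("yarn.nodemanager.resource-types.monitored", monitored, declared));
        errors.addAll(checkSubset("yarn.nodemanager.resource-types.enforced", enforced, declared));
        // Every key of the NM's available-resources map must be a declared type.
        errors.addAll(checkSubset("yarn.nodemanager.resource-types.available", available.keySet(), declared));
        return errors;
    }
}
```

Returning all errors at once, instead of failing on the first, would suit the kind of offline config-verification tool discussed in the replies.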

          You mention capturing node labels etc. similarly. Could you elaborate on your thoughts, at least informally? It would be super nice to have a path in mind, even if we do it as follow-up work.

          vvasudev Varun Vasudev added a comment -

          Thanks for the feedback, Karthik Kambatla! I'm fine with changing the config variable nomenclature to the one you suggested. I just wanted to clarify that simply using the same config file won't avoid the issues you raised (specifically 2 and 3). One way to avoid (2) and (3) is to version the resource configs, but I think that's a little complex. We could mitigate the issue by building some tools to verify that a proposed config file would work with the existing RM/NM. I'm open to suggestions.

          With regards to node labels, I had initial conversations with Wangda Tan but I haven't thought through the model in enough detail. My initial thinking is that we would modify the ResourceMapEntry to add a string/list of strings which can be used to specify node labels.

          jlowe Jason Lowe added a comment -

          Thanks for creating the proposal, Varun! Some quick comments after a brief review:

          XInclude is a simple solution for supporting either a monolithic yarn-site.xml or a separate file if we stick with the Configuration-based approach. The code loads yarn-site.xml, but users can always split out chunks of it and XInclude them. We do this quite a bit with our configs internally.

          As for RM and NM config mismatches, there can always be a problem where the RM is configured to understand resources A, B, and C while the NodeManager is configured to provide A, B, and D. Handshaking during NM registration seems the appropriate way to mitigate this possibility, although I'm not sure it's necessary to shut down the NM if it is providing a superset of what the RM schedules. Reading further in the doc, it appears this is actually intended to be supported for rolling upgrades, by adding a resource to the NMs first and to the RM later, but earlier the doc states that any mismatch, even additional resources, is fatal to NM registration. That needs to be cleaned up.

          A little confused why the sample XML config maps pf1, pf2, etc. to profile names rather than using the profile names directly in the config properties, as is done in the concise-format examples later. For example, couldn't it be simplified to:

            <property>
              <name>yarn.scheduler.profiles</name>
              <value>minimum,maximum,default,small,medium,large</value>
            </property>
            <property>
              <name>yarn.scheduler.profile.minimum.yarn.io/memory</name>
              <value>1024</value>
            </property>
          ....
          

          That being said, I think the sample configs at the end, particularly the JSON form or potentially a YAML version, would be a welcome sight for those trying to set up and grok the configs.
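To show that the flattened layout in the XML sample needs no pf1/pf2 indirection, here is a hypothetical parser for properties of the form `yarn.scheduler.profiles` plus `yarn.scheduler.profile.<profile>.<resource-type>`. It assumes profile names contain no dots, takes the properties as a plain `Map<String, String>`, and is not the parser YARN actually shipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the flattened profile layout:
//   yarn.scheduler.profiles = minimum,maximum,default,...
//   yarn.scheduler.profile.<profile>.<resource-type> = <value>
final class ProfileParser {
    static final String PROFILES_KEY = "yarn.scheduler.profiles";
    static final String PROFILE_PREFIX = "yarn.scheduler.profile.";

    static Map<String, Map<String, Long>> parse(Map<String, String> props) {
        Map<String, Map<String, Long>> profiles = new LinkedHashMap<>();
        for (String name : props.getOrDefault(PROFILES_KEY, "").split(",")) {
            if (!name.trim().isEmpty()) {
                profiles.put(name.trim(), new LinkedHashMap<>());
            }
        }
        for (Map.Entry<String, String> e : props.entrySet()) {
            String key = e.getKey();
            if (!key.startsWith(PROFILE_PREFIX)) {
                continue; // also skips PROFILES_KEY itself ("profiles" has no '.' after "profile")
            }
            // Profile names contain no dots, so the first '.' after the prefix separates
            // the profile from the resource type (which may contain dots, e.g. yarn.io/memory).
            String rest = key.substring(PROFILE_PREFIX.length());
            int dot = rest.indexOf('.');
            if (dot <= 0) {
                continue;
            }
            String profile = rest.substring(0, dot);
            Map<String, Long> values = profiles.get(profile);
            if (values == null) {
                throw new IllegalArgumentException("Profile '" + profile + "' is not listed in " + PROFILES_KEY);
            }
            values.put(rest.substring(dot + 1), Long.parseLong(e.getValue().trim()));
        }
        return profiles;
    }
}
```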

          The sample config at the beginning has a typo: yarn.nodemanager.resource-types.cpu s/b yarn.nodemanager.resource-types.cpu.name.

          Overall seems like a reasonable approach to make handling of resource types data driven. I have some performance concerns on the memory footprint impact of adding a Map to every resource and needing to hash/compare strings every time we try to do any computations on it. The scheduler loop is already too slow, and this looks like it could add significant overhead to it. Hopefully we can mitigate that if it does become a concern, e.g.: translating Resource records coming across the wire into an efficient internal representation optimized for the resource types configured.
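The mitigation suggested above, translating wire records into an internal representation optimized for the configured types, might look roughly like this. It is a sketch under assumptions: resource-type names are resolved to array indices once, when a registry is built, so the hot-path operations (`fitsIn`, `subtractInPlace`) work on `long[]` without hashing strings. All names are hypothetical, and dropping unknown types during conversion is just one possible policy.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dense representation: names are mapped to indices once,
// so per-resource arithmetic only touches a long[].
final class ResourceTypeRegistry {
    private final Map<String, Integer> indexByName = new HashMap<>();
    private final String[] names;

    ResourceTypeRegistry(List<String> typeNames) {
        names = typeNames.toArray(new String[0]);
        for (int i = 0; i < names.length; i++) {
            indexByName.put(names[i], i);
        }
    }

    int size() {
        return names.length;
    }

    // Done once per record arriving over the wire, outside the scheduler loop.
    long[] toDense(Map<String, Long> wireRecord) {
        long[] dense = new long[names.length];
        for (Map.Entry<String, Long> e : wireRecord.entrySet()) {
            Integer idx = indexByName.get(e.getKey());
            if (idx != null) { // unknown types are dropped in this sketch
                dense[idx] = e.getValue();
            }
        }
        return dense;
    }

    // Hot-path operations: plain loops over primitive arrays, no string hashing.
    static boolean fitsIn(long[] request, long[] available) {
        for (int i = 0; i < request.length; i++) {
            if (request[i] > available[i]) {
                return false;
            }
        }
        return true;
    }

    static void subtractInPlace(long[] from, long[] amount) {
        for (int i = 0; i < from.length; i++) {
            from[i] -= amount[i];
        }
    }
}
```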

          asuresh Arun Suresh added a comment -

          Thanks for the proposal, Varun Vasudev!! Interesting stuff.

          Couple of comments from my first read of the proposal:

          1. Instead of Resource.newInstance(Map<ResourceTypeInformation, Long>), can we use the builder pattern something like so :
            ResourceBuilder.dimension(ResourceTypeInformation t1).value(Long v1)
                                       .dimension(ResourceTypeInformation t2).value(Long v2)
                                        ….
                                       .create();
            
          2. The proposal states that if there is a mismatch between what “resource-types.xml” contains and what the NM reports, the NM should shut down. My opinion is that node shutdown should happen only if the node reports fewer types or does not have all the “enabled” types in resource-types.xml: same rationale as why nodes should not care whether a resource type is “enabled” or not. If the node reports more types, the extra dimensions are just ignored. Also, in the section where you talk about adding/removing types, you mention that the NM should be upgraded first, in which case it will start reporting a new type of resource, and that should be accepted by the RM.
          3. Instead of having to explicitly mark a resource as “countable”, can’t we just assume that’s the default and instead require “uncountable” types to be explicitly specified (once we start supporting them)?
          4. I really like the Profiles idea… In the profile section, do we really need a separate “yarn.scheduler.profile…name”? Can’t we just set “yarn.scheduler.profiles” to “minimum,maximum,default,small,large” etc.?
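The builder suggested in item 1 could be sketched as follows. Assumptions: resource types are identified by plain `String` names instead of `ResourceTypeInformation`, values are `long` rather than boxed `Long`, `create()` returns an unmodifiable `Map`, and `Step`/`Chain` are hypothetical helper types that force every `dimension(...)` to be followed by `value(...)`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical fluent builder in the spirit of the suggestion above.
final class ResourceBuilder {
    private ResourceBuilder() { }

    // Entry point, mirroring ResourceBuilder.dimension(...) in the suggestion.
    static Step dimension(String type) {
        return new Chain(new LinkedHashMap<>()).dimension(type);
    }

    // After dimension(...): only value(...) may follow.
    static final class Step {
        private final Map<String, Long> values;
        private final String type;

        private Step(Map<String, Long> values, String type) {
            this.values = values;
            this.type = type;
        }

        Chain value(long v) {
            if (v < 0) {
                throw new IllegalArgumentException("Negative value for " + type);
            }
            if (values.putIfAbsent(type, v) != null) {
                throw new IllegalArgumentException("Duplicate dimension " + type);
            }
            return new Chain(values);
        }
    }

    // After value(...): either another dimension(...) or create().
    static final class Chain {
        private final Map<String, Long> values;

        private Chain(Map<String, Long> values) {
            this.values = values;
        }

        Step dimension(String type) {
            return new Step(values, type);
        }

        Map<String, Long> create() {
            return Collections.unmodifiableMap(new LinkedHashMap<>(values));
        }
    }
}
```

For example, `ResourceBuilder.dimension("memory").value(2048).dimension("vcores").value(2).create()` yields a two-entry map, while repeating a dimension throws `IllegalArgumentException`. Splitting the steps into two types makes a dangling `dimension(...)` without a value a compile error rather than a runtime one.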
          vvasudev Varun Vasudev added a comment -

          Thanks for comments Jason Lowe and Arun Suresh. My apologies for not responding earlier.

          As for RM and NM config mismatches, there can always be a problem where the RM is configured to understand resources A, B, and C while the nodemanager is configured to provide A, B and D. Handshaking during NM registration seems the appropriate way to mitigate this possibility, although I'm not sure it's necessary to shutdown the NM if it is providing a superset of what the RM schedules. Reading later in the doc it appears this is actually intended to be supported by adding it to NMs then later the RM for rolling upgrades, but earlier it states that any mismatch, even additional resources, is fatal to NM registration. That needs to be cleaned up.

          I think most people feel that shutting down the NM is not a good idea. I'm going to go with just printing out warning messages in the RM and NM. Does that seem ok?

          A little confused why the sample xml config has mappings of pf1,pf2, etc. to profile names rather than using the profile names in the config properties directly like is done with the concise format examples later.

          Good point. Arun had similar feedback. I'll change this.

          Overall seems like a reasonable approach to make handling of resource types data driven. I have some performance concerns on the memory footprint impact of adding a Map to every resource and needing to hash/compare strings every time we try to do any computations on it. The scheduler loop is already too slow, and this looks like it could add significant overhead to it. Hopefully we can mitigate that if it does become a concern, e.g.: translating Resource records coming across the wire into an efficient internal representation optimized for the resource types configured.

          I'll make sure to do some performance tests as part of the development.

          Instead of having to explicitly mark a resource as “countable”, can’t we just assume thats the default and instead require “uncountable” types to be explicitly specified (once we start supporting it)

          Fair point. I'll use this approach.

          vvasudev Varun Vasudev added a comment -

          I've created a YARN-3926 branch for this feature.

          asuresh Arun Suresh added a comment -

          Varun Vasudev, I was wondering if the possibility of allowing an NM to broadcast newly acquired resources is something we can factor in.

          Essentially:

          • Assume, as per the design doc, that each NM starts up with an initial node-resources.xml describing the resources initially available to it, covering a subset of the resource types known to the RM. Any resource type unknown to the RM is simply ignored by the RM when making scheduling decisions.
          • At some point, we allow either the admin or some self-discovery mechanism on the NM to add new resource types and advertise a resource update to the RM (of course, these types should be known a priori by the RM via resource-types.xml... or we should probably add an admin API on the RM to update/add/remove resource types on the fly).

          Thoughts ?

          vvasudev Varun Vasudev added a comment -

          Arun Suresh - I was speaking with Wangda Tan offline. I suspect we'll have to support three modes for the RM-NM handshake, which admins can configure:

          1. Strict - the RM and NM resource types must match
          2. RM subset - as long as the NM resource types are a superset of the RM's, the handshake proceeds. I believe this will address your concerns. Correct?
          3. Allow mismatch - the handshake will not fail due to missing resource types - missing resource types are presumed to be of value 0 by the RM.

          Does that make sense?
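The three handshake modes listed above reduce to simple set comparisons. A minimal sketch, assuming the RM and NM each expose their resource types as a `Set<String>`; the enum and method names are hypothetical and are not actual YARN configuration values. `missingOnNode` shows what a warning message (rather than an NM shutdown) could report.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical names for the three admin-configurable handshake modes.
enum HandshakeMode { STRICT, RM_SUBSET, ALLOW_MISMATCH }

final class RegistrationCheck {

    // Decides whether an NM reporting nmTypes may register with an RM scheduling rmTypes.
    static boolean accept(HandshakeMode mode, Set<String> rmTypes, Set<String> nmTypes) {
        switch (mode) {
            case STRICT:
                return rmTypes.equals(nmTypes);
            case RM_SUBSET:
                // Every type the RM schedules must be offered by the NM; extras are tolerated.
                return nmTypes.containsAll(rmTypes);
            case ALLOW_MISMATCH:
                return true; // missing types are later treated as 0 by the RM
            default:
                throw new IllegalStateException("Unknown mode " + mode);
        }
    }

    // Types the RM schedules but the NM does not report, e.g. for a warning log line.
    static Set<String> missingOnNode(Set<String> rmTypes, Set<String> nmTypes) {
        Set<String> missing = new HashSet<>(rmTypes);
        missing.removeAll(nmTypes);
        return missing;
    }
}
```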

          asuresh Arun Suresh added a comment -

          RM subset - as long the NM resource types are a superset of the RM, the handshake proceeds - I believe this will address your concerns. Correct?

          That should work... But I feel that allow mismatch should maybe be the default. If the NM has a superset of the RM's resource types, the extras will just be ignored; if a subset, the RM will assign a value of 0 to the NM for the missing resource types.
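Under “allow mismatch” semantics as described here, the RM effectively projects each NM report onto its own list of known types. A hypothetical sketch, assuming node capacities arrive as a `Map<String, Long>`: missing types become 0 and unknown types are dropped. Re-running the projection on every heartbeat would also cover NMs that advertise new types dynamically.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "allow mismatch" projection of an NM report onto the RM's types.
final class NodeResourceNormalizer {

    static Map<String, Long> normalize(List<String> rmTypes, Map<String, Long> reportedByNm) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String type : rmTypes) {
            // Subset case: a type the NM does not report gets capacity 0.
            result.put(type, reportedByNm.getOrDefault(type, 0L));
        }
        // Superset case: types unknown to the RM are simply never copied.
        return result;
    }
}
```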

          This would facilitate my other point: allow NMs to dynamically advertise new resource types or disable existing ones (the NM would learn of these new types via some admin API or self-discovery) as part of the NM heartbeat. Similarly, on the RM side, if a new resource advertised by the NM is unknown to the RM, the RM just ignores it. We can also add an admin API on the RM to add/remove allowable resource types on the fly.

          vvasudev Varun Vasudev added a comment (edited) -

          That should work... But I feel, maybe allow mismatch should be the default. If NM has a super-set of RMs resource types, it will just be ignored, If sub-set, then for those specific resource-types, RM will assign a 0 value for the NM.

          I don't have any particular preference - I can see scenarios for all 3. I'm fine with making allow mismatch the default.

          We can also add admin API on the RM to add / remove allowable resource types on the fly.

          This should be doable, but we need to work through how it will affect running apps.

          asuresh Arun Suresh added a comment -

          ..how this will affect on running apps

          Agreed, it might not be trivial. But my hunch is that, if DRF works correctly, it should be equivalent to a change in cluster capacity/resources (IIRC, in the FairScheduler this triggers a recalculation of queue and application fair shares).
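For reference, the core DRF quantity is an application's dominant share: the largest ratio of its usage to cluster capacity across resource types. The sketch below (hypothetical class name; as assumptions, types with zero capacity and usage of types the cluster does not offer are ignored) shows why adding a type changes shares: an application using 1 of 2 GPUs jumps to a dominant share of 0.5 once GPUs are counted.

```java
import java.util.Map;

// Sketch of Dominant Resource Fairness bookkeeping: the dominant share is the
// maximum usage/capacity ratio over the cluster's resource types.
final class DominantShare {

    static double of(Map<String, Long> usage, Map<String, Long> clusterCapacity) {
        double max = 0.0;
        for (Map.Entry<String, Long> e : clusterCapacity.entrySet()) {
            long capacity = e.getValue();
            if (capacity <= 0) {
                continue; // assumption: types with no capacity are ignored
            }
            long used = usage.getOrDefault(e.getKey(), 0L);
            max = Math.max(max, (double) used / capacity);
        }
        return max;
    }
}
```

Because every share depends on the capacity map, any change to it (a node joining, or a resource type being added or removed) means the scheduler has to recompute shares for all queues and applications.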

          grey Lei Guo added a comment -

          Another topic related to the RM-NM protocol is constraint labels. They don't have to be addressed in this JIRA, but I'd like to raise the topic because I can see the design here affecting the constraint label design.

          The constraint label could be some server attribute reported by NM, it could be required to be predefined in RM, but if we can allow NM to define something not defined in RM, and then RM automatically add it in label repository, it will be great. for example, for OS version or JDK version, customer may prefer automatically added instead of adding label before use.

          lxhfirenking Xiaohua (Victor) Liang added a comment -

          Is there a set of readily available YARN configuration files (etc/hadoop/*.xml) that I can use to do some functionality testing on this branch?

          grey Lei Guo added a comment - - edited

          Varun Vasudev, here are some thoughts I just posted in YARN-4793.

          As this JIRA aims to build a unified interface, I'd like to share some thoughts on the resource-modeling part. The core of YARN is still mapping resources to workloads. The resource modeling proposed in YARN-3926 extends YARN's current static resource model into a flat resource model, in which end users can define and schedule their own resources. I am considering whether we should extend it further into a hierarchical model. The future use case I see is heterogeneous environments with different hardware accelerators (GPU, Intel Xeon Phi, FPGA, etc.). If you treat one GPU as a unit of a special resource, the flat model is good enough. But we are seeing cases where a GPU is shared between applications, and an application may even prefer to allocate a certain range of memory inside the GPU to avoid cache rotation issues. This is hard for the scheduler to handle, because there are relationships between resources (much like the relationships between applications in Slider): the scheduler must allocate GPU memory and GPU cores on the same GPU.

          If we do have a vision of covering more complicated environments with YARN, maybe it's time to consider extending the resource model further, together with Slider integration and the unified service API.
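The co-location constraint described above can be sketched in Python, assuming a hypothetical two-level model in which each node holds a list of GPUs and each GPU tracks its own free cores and memory. The `Gpu` class and both functions are illustrative only. The example shows a request that a flat, node-wide check accepts but that no single GPU can satisfy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gpu:
    """One physical GPU on a node, with its own free sub-resources (hypothetical model)."""
    gpu_id: int
    free_cores: int
    free_memory_mb: int


def fits_flat(gpus: List[Gpu], cores: int, memory_mb: int) -> bool:
    """Flat model: only node-wide totals are compared."""
    return (sum(g.free_cores for g in gpus) >= cores
            and sum(g.free_memory_mb for g in gpus) >= memory_mb)


def place_on_single_gpu(gpus: List[Gpu], cores: int, memory_mb: int) -> Optional[int]:
    """Hierarchical model: cores and memory must come from the same GPU.

    Allocates from the first GPU that can satisfy both and returns its id,
    or returns None if no single GPU can.
    """
    for gpu in gpus:
        if gpu.free_cores >= cores and gpu.free_memory_mb >= memory_mb:
            gpu.free_cores -= cores
            gpu.free_memory_mb -= memory_mb
            return gpu.gpu_id
    return None


if __name__ == "__main__":
    node = [Gpu(0, free_cores=100, free_memory_mb=2048),
            Gpu(1, free_cores=2000, free_memory_mb=512)]
    print(fits_flat(node, cores=500, memory_mb=1024))            # True
    print(place_on_single_gpu(node, cores=500, memory_mb=1024))  # None
    print(place_on_single_gpu(node, cores=50, memory_mb=1024))   # 0
```

First-fit is used here only for brevity; a real scheduler would likely need a smarter placement policy.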

          asuresh Arun Suresh added a comment -

          Varun Vasudev, can you update the status on this?
          I would really love to see this feature get into trunk. IMHO, the outstanding JIRAs do not seem to be blockers for getting the feature in. Happy to help push through any remaining tasks.

          vvasudev Varun Vasudev added a comment -

          My apologies for the delay, Arun Suresh. There are 5 items I would like to get done before the merge:
          1. Add support for profiles in distributed shell
          2. Add support for profiles in MR
          3. Fix the resource accounting (chargeback) to take care of all resource types
          4. Do an SLS test and publish performance results
          5. Rebase to the latest trunk

          Of these, (1) has been reviewed and I need to update the patch, (2) still has to be done, and for (3) I'm working on a patch that I should publish soon.

          If all goes well, we should be in shape to close out these items soon.

          asuresh Arun Suresh added a comment -

          Do you have a JIRA for this?

          Fix the resource accounting (chargeback) to take care of all resource types

          I assume that is different from YARN-5589.

          tangzhankun Zhankun Tang added a comment -

          As mentioned above, the overrides will only be allowed for memory and cpu.
          We also propose a config flag to enable/disable overrides (set to enable overrides by default)
          using which admins can turn off the override behaviour.

          Varun Vasudev, I have a question about this part of the design doc. Could you please explain why we only allow CPU and memory to be overridden in a resource profile?
          How about making all resource types overridable by default and providing configuration options to switch overriding on/off per resource type?
          I'd be glad to help with this if needed.
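For illustration, here is a minimal Python sketch of applying profile overrides with a configurable set of overridable resource types. It covers both the design-doc behaviour (memory and CPU only, plus an on/off flag) and the configurable alternative suggested above. `apply_overrides`, its parameters, and the `gpu` type are hypothetical. In this sketch, overrides are silently ignored when disabled, though rejecting the request would be an equally reasonable choice.

```python
from typing import Dict, FrozenSet

# Design-doc default: only memory and CPU may be overridden.
DEFAULT_OVERRIDABLE: FrozenSet[str] = frozenset({"memory-mb", "vcores"})


def apply_overrides(profile: Dict[str, int],
                    overrides: Dict[str, int],
                    overrides_enabled: bool = True,
                    overridable: FrozenSet[str] = DEFAULT_OVERRIDABLE) -> Dict[str, int]:
    """Return the resources for a container request: a profile plus overrides.

    Hypothetical semantics: if overrides are disabled they are ignored;
    otherwise overriding a type outside `overridable` raises ValueError.
    """
    result = dict(profile)
    if not overrides_enabled:
        return result
    for name, value in overrides.items():
        if name not in overridable:
            raise ValueError(f"overriding '{name}' is not allowed")
        result[name] = value
    return result


if __name__ == "__main__":
    small = {"memory-mb": 2048, "vcores": 1, "gpu": 0}
    print(apply_overrides(small, {"memory-mb": 4096}))
    # {'memory-mb': 4096, 'vcores': 1, 'gpu': 0}
    # Proposed alternative: admins make every resource type overridable.
    all_types = frozenset(small)
    print(apply_overrides(small, {"gpu": 1}, overridable=all_types))
    # {'memory-mb': 2048, 'vcores': 1, 'gpu': 1}
```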

          leftnoteasy Wangda Tan added a comment -

          This feature has been merged to trunk (3.1.0). Thanks, everybody, for helping with this feature, and especially Varun Vasudev for leading and driving the feature development from the beginning.

          I've moved all pending items to YARN-7069 and marked this one as resolved.


            People

            • Assignee:
              vvasudev Varun Vasudev
              Reporter:
              vvasudev Varun Vasudev
            • Votes:
              1
              Watchers:
              71
