Thanks for creating the proposal, Varun! Some quick comments after a brief review:
xinclude is a simple way to support either a monolithic yarn-site.xml or separate files if we stick with the Configuration-based approach. The code loads yarn-site.xml, but users can always split out chunks of it and xinclude them. We do this quite a bit with our configs internally.
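For anyone unfamiliar with the mechanism, here is a rough stdlib-only Java sketch of what xinclude buys us: a yarn-site.xml pulls in a separate resource-types.xml, and after parsing, the properties from both files are visible. Hadoop's Configuration does its own XInclude handling, so this only illustrates the idea. The file names and property values are placeholders; the property names come from the proposal and the existing NM memory setting.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class XIncludeDemo {
    // Parses a config file with XInclude processing enabled and returns the
    // number of <property> elements visible after inclusion.
    static int countProperties(Path mainFile) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XInclude requires namespace awareness
        factory.setXIncludeAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Parsing from a File lets relative hrefs resolve against its directory.
        Document doc = builder.parse(mainFile.toFile());
        // getElementsByTagName searches all descendants, including included nodes.
        return doc.getElementsByTagName("property").getLength();
    }

    // Writes a placeholder yarn-site.xml that xincludes resource-types.xml.
    static Path writeSample(Path dir) throws Exception {
        Files.writeString(dir.resolve("resource-types.xml"),
            "<configuration>\n"
          + "  <property><name>yarn.nodemanager.resource-types.cpu.name</name>"
          + "<value>cpu</value></property>\n"
          + "</configuration>\n");
        Path main = dir.resolve("yarn-site.xml");
        Files.writeString(main,
            "<configuration xmlns:xi=\"http://www.w3.org/2001/XInclude\">\n"
          + "  <property><name>yarn.nodemanager.resource.memory-mb</name>"
          + "<value>8192</value></property>\n"
          + "  <xi:include href=\"resource-types.xml\"/>\n"
          + "</configuration>\n");
        return main;
    }

    // Creates the sample files in a temp directory and counts the merged properties.
    static int runDemo() {
        try {
            Path dir = Files.createTempDirectory("xinclude-demo");
            return countProperties(writeSample(dir));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // one property from each file
    }
}
```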
As for RM and NM config mismatches, there can always be a problem where the RM is configured to understand resources A, B, and C while the NodeManager is configured to provide A, B, and D. Handshaking during NM registration seems like the appropriate way to mitigate this, although I'm not sure it's necessary to shut down the NM if it provides a superset of what the RM schedules. Later in the doc it appears this is actually intended to be supported, by adding new resources to the NMs first and then to the RM during rolling upgrades, but earlier it states that any mismatch, even additional resources, is fatal to NM registration. That inconsistency needs to be cleaned up.
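To make the handshake idea concrete, here is a minimal sketch of one possible registration policy: the NM is rejected only if it lacks a type the RM schedules, and any extra types it advertises (a superset) are accepted and ignored. The class, record, and method names are hypothetical, not anything that exists in YARN today.

```java
import java.util.Set;
import java.util.TreeSet;

final class RegistrationCheck {
    // Outcome of comparing the NM's advertised resource types with the RM's.
    record Result(boolean accepted, Set<String> missing, Set<String> ignored) {}

    // Hypothetical policy: reject only when the NM lacks a type the RM schedules;
    // extra NM types are tolerated so NMs can be upgraded before the RM.
    static Result check(Set<String> rmTypes, Set<String> nmTypes) {
        Set<String> missing = new TreeSet<>(rmTypes);
        missing.removeAll(nmTypes);   // RM expects these, NM does not provide them
        Set<String> ignored = new TreeSet<>(nmTypes);
        ignored.removeAll(rmTypes);   // NM provides these, RM does not schedule them
        return new Result(missing.isEmpty(), missing, ignored);
    }

    public static void main(String[] args) {
        Set<String> rm = Set.of("A", "B", "C");
        // The mismatch from the comment: NM lacks C, so registration is rejected.
        System.out.println(check(rm, Set.of("A", "B", "D")));
        // Superset during a rolling upgrade: accepted, D is ignored by the RM.
        System.out.println(check(rm, Set.of("A", "B", "C", "D")));
    }
}
```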
I'm a little confused about why the sample XML config maps pf1, pf2, etc. to profile names rather than using the profile names directly in the config properties, as is done in the concise-format examples later. For example, couldn't it be simplified to:
That being said, I think the sample configs at the end, particularly the JSON form (or potentially a YAML version), would be a welcome sight for those trying to set up and grok the configs.
The sample config at the beginning has a typo: yarn.nodemanager.resource-types.cpu should be yarn.nodemanager.resource-types.cpu.name.
Overall this seems like a reasonable approach to making resource-type handling data driven. I do have performance concerns about the memory footprint of adding a Map to every resource and about needing to hash/compare strings every time we do any computation on it. The scheduler loop is already too slow, and this looks like it could add significant overhead to it. Hopefully we can mitigate that if it does become a concern, e.g., by translating Resource records coming across the wire into an efficient internal representation optimized for the configured resource types.
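A rough sketch of the kind of translation I mean, assuming the configured resource types are known up front: names are resolved to array slots once, when a record arrives, and the scheduler's hot-path arithmetic then works on plain long[] vectors with no hashing or string comparison. ResourceTypeIndex and its methods are hypothetical, the type names are placeholders, and rejecting unknown names is just a stand-in policy.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: map resource-type names to array slots once, at the
// RPC boundary, so scheduler arithmetic runs on dense long[] vectors.
final class ResourceTypeIndex {
    private final Map<String, Integer> slotByName = new HashMap<>();
    private final int size;

    ResourceTypeIndex(List<String> configuredTypes) {
        size = configuredTypes.size();
        for (int i = 0; i < size; i++) {
            slotByName.put(configuredTypes.get(i), i);
        }
    }

    // Translate a name-keyed record (as it might arrive over the wire) into a
    // dense vector. Absent types default to 0; unknown names are rejected.
    long[] toInternal(Map<String, Long> wireResource) {
        long[] values = new long[size];
        for (Map.Entry<String, Long> e : wireResource.entrySet()) {
            Integer slot = slotByName.get(e.getKey());
            if (slot == null) {
                throw new IllegalArgumentException("Unknown resource type: " + e.getKey());
            }
            values[slot] = e.getValue();
        }
        return values;
    }

    // Hot-path operations: plain array loops, no hashing or string compares.
    static long[] add(long[] a, long[] b) {
        long[] sum = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            sum[i] = a[i] + b[i];
        }
        return sum;
    }

    static boolean fitsIn(long[] request, long[] available) {
        for (int i = 0; i < request.length; i++) {
            if (request[i] > available[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ResourceTypeIndex index = new ResourceTypeIndex(List.of("memory", "vcores", "gpu"));
        long[] available = index.toInternal(Map.of("memory", 8192L, "vcores", 8L, "gpu", 2L));
        long[] request = index.toInternal(Map.of("memory", 2048L, "vcores", 1L));
        System.out.println(fitsIn(request, available));               // true
        System.out.println(Arrays.toString(add(request, request)));   // [4096, 2, 0]
    }
}
```

The trade-off is that every component has to agree on the slot order for the configured types, which is one more reason the RM/NM registration handshake matters.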