Currently, with CPU cgroup strict mode enabled on the NodeManager, when CPU resources are overcommitted (e.g. 8 vCores configured on a 4-core machine), the amount of CPU time a container gets for each requested vCore is automatically downscaled with the formula: vCoreCPUTime = totalPhysicalCoresOnNM / coresConfiguredForNM. So containers will be throttled on CPU even if there are spare cores available on the NM (e.g. with 8 vCores on a 4-core machine, a container that asked for 2 vCores will effectively be allowed to use only one physical core). The same happens if the CPU resource cap is enabled (via yarn.nodemanager.resource.percentage-physical-cpu-limit); in this case the usable capacity, totalCoresOnNode (= coresOnNode * percentage-physical-cpu-limit), is scaled down by the cap. For example, if the cap is 80%, a container that asked for 2 vCores will be allowed to use at most the equivalent of 1.6 physical cores, regardless of the current NM load.
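To make the downscaling arithmetic concrete, here is a minimal sketch of the behavior described above (the function name is illustrative, not actual YARN code):

```python
def vcore_cpu_fraction(physical_cores, configured_vcores, cpu_limit_pct=100.0):
    """CPU time (in physical cores) granted per requested vCore under strict
    mode, following the auto-downscaling formula described above.
    cpu_limit_pct models yarn.nodemanager.resource.percentage-physical-cpu-limit."""
    usable_cores = physical_cores * cpu_limit_pct / 100.0
    return usable_cores / configured_vcores

# 8 vCores on a 4-core machine: a 2-vCore container gets only 1 physical core
print(2 * vcore_cpu_fraction(4, 8))        # 1.0
# 80% cap, no overcommit: a 2-vCore container is capped at 1.6 physical cores
print(2 * vcore_cpu_fraction(4, 4, 80.0))  # 1.6
```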
Both of these situations may lead to underuse of available resources. In some cases, an administrator may want to overcommit resources because applications statically over-allocate resources without fully using them; today, however, overcommitting slows down all containers, which is not the intention.
Therefore it would be very useful if administrators had control over how vCores are mapped to CPU time on NodeManagers in strict mode when CPU resources are overcommitted and/or the physical-cpu-limit is enabled.
This could potentially be done with a parameter like yarn.nodemanager.resource.strict-vcore-weight that controls the vCore to physical-core time mapping. E.g. a value of 1 means a one-to-one mapping; 1.2 means a single vCore can use up to 120% of a physical core (this can be handy for PySpark users); -1 (the default) disables the feature and keeps the current auto-scaling.
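The proposed semantics could be sketched as follows. This is a hypothetical model of the suggested strict-vcore-weight parameter, not an existing YARN API; names and the min-with-node-capacity bound are assumptions:

```python
def container_cpu_cap(requested_vcores, physical_cores, configured_vcores,
                      strict_vcore_weight=-1.0, cpu_limit_pct=100.0):
    """Physical-core cap for a container under the proposed (hypothetical)
    yarn.nodemanager.resource.strict-vcore-weight setting.
    weight < 0  -> keep today's auto-downscaling behavior.
    weight >= 0 -> each vCore maps to `weight` physical cores, still bounded
                   by the node's usable capacity (an assumption of this sketch)."""
    usable_cores = physical_cores * cpu_limit_pct / 100.0
    if strict_vcore_weight < 0:
        # default: current behavior, scale vCores down to usable capacity
        return requested_vcores * usable_cores / configured_vcores
    return min(requested_vcores * strict_vcore_weight, usable_cores)

# weight 1.0: 2 vCores -> 2 physical cores, even with 8 vCores on a 4-core node
print(container_cpu_cap(2, 4, 8, strict_vcore_weight=1.0))  # 2.0
# weight 1.2: a single vCore may use up to 1.2 physical cores
print(container_cpu_cap(1, 4, 8, strict_vcore_weight=1.2))  # 1.2
# weight -1 (default): falls back to auto-scaling, 2 vCores -> 1 core
print(container_cpu_cap(2, 4, 8))                           # 1.0
```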