Add the ability to scale the RM-NM heartbeat interval based on a node's CPU utilization relative to overall cluster CPU utilization. If a node is over-utilized compared to the rest of the cluster, its heartbeat interval increases, so it heartbeats less often. If it is under-utilized compared to the rest of the cluster, its heartbeat interval decreases, so it heartbeats more often.
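The rule above can be illustrated with a minimal Java sketch. It is not the patch for this issue. The class `HeartbeatIntervalScaler`, the min/max bounds and the linear speedup/slowdown factors are hypothetical, and utilizations are assumed to be fractions in [0, 1].

```java
// Hypothetical sketch (not the actual YARN code): scales a base NM heartbeat
// interval by how far the node's CPU utilization is from the cluster's.
final class HeartbeatIntervalScaler {
  private final long baseIntervalMs;   // interval used when node == cluster
  private final long minIntervalMs;    // hypothetical lower bound
  private final long maxIntervalMs;    // hypothetical upper bound
  private final double speedupFactor;  // how strongly under-utilization shortens the interval
  private final double slowdownFactor; // how strongly over-utilization lengthens the interval

  HeartbeatIntervalScaler(long baseIntervalMs, long minIntervalMs, long maxIntervalMs,
      double speedupFactor, double slowdownFactor) {
    this.baseIntervalMs = baseIntervalMs;
    this.minIntervalMs = minIntervalMs;
    this.maxIntervalMs = maxIntervalMs;
    this.speedupFactor = speedupFactor;
    this.slowdownFactor = slowdownFactor;
  }

  /**
   * @param nodeCpuUtil    node CPU utilization as a fraction in [0, 1]
   * @param clusterCpuUtil cluster-wide CPU utilization as a fraction in [0, 1]
   * @return the heartbeat interval in milliseconds
   */
  long scaledIntervalMs(double nodeCpuUtil, double clusterCpuUtil) {
    // Without usable utilization metrics, fall back to the base interval.
    if (nodeCpuUtil < 0 || clusterCpuUtil <= 0) {
      return baseIntervalMs;
    }
    // Positive delta: node is busier than the cluster; negative: less busy.
    double delta = nodeCpuUtil - clusterCpuUtil;
    double factor = delta > 0 ? slowdownFactor : speedupFactor;
    // Busier nodes get a longer interval, idler nodes a shorter one.
    double scaled = baseIntervalMs * (1.0 + delta * factor);
    long rounded = Math.round(scaled);
    // Keep the result within the configured bounds.
    return Math.max(minIntervalMs, Math.min(maxIntervalMs, rounded));
  }
}
```

With a 1000 ms base, bounds of 250 to 3000 ms and both factors set to 1.0, a node at 90% CPU in a cluster at 50% gets 1400 ms, a node at 20% gets 700 ms, and a node at 0% in a cluster at 90% is clamped to 250 ms. The clamp is one plausible way to bound the adjustment so that a heavily skewed node neither floods the RM with heartbeats nor goes quiet for long periods.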
We have been running this feature internally in production for several years. It was developed by Nathan Roberts, based on the observation that the larger, faster nodes on our cluster were under-utilized compared to the smaller, slower nodes.
This feature depends on YARN-10450, which added cluster-wide utilization metrics.