Add the ability to scale the RM-NM heartbeat interval based on a node's CPU utilization relative to overall cluster CPU utilization. If a node is over-utilized compared to the rest of the cluster, its heartbeat interval slows down. If it is under-utilized compared to the rest of the cluster, its heartbeat interval speeds up.
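As a rough illustration of the idea (not the actual patch), the scaling could look something like the sketch below. The class name, method name, linear scaling formula, min/max clamp, and the speedup/slowdown factors are all assumptions made for this example; the real implementation and its configuration may differ.

```java
/** Hypothetical sketch of utilization-based heartbeat scaling; not the YARN code. */
final class HeartbeatIntervalSketch {
    private HeartbeatIntervalSketch() {}

    /**
     * Scales a base heartbeat interval by how far the node's CPU utilization
     * is from the cluster-wide average. All parameters are illustrative.
     *
     * @param baseMs         default heartbeat interval in milliseconds
     * @param minMs          lower bound on the returned interval
     * @param maxMs          upper bound on the returned interval
     * @param nodeCpu        node CPU utilization in [0, 1]
     * @param clusterCpu     average cluster CPU utilization in [0, 1]
     * @param speedupFactor  how strongly under-utilization shortens the interval
     * @param slowdownFactor how strongly over-utilization lengthens the interval
     */
    static long scaledIntervalMs(long baseMs, long minMs, long maxMs,
                                 double nodeCpu, double clusterCpu,
                                 double speedupFactor, double slowdownFactor) {
        // Without meaningful cluster utilization there is nothing to compare to.
        if (clusterCpu <= 0.0) {
            return baseMs;
        }
        // Positive when the node is busier than the cluster, negative when idler.
        double relative = (nodeCpu - clusterCpu) / clusterCpu;
        double factor = relative > 0.0
            ? 1.0 + relative * slowdownFactor            // over-utilized: heartbeat less often
            : 1.0 / (1.0 - relative * speedupFactor);    // under-utilized: heartbeat more often
        long interval = Math.round(baseMs * factor);
        // Clamp so a single outlier node cannot heartbeat too fast or too slowly.
        return Math.max(minMs, Math.min(maxMs, interval));
    }

    public static void main(String[] args) {
        // Busy node on a moderately loaded cluster: interval grows.
        System.out.println(scaledIntervalMs(1000, 100, 5000, 0.9, 0.45, 1.0, 1.0));
        // Idle node on the same cluster: interval shrinks.
        System.out.println(scaledIntervalMs(1000, 100, 5000, 0.1, 0.45, 1.0, 1.0));
    }
}
```

The clamp is included on the assumption that operators would want hard limits on how far any node's interval can drift from the default.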
We have been running this feature internally in production for several years. It was developed by nroberts, based on the observation that the larger, faster nodes on our cluster were under-utilized compared to the smaller, slower nodes.
This feature depends on YARN-10450, which added cluster-wide utilization metrics.