YARN-3943 added separate configurations for the NodeManager disk health checker's utilization-based full-disk check:
max-disk-utilization-per-disk-percentage - threshold for marking a good disk full
disk-utilization-watermark-low-per-disk-percentage - threshold for marking a full disk as not full.
On our clusters, we do not use these configs. Instead, we use min-free-space-per-disk-mb so that we can specify the limit in MB rather than as a utilization percentage. We have observed the same oscillation behavior described in YARN-3943 with this parameter.
I would like to add an optional config that specifies a separate threshold for marking a full disk as not full:
min-free-space-per-disk-mb - threshold at which a good disk is marked full
disk-free-space-per-disk-high-watermark-mb - threshold at which a full disk is marked good.
For example, we could set min-free-space-per-disk-mb = 5120 (5 GB), which would cause a disk to be marked full when free space drops below 5 GB, and disk-free-space-per-disk-high-watermark-mb = 10240 (10 GB), which would keep the disk in the full state until free space rises above 10 GB.
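
The intended behavior could be sketched roughly as follows. This is only an illustration of the two-threshold (hysteresis) logic, not the actual NodeManager code: the class name DiskFullTracker, its fields, and the update method are hypothetical, and falling back to the low threshold when the high watermark is set lower is an assumption about how the optional config might default.

```java
// Hypothetical sketch of the proposed two-threshold check.
// Not taken from the NodeManager source; all names are illustrative.
class DiskFullTracker {
    // Corresponds to min-free-space-per-disk-mb.
    private final long minFreeSpaceMb;
    // Corresponds to the proposed disk-free-space-per-disk-high-watermark-mb.
    private final long highWatermarkMb;
    // Current state of the disk: true once it has been marked full.
    private boolean full = false;

    DiskFullTracker(long minFreeSpaceMb, long highWatermarkMb) {
        this.minFreeSpaceMb = minFreeSpaceMb;
        // Assumption: a high watermark below the minimum makes no sense,
        // so fall back to the single-threshold behavior in that case.
        this.highWatermarkMb = Math.max(minFreeSpaceMb, highWatermarkMb);
    }

    // Feed in the latest free-space reading; returns whether the disk
    // is considered full after this reading.
    boolean update(long freeSpaceMb) {
        if (full) {
            // A full disk becomes good only once free space rises
            // above the high watermark.
            if (freeSpaceMb > highWatermarkMb) {
                full = false;
            }
        } else {
            // A good disk becomes full once free space drops below
            // the minimum.
            if (freeSpaceMb < minFreeSpaceMb) {
                full = true;
            }
        }
        return full;
    }
}
```

With thresholds of 5120 and 10240, a disk whose free space hovers around 5 GB is marked full once and stays full until more than 10 GB is free, instead of flipping state on every health check.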