Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Duplicate
- 3.0.0-alpha2
- None
Description
Proposed as an alternative design to YARN-5567. Define a specific exit code for the health checker script (property yarn.nodemanager.health-checker.script.path) that, when returned, allows the node to be blacklisted.
As discussed in the latter part of YARN-5567, the current design requirements are:
- Ignore all exit codes from the script except the newly defined error code, which marks the NodeManager as UNHEALTHY
  - This allows any syntax or functional errors in the script to be ignored
- Upon failure (or multiple recorded failures):
  - Store the status in the metrics2 state on the NodeManager
  - Allow the RM to blacklist the NM or allow the jobs to drain
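The requirements above can be illustrated with a minimal sketch. It is not the NodeManager implementation: the reserved exit code value (100), the failure threshold, and all function and class names here are assumptions made for illustration, since the issue does not specify them.

```python
import subprocess
import sys
from typing import Sequence

# Hypothetical value: the issue does not say which exit code is reserved.
UNHEALTHY_EXIT_CODE = 100


def classify_exit_code(exit_code: int) -> str:
    """Map a health script exit code to a node state.

    Only the reserved code marks the node UNHEALTHY. Every other code,
    including those caused by syntax or runtime errors in the script,
    is ignored and the node stays HEALTHY.
    """
    return "UNHEALTHY" if exit_code == UNHEALTHY_EXIT_CODE else "HEALTHY"


def run_health_script(command: Sequence[str]) -> str:
    """Run the health script once and classify its exit code."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    return classify_exit_code(result.returncode)


class FailureTracker:
    """Counts consecutive UNHEALTHY results (hypothetical helper).

    The node is reported as a blacklist candidate only after `threshold`
    consecutive failures, modelling "multiple recorded failures".
    """

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, state: str) -> bool:
        """Record one result; return True once the threshold is reached."""
        if state == "UNHEALTHY":
            self.consecutive_failures += 1
        else:
            self.consecutive_failures = 0
        return self.consecutive_failures >= self.threshold


if __name__ == "__main__":
    # Simulate a script that exits with the reserved code.
    cmd = [sys.executable, "-c", f"import sys; sys.exit({UNHEALTHY_EXIT_CODE})"]
    tracker = FailureTracker(threshold=2)
    for _ in range(2):
        should_blacklist = tracker.record(run_health_script(cmd))
    print(should_blacklist)
```

With this shape, a script that fails to parse or exits with an ordinary error code such as 1 or 127 never marks the node UNHEALTHY; only a deliberate exit with the reserved code does.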
Issue Links
- duplicates YARN-5635: Better handling when bad script is configured as Node's HealthScript (Resolved)