Details
- Type: Sub-task
- Status: Closed
- Priority: Major
- Resolution: Abandoned
Description
- A NodeHealthManager module aims to monitor the health of nodes, covering both machine health and TaskManager (TM) workload.
- It goes beyond the general blacklist mechanism, since blacklisting only handles the extreme case.
- It provides runtime metrics to the scheduler in the JobManager (JM), which can then do more with them, such as load balancing, skipping bad TMs, and even slot scoring.
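
To make the idea concrete, here is a minimal sketch of how such a module might expose health metrics to a scheduler. It is not an existing Flink API and not the design proposed in this ticket. Every name in it (`NodeHealthSketch`, `NodeHealth`, `NodeHealthManager`, `report`, `shouldSkip`, `rankCandidates`, the `maxLoad` threshold, and the scoring formula) is a hypothetical chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; none of these types exist in Flink.
class NodeHealthSketch {

    /** Assumed health snapshot for one TaskManager. */
    static final class NodeHealth {
        final double load;            // assumed workload metric in [0.0, 1.0]
        final boolean machineHealthy; // assumed result of a machine-level check

        NodeHealth(double load, boolean machineHealthy) {
            this.load = load;
            this.machineHealthy = machineHealthy;
        }

        /** Higher is better; an unhealthy machine always scores 0. */
        double score() {
            return machineHealthy ? 1.0 - load : 0.0;
        }
    }

    /** Collects health reports and answers scheduler queries. */
    static final class NodeHealthManager {
        private final Map<String, NodeHealth> states = new ConcurrentHashMap<>();
        private final double maxLoad; // TMs above this load are skipped

        NodeHealthManager(double maxLoad) {
            this.maxLoad = maxLoad;
        }

        /** Records the latest health report for a TaskManager. */
        void report(String tmId, double load, boolean machineHealthy) {
            states.put(tmId, new NodeHealth(load, machineHealthy));
        }

        /** Bad-TM skipping: true for unhealthy or overloaded TMs; unknown TMs are not skipped. */
        boolean shouldSkip(String tmId) {
            NodeHealth h = states.get(tmId);
            return h != null && (!h.machineHealthy || h.load > maxLoad);
        }

        /** Slot scoring plus load balancing: usable TMs ordered by score, best first. */
        List<String> rankCandidates() {
            List<String> result = new ArrayList<>();
            for (String id : states.keySet()) {
                if (!shouldSkip(id)) {
                    result.add(id);
                }
            }
            result.sort(Comparator.comparingDouble((String id) -> states.get(id).score()).reversed());
            return result;
        }
    }

    public static void main(String[] args) {
        NodeHealthManager manager = new NodeHealthManager(0.9);
        manager.report("tm-1", 0.2, true);   // lightly loaded
        manager.report("tm-2", 0.6, true);   // moderately loaded
        manager.report("tm-3", 0.1, false);  // machine check failed
        manager.report("tm-4", 0.95, true);  // overloaded
        System.out.println(manager.rankCandidates()); // prints [tm-1, tm-2]
    }
}
```

In this sketch a blacklist corresponds to the single `shouldSkip` decision, while `rankCandidates` shows the additional signal a scheduler could use to prefer lightly loaded TMs among those that remain usable.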