NodeHealthCheckerService.initalize should be NodeHealthCheckerService.initialize (typo)
reportHealthStatus can be simplified; the logic is currently somewhat difficult to follow. One option could be to do the following:
Done, introduced a new enum HealthCheckExitStatus; the health report is now driven by a switch instead of nested ifs.
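For readers following the thread, the switch-based shape could look roughly like the sketch below. This is not the code from the patch: only the enum name HealthCheckExitStatus comes from the comment above, while the constant names, the message text, the HealthReportSketch class and its fields are assumptions made for illustration. The TIMED_OUT branch follows the later review comment about not setting the timestamp on timeout.

```java
// Hypothetical sketch; constant names and messages are assumptions, not the patch's code.
enum HealthCheckExitStatus { SUCCESS, TIMED_OUT, FAILED_WITH_EXIT_CODE, FAILED_WITH_EXCEPTION }

final class HealthReportSketch {
    boolean healthy = true;
    String message = "";
    long lastReported = 0L;

    // One flat switch replaces the nested if/else chain.
    void reportHealthStatus(HealthCheckExitStatus status, String output, long now) {
        switch (status) {
            case SUCCESS:
                healthy = true;
                message = "";
                lastReported = now;
                break;
            case TIMED_OUT:
                // Assumption: on timeout the timestamp is deliberately left unchanged.
                healthy = false;
                message = "Node health script timed out";
                break;
            case FAILED_WITH_EXIT_CODE:
            case FAILED_WITH_EXCEPTION:
                healthy = false;
                message = output;
                lastReported = now;
                break;
            default:
                throw new IllegalArgumentException("Unknown status: " + status);
        }
    }
}
```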
Please give a name to the Timer. (I've pointed this out in previous comments also)
Done. Timer is named NodeHealthMonitor-Timer
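As a small illustration of why the name matters: java.util.Timer accepts a thread name in its constructor, and that name is what shows up in thread dumps. The sketch below is only an assumption about how the timer might be created; the NamedTimerSketch class, the helper method and the daemon flag are hypothetical and not taken from the patch.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: shows that the Timer's thread carries the given name.
final class NamedTimerSketch {
    static String observedThreadName() {
        // Timer(String name, boolean isDaemon); the daemon flag here is an assumption.
        Timer timer = new Timer("NodeHealthMonitor-Timer", true);
        AtomicReference<String> name = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // Runs on the timer's own thread, so this reads the timer thread's name.
                name.set(Thread.currentThread().getName());
                done.countDown();
            }
        }, 0L);
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.cancel();
        }
        return name.get();
    }
}
```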
When the health checker is timed out, we should not be setting the timestamp.
What do we show on the UI if the health checker is not configured on a TT? My suggestion would be to fill in the timestamp as 0 in the TT status and, when it is 0, show the status as N/A with no timestamp or message.
Done, when the health checker is not configured, we set the last updated time of the node health to zero. Based on that, the health status string is set to N/A.
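The zero-timestamp convention described above could be sketched as follows. The class name, method name and the "Healthy"/"Unhealthy" strings are assumptions for illustration; only the rule "last updated time of zero means N/A" comes from this thread.

```java
// Hypothetical sketch of the display rule; names and non-N/A strings are assumptions.
final class HealthDisplaySketch {
    static String healthStatusString(long lastUpdated, boolean healthy) {
        // A last-updated time of 0 means the health checker is not configured.
        if (lastUpdated == 0L) {
            return "N/A";
        }
        return healthy ? "Healthy" : "Unhealthy";
    }
}
```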
We are not getting the values of the health status (status, message and time) atomically in the TaskTracker. We should probably lock on the health status service object and get these values when filling up the task tracker status. (Raised this issue earlier)
Done, introduced a method org.apache.hadoop.mapred.NodeHealthCheckerService.setHealthStatus(TaskTrackerHealthStatus), which is synchronized at the object level; all sets of status in NodeHealthChecker are synchronized as well.
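The idea of reading all three values under one lock can be sketched as below. This is a minimal illustration, not the patch itself: HealthStatusHolderSketch, Snapshot, update and copyInto are hypothetical names standing in for the service, TaskTrackerHealthStatus and setHealthStatus mentioned above.

```java
// Hypothetical sketch: writer and reader share the same object-level lock,
// so a reader never sees a status from one update and a timestamp from another.
final class HealthStatusHolderSketch {
    static final class Snapshot {
        boolean healthy;
        String report;
        long lastReported;
    }

    private boolean healthy = true;
    private String report = "";
    private long lastReported = 0L;

    // All writes go through one synchronized method.
    synchronized void update(boolean healthy, String report, long time) {
        this.healthy = healthy;
        this.report = report;
        this.lastReported = time;
    }

    // Copies status, message and time in a single critical section.
    synchronized void copyInto(Snapshot target) {
        target.healthy = healthy;
        target.report = report;
        target.lastReported = lastReported;
    }
}
```

Filling a caller-supplied object under the lock, rather than exposing three separate getters, is what makes the read atomic from the caller's point of view.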
InterTrackerProtocol version should be changed (because TaskTrackerStatus structure has changed)
TaskTrackerHealthStatus has a constructor taking numberOfRestarts, which is not required.
On JobTracker, it looks like we are currently storing all trackers, even healthy ones, in the potentiallyFaultyTrackers data structure. This is unnecessary. If we fix this, we should also ensure that when a node's fault count falls to zero and it is healthy, it is removed from this structure.
Done, we are creating the FaultInfo lazily, and removing it in unBlackList based on the fault count.
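A rough sketch of the lazy-creation and removal behaviour described above is given here. The names potentiallyFaultyTrackers, FaultInfo and unBlackList come from this thread; everything else (the wrapper class, the field layout, incrementFaults, decrementFaults, isTracked) is an assumption for illustration and may differ from the actual JobTracker code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: entries exist only for trackers that have recorded faults.
final class FaultyTrackersSketch {
    static final class FaultInfo {
        int faultCount;
        boolean blacklisted;
    }

    private final Map<String, FaultInfo> potentiallyFaultyTrackers = new HashMap<>();

    // FaultInfo is created lazily, on the first recorded fault for a host.
    synchronized void incrementFaults(String host) {
        potentiallyFaultyTrackers.computeIfAbsent(host, h -> new FaultInfo()).faultCount++;
    }

    synchronized void decrementFaults(String host) {
        FaultInfo fi = potentiallyFaultyTrackers.get(host);
        if (fi != null && fi.faultCount > 0) {
            fi.faultCount--;
        }
    }

    // Clears the blacklist flag and drops the entry once no faults remain.
    synchronized void unBlackList(String host) {
        FaultInfo fi = potentiallyFaultyTrackers.get(host);
        if (fi == null) {
            return;
        }
        fi.blacklisted = false;
        if (fi.faultCount == 0) {
            potentiallyFaultyTrackers.remove(host);
        }
    }

    synchronized boolean isTracked(String host) {
        return potentiallyFaultyTrackers.containsKey(host);
    }
}
```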
The formatting in machines.jsp doesn't seem right. Can you please check it?
Done, removed accidental tab characters.