Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
N.B. I don't think this bug affects IotProvider: I'm pretty sure that while IotProvider includes JobMonitorApp, it doesn't register a JobRegistryService, so the monitor does nothing. A JIRA for that is forthcoming.
JobMonitorAppTest exercises the app but doesn't validate that restarts are actually happening. Adding instrumentation and validation shows that 3x the expected number of rebuilds/restarts are happening:
appOne: buildCnt: 7 injectedFailureCnt: 2
appTwo: buildCnt: 10 injectedFailureCnt: 3
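As a rough sanity check on those counts, here is a minimal sketch. It assumes (my assumption, not something confirmed by the test code) that buildCnt is one initial build plus one rebuild per restart. The class and method names are hypothetical and not part of JobMonitorAppTest.

```java
public class BuildCountCheck {

    /**
     * Expected build count under the assumption of one initial build
     * plus restartsPerFailure rebuilds for each injected failure.
     */
    static int expectedBuildCnt(int injectedFailureCnt, int restartsPerFailure) {
        return 1 + restartsPerFailure * injectedFailureCnt;
    }

    public static void main(String[] args) {
        // One restart per failure (the intended behavior under this assumption).
        System.out.println("appOne intended: " + expectedBuildCnt(2, 1));
        System.out.println("appTwo intended: " + expectedBuildCnt(3, 1));

        // Three restarts per failure reproduces the reported counts of 7 and 10.
        System.out.println("appOne observed: " + expectedBuildCnt(2, 3));
        System.out.println("appTwo observed: " + expectedBuildCnt(3, 3));
    }
}
```

Under that assumption the reported 7 and 10 are exactly what three restarts per injected failure would produce, versus 3 and 4 for a single restart each.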
Further investigation identifies the JobMonitorApp's job event filtering as the problem. Each "failed" job ends up with 3 events that pass through the filter:
RUNNING, RUNNING, UNHEALTHY
RUNNING, CLOSED, UNHEALTHY
CLOSED, CLOSED, UNHEALTHY
... or something like that
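To illustrate the effect, here is a small self-contained sketch. It is not the JobMonitorApp code and not necessarily how the fix was done. It assumes each triple reads as (current state, next state, health), and it uses a hypothetical JobEvent class in place of the real registry event type. A filter that passes every UNHEALTHY event triggers three restarts for one failed job, while a filter that remembers which job ids it has already handled triggers one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RestartFilterSketch {

    /** Hypothetical stand-in for a job registry event; not the real event type. */
    static final class JobEvent {
        final String jobId;
        final String currentState;
        final String nextState;
        final String health;

        JobEvent(String jobId, String currentState, String nextState, String health) {
            this.jobId = jobId;
            this.currentState = currentState;
            this.nextState = nextState;
            this.health = health;
        }
    }

    /** The three events assumed to be emitted for a single failed job. */
    static List<JobEvent> failedJobEvents(String jobId) {
        List<JobEvent> events = new ArrayList<>();
        events.add(new JobEvent(jobId, "RUNNING", "RUNNING", "UNHEALTHY"));
        events.add(new JobEvent(jobId, "RUNNING", "CLOSED", "UNHEALTHY"));
        events.add(new JobEvent(jobId, "CLOSED", "CLOSED", "UNHEALTHY"));
        return events;
    }

    /** Naive filter: every UNHEALTHY event requests a restart. */
    static List<String> naiveRestarts(List<JobEvent> events) {
        List<String> restarts = new ArrayList<>();
        for (JobEvent e : events) {
            if ("UNHEALTHY".equals(e.health)) {
                restarts.add(e.jobId);
            }
        }
        return restarts;
    }

    /** Deduplicating filter: at most one restart request per failed job id. */
    static List<String> dedupedRestarts(List<JobEvent> events) {
        Set<String> alreadyHandled = new HashSet<>();
        List<String> restarts = new ArrayList<>();
        for (JobEvent e : events) {
            // Set.add returns false if the job id was already recorded.
            if ("UNHEALTHY".equals(e.health) && alreadyHandled.add(e.jobId)) {
                restarts.add(e.jobId);
            }
        }
        return restarts;
    }

    public static void main(String[] args) {
        List<JobEvent> events = failedJobEvents("job-1");
        System.out.println("naive restarts:   " + naiveRestarts(events).size());
        System.out.println("deduped restarts: " + dedupedRestarts(events).size());
    }
}
```

Deduplicating by job id relies on a restarted job getting a new id, which is also an assumption here. Tightening the filter to match a single state transition would be another option.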