Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
WS currently uses a few different formulae to determine the status (up/down) for the head node daemons.
Database: considered down when the on-demand DB query of CKPT data for the Daemons page (or the corresponding JSON data) fails.
Other daemons are considered down when the elapsed time since WS last received a publication exceeds a threshold computed as Number * Ratio * Rate, as shown below.
Broker: 3 * ducc.ws.state.publish.rate
PM: 3 * ducc.pm.state.publish.rate
OR: 3 * ducc.orchestrator.state.publish.rate
SM: 3 * ducc.orchestrator.state.publish.rate
RM: 3 * ducc.rm.state.publish.ratio * ducc.orchestrator.state.publish.rate
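To make the per-daemon differences concrete, the formulas above can be sketched as follows. This is only an illustration, not the WS implementation: the function name and the property values are hypothetical (they are not DUCC's shipped defaults), and the rates are assumed to be in milliseconds.

```python
# Hypothetical example values, assumed to be milliseconds.
# These are NOT DUCC's shipped defaults.
EXAMPLE_PROPS = {
    "ducc.ws.state.publish.rate": 5000,
    "ducc.pm.state.publish.rate": 15000,
    "ducc.orchestrator.state.publish.rate": 10000,
    "ducc.rm.state.publish.ratio": 2,
}

NUMBER = 3  # the constant multiplier used in every formula above


def legacy_down_thresholds(props):
    """Return the 'down' threshold per daemon, following the listed formulas."""
    ws_rate = props["ducc.ws.state.publish.rate"]
    pm_rate = props["ducc.pm.state.publish.rate"]
    or_rate = props["ducc.orchestrator.state.publish.rate"]
    rm_ratio = props["ducc.rm.state.publish.ratio"]
    return {
        "Broker": NUMBER * ws_rate,
        "PM": NUMBER * pm_rate,
        "OR": NUMBER * or_rate,
        "SM": NUMBER * or_rate,             # as listed: SM uses the orchestrator rate
        "RM": NUMBER * rm_ratio * or_rate,  # RM is the only one with a ratio term
    }


thresholds = legacy_down_thresholds(EXAMPLE_PROPS)
for daemon, millis in thresholds.items():
    print(daemon, millis)
```

With these example values every daemon ends up with a threshold derived from a different combination of properties, which is the inconsistency this issue addresses.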
The new design calls for a single value specified in ducc.properties, which applies to all head node daemons except DB:
# The elapsed time in milliseconds between monitored head-node daemons' publications
# that if exceeded indicates "down". Default = 120000 (two minutes).
ducc.ws.monitored.daemon.down.millis.expiry=120000
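A minimal sketch of how a single expiry value could drive the up/down decision is shown here. The function name, the dictionary-based property lookup, and the timestamp arguments are assumptions made for illustration; only the property key and its default come from the description above.

```python
PROPERTY_KEY = "ducc.ws.monitored.daemon.down.millis.expiry"
DEFAULT_EXPIRY_MILLIS = 120000  # two minutes, the default stated above


def is_daemon_down(last_publication_millis, now_millis, props):
    """Return True when the time since the last publication exceeds the expiry.

    `props` is a plain dict standing in for ducc.properties (an assumption).
    The same expiry applies to every monitored head node daemon except DB.
    """
    expiry = int(props.get(PROPERTY_KEY, DEFAULT_EXPIRY_MILLIS))
    elapsed = now_millis - last_publication_millis
    return elapsed > expiry  # "exceeded" is read here as strictly greater


# Usage: a daemon that last published 90 seconds ago is still "up" by default,
# but is "down" if the expiry is lowered to 60 seconds.
print(is_daemon_down(0, 90000, {}))
print(is_daemon_down(0, 90000, {PROPERTY_KEY: "60000"}))
```

Treating an elapsed time exactly equal to the expiry as "up" is one reading of "if exceeded"; the actual boundary behavior in WS is not stated in this issue.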