RollingBatchRestartRsAction doesn't handle failure cases when tracking its list of dead servers. The original author believed that a failure to restart would result in a retry. However, by removing the dead server from the failed list prematurely, that state is lost, and retry of that server never occurs. Because this action doesn't ever look back to the current state of the cluster, relying only on its local state for the current action invocation, it never realizes the abandoned server is still dead. Instead, be more careful to only remove the dead server from the list when the startRs invocation claims to have been successful.