Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Version: 1.7.0
Description
ACCUMULO-2480 added some support to kill the tserver if HDFS is unavailable after a number of checks. ACCUMULO-3937 added some configuration values to loosen this.
We still only sleep for a static 100ms after every failure. This makes the default of 15 attempts over 10 seconds a bit misleading: with 15 failures at 100ms each, the tserver will kill itself after about 1.5 seconds, not 10.
I'm thinking that this should really be more like a 30-60s wait period out of the box. Anything less isn't really going to insulate operators from transient HDFS failures (due to services being restarted or network partitions).
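To make the arithmetic above concrete, here is a minimal sketch of the two calculations involved: how long a static sleep actually keeps the tserver alive, and what per-failure sleep would be needed to span a desired window. The class and method names (`HdfsFailureWindow`, `timeUntilKill`, `sleepToFillWindow`) are hypothetical and are not Accumulo code or configuration; only the 15 attempts, 100ms sleep, and 10s/30s windows come from the description.

```java
import java.time.Duration;

/** Hypothetical helper for illustration only; not part of Accumulo. */
public final class HdfsFailureWindow {
    private HdfsFailureWindow() {}

    /** Total time spent sleeping before giving up when a fixed sleep follows every failure. */
    public static Duration timeUntilKill(int maxFailures, Duration sleepPerFailure) {
        if (maxFailures <= 0) {
            throw new IllegalArgumentException("maxFailures must be positive");
        }
        return sleepPerFailure.multipliedBy(maxFailures);
    }

    /** Per-failure sleep needed so that maxFailures failures span the whole window. */
    public static Duration sleepToFillWindow(int maxFailures, Duration window) {
        if (maxFailures <= 0) {
            throw new IllegalArgumentException("maxFailures must be positive");
        }
        return window.dividedBy(maxFailures);
    }

    public static void main(String[] args) {
        // Current behavior from the description: 15 failures, static 100ms sleep.
        Duration actual = timeUntilKill(15, Duration.ofMillis(100));
        System.out.println("time until kill: " + actual.toMillis() + " ms");

        // Sleep that would make 15 attempts really cover the advertised 10 seconds.
        Duration tenSecondSleep = sleepToFillWindow(15, Duration.ofSeconds(10));
        System.out.println("sleep for 10s window: " + tenSecondSleep.toMillis() + " ms");

        // Sleep that would cover the low end of the proposed 30-60s window.
        Duration thirtySecondSleep = sleepToFillWindow(15, Duration.ofSeconds(30));
        System.out.println("sleep for 30s window: " + thirtySecondSleep.toMillis() + " ms");
    }
}
```

Deriving the sleep from the window, rather than hard-coding it, is one way to keep the configured attempt count and time window consistent with what operators actually observe. Whether the fix took this approach is not stated here.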
Issue Links
- is related to: ACCUMULO-2480 ha fail-failover failure (Resolved)