Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Abandoned
Affects Version/s: 1.6.0
Fix Version/s: None
Description
On a cluster with a flaky network, the master may be unable to contact a tserver for a moderate amount of time and then direct it to terminate, even though the tserver is still up. (See gatherTableInformation() and StatusThread.) It does not appear possible to configure the master to be more forgiving in these checks. Relevant constants:
- DEFAULT_WAIT_FOR_WATCHER - interval between server checks
- MAX_BAD_STATUS_COUNT - the maximum number of failed attempts allowed before killing the tserver
Making one or both of those constants configurable, or exposing some other pertinent parameter, would allow cluster admins to cope with mild network maladies.
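As a rough illustration of the request, the sketch below reads the two thresholds from configuration instead of hard-coding them. It is not Accumulo's code: the class name, the property keys, and the fallback values are all hypothetical, and the real checks live in the master's gatherTableInformation() and StatusThread. It also assumes that a tserver is halted only once its consecutive failure count exceeds the maximum.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of Accumulo: tracks consecutive failed
// status checks per tserver using configurable thresholds.
class StatusCheckPolicy {
    // Hypothetical property names; Accumulo defines no such properties.
    static final String MAX_BAD_KEY = "master.status.max.bad.count";
    static final String WAIT_KEY = "master.status.check.interval.ms";

    // Illustrative fallbacks only; not the values hard-coded in Accumulo.
    static final int FALLBACK_MAX_BAD_STATUS_COUNT = 3;
    static final long FALLBACK_WAIT_MS = 5000L;

    final int maxBadStatusCount;
    final long waitMillis;
    private final Map<String, Integer> badCounts = new HashMap<>();

    StatusCheckPolicy(Properties props) {
        // Fall back to the illustrative defaults when a key is absent.
        maxBadStatusCount = Integer.parseInt(props.getProperty(
            MAX_BAD_KEY, String.valueOf(FALLBACK_MAX_BAD_STATUS_COUNT)));
        waitMillis = Long.parseLong(props.getProperty(
            WAIT_KEY, String.valueOf(FALLBACK_WAIT_MS)));
    }

    // Records the outcome of one status check and returns true when the
    // server has failed more consecutive checks than the configured maximum.
    boolean recordCheck(String server, boolean reachable) {
        if (reachable) {
            badCounts.remove(server); // a successful contact resets the count
            return false;
        }
        int count = badCounts.merge(server, 1, Integer::sum);
        return count > maxBadStatusCount; // assumption: strictly greater than
    }
}
```

With a policy like this, an admin on a flaky network could raise the failure threshold or lengthen the check interval in site configuration rather than patching the constants.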
Closing this stale issue. If this is still a problem, please create a new issue or PR at https://github.com/apache/accumulo