[ACCUMULO-2868] Make master configurable in when it kills tablet servers - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Abandoned
Affects Version/s: 1.6.0
Fix Version/s: None
Component/s: master
Labels:

Description

On a cluster with a flaky network, the master may be unable to contact a tserver for a moderate amount of time and then direct it to terminate, even though the tserver is still up (see gatherTableInformation() and StatusThread). It does not appear possible to configure the master to be more forgiving in these checks. Relevant constants:

DEFAULT_WAIT_FOR_WATCHER - interval between server checks
MAX_BAD_STATUS_COUNT - the maximum number of failed attempts allowed before killing the tserver

Making one or both of these constants, or some other pertinent parameter, configurable would allow cluster admins to cope with mild network maladies.
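To make the request concrete, here is a minimal sketch of what configurable versions of the two constants might look like. It is not Accumulo code: the class name, the two property keys, and the default values are all hypothetical placeholders chosen for illustration, and the real master logic in StatusThread may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch, not part of Accumulo: tracks failed status checks per tserver. */
final class TserverFailurePolicy {
  // Assumed property names; Accumulo does not define these keys.
  static final String MAX_BAD_STATUS_PROP = "master.tserver.status.max.failures";
  static final String CHECK_INTERVAL_PROP = "master.tserver.status.check.interval.ms";

  private final int maxBadStatusCount;
  private final long checkIntervalMillis;
  private final Map<String, Integer> badStatusCounts = new HashMap<>();

  TserverFailurePolicy(Properties props) {
    // Placeholder defaults for the sketch, used when the property is absent.
    this.maxBadStatusCount =
        Integer.parseInt(props.getProperty(MAX_BAD_STATUS_PROP, "3"));
    this.checkIntervalMillis =
        Long.parseLong(props.getProperty(CHECK_INTERVAL_PROP, "10000"));
  }

  /** Interval between server checks, in milliseconds. */
  long getCheckIntervalMillis() {
    return checkIntervalMillis;
  }

  /**
   * Records one failed status check for the given server.
   * Returns true once the failures exceed the configured maximum,
   * meaning the caller would direct the tserver to terminate.
   */
  boolean recordFailure(String server) {
    int count = badStatusCounts.merge(server, 1, Integer::sum);
    return count > maxBadStatusCount;
  }

  /** Clears the failure count after a successful status check. */
  void recordSuccess(String server) {
    badStatusCounts.remove(server);
  }
}
```

With a policy like this, an admin on a flaky network could raise the failure limit or lengthen the check interval in the site configuration instead of patching constants, and a single successful check would reset the count for that server.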

Attachments

Activity

Christopher Tubbs added a comment - 02/Nov/22 19:33

Closing this stale issue. If this is still a problem, please create a new issue or PR at https://github.com/apache/accumulo

Mike Drob added a comment - 19/Jun/14 15:40

Todd outlines some more advanced logic for HDFS deciding when to mark a node as dead, rather than just X retries * Y seconds.

People

Assignee: Unassigned

Reporter: Bill Havanki

Votes: 0

Watchers: 4

Dates

Created: 06/Jun/14 19:06

Updated: 02/Nov/22 19:33

Resolved: 02/Nov/22 19:33
