  Accumulo / ACCUMULO-2868

Make master configurable in when it kills tablet servers

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Abandoned
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: master

    Description

      On a cluster with a flaky network, the master may be unable to contact a tserver for some moderate amount of time and then direct it to terminate, even though the tserver is still up. (See gatherTableInformation() and StatusThread.) It does not appear possible to configure the master to be more forgiving in these checks. Relevant constants:

      • DEFAULT_WAIT_FOR_WATCHER - interval between server checks
      • MAX_BAD_STATUS_COUNT - the maximum number of failed attempts allowed before killing the tserver

      Making one or both of these constants configurable, or exposing some other pertinent parameter, would allow cluster admins to cope with mild network maladies.
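
      As a rough illustration of the request, here is a minimal sketch of how the two thresholds could be read from configuration instead of being hard-coded. Everything in it is an assumption made for illustration: the class name TserverKillPolicy, the two property keys, and the default values passed in by the caller are hypothetical and are not part of Accumulo. It also assumes that "maximum allowed" means the tserver is halted only once the number of consecutive failed checks exceeds the limit.

```java
import java.util.Properties;

/**
 * Hypothetical sketch only; this is not Accumulo's implementation.
 * Tracks consecutive failed status checks for one tserver and decides
 * when the configured limit has been exceeded.
 */
class TserverKillPolicy {
    // Hypothetical property names; Accumulo 1.6.0 does not define them.
    static final String CHECK_INTERVAL_KEY = "master.status.check.interval.ms";
    static final String MAX_BAD_KEY = "master.status.max.bad.count";

    private final long checkIntervalMs;
    private final int maxBadStatusCount;
    private int consecutiveFailures = 0;

    /** Reads both settings, falling back to the caller-supplied defaults. */
    TserverKillPolicy(Properties props, long defaultIntervalMs, int defaultMaxBad) {
        this.checkIntervalMs = Long.parseLong(
            props.getProperty(CHECK_INTERVAL_KEY, String.valueOf(defaultIntervalMs)));
        this.maxBadStatusCount = Integer.parseInt(
            props.getProperty(MAX_BAD_KEY, String.valueOf(defaultMaxBad)));
        if (checkIntervalMs <= 0 || maxBadStatusCount < 1) {
            throw new IllegalArgumentException("interval must be > 0 and max bad count >= 1");
        }
    }

    long getCheckIntervalMs() {
        return checkIntervalMs;
    }

    int getMaxBadStatusCount() {
        return maxBadStatusCount;
    }

    /**
     * Records the outcome of one status check.
     *
     * @param contacted true if the tserver answered the check
     * @return true if consecutive failures now exceed the configured limit
     */
    boolean recordCheck(boolean contacted) {
        if (contacted) {
            consecutiveFailures = 0; // any successful contact resets the count
            return false;
        }
        consecutiveFailures++;
        return consecutiveFailures > maxBadStatusCount;
    }
}
```

      With a policy like this, an admin on a flaky network could raise the hypothetical master.status.max.bad.count value so that a few missed checks in a row no longer cause a healthy tserver to be halted.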

      Attachments

        Activity

          ctubbsii Christopher Tubbs added a comment - Closing this stale issue. If this is still a problem, please create a new issue or PR at https://github.com/apache/accumulo
          mdrob Mike Drob added a comment - Todd outlines some more advanced logic for HDFS deciding when to mark a node as dead, rather than just X retries * Y seconds.

          People

            Assignee: Unassigned
            Reporter: bhavanki Bill Havanki
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: