Accumulo / ACCUMULO-3963

Incremental backoff on inability to write to HDFS


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.7.0
    • Fix Version/s: 1.7.1, 1.8.0
    • Component/s: tserver

    Description

      ACCUMULO-2480 added support for killing the tserver when HDFS remains unavailable after a number of checks. ACCUMULO-3937 added configuration properties to loosen this behavior.

      However, the tserver still sleeps for a fixed 100 ms after every failure. This makes the default of 15 attempts over 10 seconds misleading: 15 failures × 100 ms means the tserver kills itself after roughly 1.5 seconds, not 10.
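As a rough check of the arithmetic above, the sketch below computes the worst-case time a tserver spends sleeping before it gives up when the sleep is a fixed interval. The class and method names are hypothetical, and the calculation ignores the time spent in the write attempts themselves.

```java
// Hypothetical sketch: total time spent sleeping before giving up on HDFS,
// when a fixed interval is slept after every failed attempt.
class FixedSleepBudget {
    // Total sleep across all failed attempts, in milliseconds.
    static long totalSleepMillis(int attempts, long sleepMillis) {
        return attempts * sleepMillis;
    }

    public static void main(String[] args) {
        // Values from the issue: 15 attempts, a fixed 100 ms sleep after each failure.
        long total = totalSleepMillis(15, 100);
        System.out.println("Gives up after ~" + total + " ms"); // prints "Gives up after ~1500 ms"
    }
}
```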

      I think this should be closer to a 30-60 s wait period out of the box. Anything shorter will not insulate operators from transient HDFS failures, such as services being restarted or network partitions.
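One way to get a 30-60 s window without changing the attempt count is to let the sleep grow with each failure. The following is a minimal sketch assuming a linear ("incremental") policy: the sleep after failure n is n × increment, capped at a maximum. The names IncrementalBackoff, sleepForAttempt, incrementMillis and maxWaitMillis are hypothetical and do not correspond to Accumulo classes or configuration properties. The sleeper is injected so the retry loop can be exercised without actually sleeping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

// Hypothetical sketch of incremental (linear) backoff with a cap.
class IncrementalBackoff {
    // Sleep after failure n (1-based): n * increment, never more than maxWait.
    static long sleepForAttempt(int n, long incrementMillis, long maxWaitMillis) {
        return Math.min(n * incrementMillis, maxWaitMillis);
    }

    // Sleep schedule for the given number of failed attempts.
    static List<Long> schedule(int attempts, long incrementMillis, long maxWaitMillis) {
        List<Long> sleeps = new ArrayList<>();
        for (int n = 1; n <= attempts; n++) {
            sleeps.add(sleepForAttempt(n, incrementMillis, maxWaitMillis));
        }
        return sleeps;
    }

    // Sum of a sleep schedule, in milliseconds.
    static long total(List<Long> sleeps) {
        long sum = 0;
        for (long s : sleeps) {
            sum += s;
        }
        return sum;
    }

    // Calls `write` until it succeeds, sleeping after every failure as the issue
    // describes. Returns false once `attempts` failures have occurred, at which
    // point the caller would give up (e.g. halt the tserver).
    static boolean retry(BooleanSupplier write, int attempts, long incrementMillis,
                         long maxWaitMillis, LongConsumer sleeper) {
        for (int n = 1; n <= attempts; n++) {
            if (write.getAsBoolean()) {
                return true;
            }
            sleeper.accept(sleepForAttempt(n, incrementMillis, maxWaitMillis));
        }
        return false;
    }

    public static void main(String[] args) {
        // 15 attempts, 250 ms increment: 250 * (1 + 2 + ... + 15) ms in total
        System.out.println(total(schedule(15, 250, 10_000))); // prints 30000
        // 15 attempts, 500 ms increment
        System.out.println(total(schedule(15, 500, 10_000))); // prints 60000
    }
}
```

With 15 attempts, an increment of 250 ms gives 250 × 120 = 30,000 ms and an increment of 500 ms gives 60,000 ms, which spans the proposed window. In a real tserver the sleeper could wrap Thread.sleep and restore the interrupt flag on InterruptedException; the cap keeps any single wait bounded if the attempt count is raised.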

            People

              Assignee: Josh Elser (elserj)
              Reporter: Josh Elser (elserj)
              Votes: 1
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m