  1. Accumulo
  2. ACCUMULO-4353

Stabilize tablet assignment during transient failure

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: None
    • Labels:
      None

      Description

      When a tablet server dies, Accumulo attempts to reassign the tablets it was hosting as quickly as possible to maintain availability. If multiple tablet servers die in quick succession, such as from a rolling restart of the Accumulo cluster or a network partition, this behavior can cause a storm of reassignment and rebalancing, placing significant load on the master.

      To avert such load, Accumulo should be able to maintain a steady tablet assignment state in the face of transient tablet server loss. Instead of reassigning tablets as quickly as possible, Accumulo should await the return of a temporarily downed tablet server (for some configurable duration) before assigning its tablets to other tablet servers.
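
      The proposed behavior can be illustrated with a minimal sketch. It is not the patch attached to this issue: the class name, the enum, the method, and the grace-period parameter are all hypothetical names chosen for illustration, and the sketch assumes the master records the time at which it noticed the tablet server was lost.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch only; none of these names come from Accumulo itself.
public class SuspendSketch {

    enum Action { WAIT_FOR_RETURN, REASSIGN }

    // Decide what to do with the tablets of a lost tablet server.
    // lostAt: when the server was noticed missing (assumed to be tracked).
    // now:    the current time.
    // grace:  the configurable wait; zero or negative means "reassign at once".
    static Action decide(Instant lostAt, Instant now, Duration grace) {
        if (grace.isZero() || grace.isNegative()) {
            return Action.REASSIGN; // waiting disabled, i.e. the old behavior
        }
        Instant deadline = lostAt.plus(grace);
        // Hold the assignment steady until the deadline passes.
        return now.isBefore(deadline) ? Action.WAIT_FOR_RETURN : Action.REASSIGN;
    }

    public static void main(String[] args) {
        Instant lostAt = Instant.EPOCH;          // arbitrary reference time
        Duration grace = Duration.ofMinutes(5);  // arbitrary example value

        // One minute after the loss: still inside the grace period.
        System.out.println(decide(lostAt, lostAt.plusSeconds(60), grace));
        // Ten minutes after the loss: the grace period has expired.
        System.out.println(decide(lostAt, lostAt.plusSeconds(600), grace));
    }
}
```

      With a zero grace period the sketch falls back to immediate reassignment, which is one plausible way to keep the existing behavior as the default.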

              People

              • Assignee:
                Shawn Walker
                Reporter:
                Shawn Walker
              • Votes:
                0
                Watchers:
                7

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  7h 50m