  1. HBase
  2. HBASE-20991 MTTR
  3. HBASE-20992

MTTR, Chaos, and ITBLL

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: integration tests, MTTR
    • Labels:
      None

      Description

      I've been having trouble getting a sustained, large ITBLL (IntegrationTestBigLinkedList) run to complete over the last few days. I keep seeing the following sequence:

      • A region splits or is moved
      • Chaos kills the Master in the middle of the Split or Move Procedure after a Region has been offlined
      • The Master takes a while to come back, both because it is not restarted until a couple of minutes have passed and because there is then some recovery to be done.

      So a region can be offline for minutes. By default we retry up to 16 times, which comes to about 2.5 minutes before we give up.

      I can raise the retry count when running larger tests, but the region should also come back online faster.
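
      To make the retry arithmetic concrete, here is a rough sketch of how the client's retry budget adds up. It assumes the backoff multipliers match what I recall of HConstants.RETRY_BACKOFF and that hbase.client.pause is at its 100 ms default; neither value is quoted in this issue, so treat both as assumptions. Jitter and the time spent inside each failed RPC are ignored, and the helper names are made up for illustration.

```python
# Assumed copy of the HBase client backoff multipliers (HConstants.RETRY_BACKOFF);
# not quoted in this issue, so verify against the version under test.
RETRY_BACKOFF = [1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200]
PAUSE_MS = 100  # assumed default for hbase.client.pause


def total_wait_ms(retries, pause_ms=PAUSE_MS):
    """Sum the sleeps across `retries` attempts, ignoring jitter and RPC time."""
    last = len(RETRY_BACKOFF) - 1
    # Attempts past the end of the table reuse the final multiplier.
    return sum(pause_ms * RETRY_BACKOFF[min(i, last)] for i in range(retries))


def retries_to_cover(outage_ms, pause_ms=PAUSE_MS):
    """Smallest retry count whose cumulative sleep is at least `outage_ms`."""
    retries = 0
    while total_wait_ms(retries, pause_ms) < outage_ms:
        retries += 1
    return retries


if __name__ == "__main__":
    # Minutes slept across 16 retries.
    print(total_wait_ms(16) / 60000.0)
    # Retries needed to ride out a five-minute region outage.
    print(retries_to_cover(5 * 60 * 1000))
```

      Under these assumptions, 16 retries add up to 148.1 seconds of sleeping, which lines up with the roughly 2.5 minutes above, and riding out a five-minute outage would take 24 retries. That is only a stopgap for larger runs (set via hbase.client.retries.number); it does nothing for the time the region actually spends offline.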

      Let me hang ITBLL fixes/notes off here.

            People

            • Assignee:
              Unassigned
              Reporter:
              stack Michael Stack
            • Votes:
              0
              Watchers:
              7
