  HBase / HBASE-3419

If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.90.0, 0.92.0
    • Fix Version/s: 0.90.1
    • Component/s: regionserver, Zookeeper
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The Progressable used on region open to tickle the ZK OPENING node, which prevents the master from timing out a region open operation, currently aborts the RegionServer if the tickle fails for any reason. However, it can be "normal" for an RS to have a region open operation aborted by the master, so the RS should handle this the way it does in other places: by reverting the open.

      We had a cluster trip over some other issue (for some reason, the tickle was not happening within 30 seconds, so the master was timing out every time). Because of the abort on BadVersion, this eventually led to every single RS aborting itself, taking down the cluster.
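
      The proposed behavior can be illustrated with a small, self-contained sketch. This is not the attached patch and does not use HBase or ZooKeeper classes: `OpeningNode`, `RegionOpenAttempt`, `tickle`, and `masterIntervenes` are hypothetical names, and an `AtomicInteger` stands in for the versioned OPENING znode. The point it shows is only that a failed versioned re-transition marks the open as cancelled and returns `false` to the caller, rather than aborting the whole server.

      ```java
      import java.util.concurrent.atomic.AtomicInteger;
      import java.util.function.IntConsumer;

      /** Hypothetical stand-in for the versioned ZK OPENING node. */
      final class OpeningNode {
          private final AtomicInteger version = new AtomicInteger(0);

          /**
           * Re-transition OPENING to OPENING. Succeeds only if the caller's
           * expected version is still current; returns the new version, or -1
           * on a version mismatch (the analogue of BadVersion).
           */
          int tickle(int expectedVersion) {
              return version.compareAndSet(expectedVersion, expectedVersion + 1)
                      ? expectedVersion + 1
                      : -1;
          }

          /** Simulates the master timing out the open and touching the node. */
          void masterIntervenes() {
              version.incrementAndGet();
          }
      }

      /** Hypothetical region open that cancels itself instead of aborting the server. */
      final class RegionOpenAttempt {
          private final OpeningNode node;
          private int expectedVersion = 0;
          private boolean cancelled = false;
          private int appliedBatches = 0;

          RegionOpenAttempt(OpeningNode node) {
              this.node = node;
          }

          /** Progress callback: tickle the node; on failure, cancel this open only. */
          boolean progress() {
              if (cancelled) {
                  return false;
              }
              int newVersion = node.tickle(expectedVersion);
              if (newVersion < 0) {
                  cancelled = true; // revert the open; the server keeps running
                  return false;
              }
              expectedVersion = newVersion;
              return true;
          }

          /** Replays batches of edits, reporting progress after each batch. */
          boolean replay(int batches, IntConsumer applyBatch) {
              for (int i = 0; i < batches; i++) {
                  applyBatch.accept(i);
                  appliedBatches++;
                  if (!progress()) {
                      return false; // caller should clean up and give up on this region
                  }
              }
              return true;
          }

          boolean isCancelled() {
              return cancelled;
          }

          int appliedBatches() {
              return appliedBatches;
          }
      }
      ```

      In this sketch, a caller that gets `false` back from `replay` would close whatever it had opened for the region and leave reassignment to the master. Other regions on the same server are unaffected.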

      Attachments

        1. HBASE-3419-v3-TRUNK.patch
          11 kB
          Jonathan Gray
        2. HBASE-3419-v3.patch
          12 kB
          Jonathan Gray
        3. HBASE-3419-v2.patch
          13 kB
          Jonathan Gray
        4. HBASE-3419-v1.patch
          12 kB
          Jonathan Gray

          People

            Assignee: Jonathan Gray (streamy)
            Reporter: Jonathan Gray (streamy)