HBase / HBASE-3159

Double play of OpenedRegionHandler for a single region; fails second time through and aborts Master

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Here is master log with annotations: http://people.apache.org/~stack/master.txt

      Region in question is:

      b8827a67a9d446f345095d25e1f375f7

      The running code is doctored: I've added a bit of logging, ZooKeeper logging in particular, and I've removed what I thought was provoking this condition, namely the reassign inside an assign when the server has gone away by the time we try the open RPC. (It turns out we hit the condition even without this code in place.)

      The log starts where the region in question times out in RIT (regions in transition).

      We assign it to 186.

      Notice how we see 'Handling transition' for this region TWICE. This means two OpenedRegionHandlers get scheduled, and the second one fails when it tries to delete a znode that is already gone.

      As best I can tell, the watcher for this region is triggered only once. That is odd: how, then, does OpenedRegionHandler get scheduled twice? And why do I see only what I presume is an OPENED, rather than OPENING, OPENING, OPENED?
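
      The failure mode described above can be sketched in isolation. This is a minimal illustration only, not HBase code and not the fix that was applied to this issue: the class, the in-memory set standing in for ZooKeeper, and every method name are hypothetical. It shows why a second handler for the same region fails on the delete, and how one possible guard (remembering which regions already have an OPENED handler) would make a duplicate event harmless.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names are HBase or ZooKeeper APIs.
public class DoubleOpenedSketch {
    // In-memory stand-in for the region znodes kept in ZooKeeper.
    static final Set<String> znodes = ConcurrentHashMap.newKeySet();
    // Regions for which an OPENED handler has already been run.
    static final Set<String> openedScheduled = ConcurrentHashMap.newKeySet();

    // Mimics a delete that fails when the node is already gone.
    static void deleteZnode(String region) {
        if (!znodes.remove(region)) {
            throw new IllegalStateException("znode already gone: " + region);
        }
    }

    // Unguarded: every 'Handling transition' event runs a handler.
    static void handleOpenedUnguarded(String region) {
        deleteZnode(region);
    }

    // Guarded: only the first event for a region runs the handler.
    static boolean handleOpenedGuarded(String region) {
        if (!openedScheduled.add(region)) {
            return false; // duplicate event, ignored
        }
        deleteZnode(region);
        return true;
    }

    public static void main(String[] args) {
        String region = "b8827a67a9d446f345095d25e1f375f7";

        // Two unguarded handlers: the second fails on the delete.
        znodes.add(region);
        handleOpenedUnguarded(region);
        try {
            handleOpenedUnguarded(region);
        } catch (IllegalStateException e) {
            System.out.println("second handler failed: " + e.getMessage());
        }

        // Two guarded handlers: the duplicate is dropped.
        znodes.add(region);
        System.out.println("first handled: " + handleOpenedGuarded(region));
        System.out.println("second handled: " + handleOpenedGuarded(region));
    }
}
```

      A guard like this only masks the symptom; it does not explain why 'Handling transition' fires twice when the watcher appears to trigger once, which is the open question here.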

      Attachments

        1. TestRollingRestart-v4.patch
          25 kB
          Jonathan Gray
        2. rs_death_on_meta_open_no_root.txt
          14 kB
          Jonathan Gray
        3. master-root-assign-abort.log
          15 kB
          Jonathan Gray
        4. hbase-meta-dupe-opened-master-only.txt
          8 kB
          Jonathan Gray
        5. hbase-meta-dupe-opened.txt
          15 kB
          Jonathan Gray
        6. HBASE-3159-FINAL.patch
          30 kB
          Jonathan Gray

          People

            Assignee: Jonathan Gray (streamy)
            Reporter: Michael Stack (stack)
            Votes: 0
            Watchers: 1
