HBASE-3159

Double play of OpenedRegionHandler for a single region; fails second time through and aborts Master

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Here is master log with annotations: http://people.apache.org/~stack/master.txt

      Region in question is:

      b8827a67a9d446f345095d25e1f375f7

      The running code is doctored in that I've added a bit of logging – ZK in particular – and I've also removed what I thought was a provocation of this condition: a reassign inside of assign if the server has gone away when we try the open RPC. (Turns out we hit the condition even w/o this code in place.)

      The log starts where the region in question times out in RIT.

      We assign it to 186.

      Notice how we see 'Handling transition' for this region TWICE. This means two OpenedRegionHandlers will be scheduled – and hence the second one's failure to delete a znode that is already gone.

      As best I can tell, the watcher for this region is triggered only once – which is odd: how, then, the double scheduling of OpenedRegionHandler? And also, why am I not seeing OPENING, OPENING, OPENED, but only what I presume is a single OPENED?
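
      To make the failure concrete, here is a hypothetical sketch of the handler's cleanup step (helper names are illustrative, not the actual OpenedRegionHandler code): the first handler deletes the region's OPENED znode; the second finds it already gone and aborts the master.

          // Hypothetical sketch only -- helper names are illustrative and
          // do not match the real OpenedRegionHandler.
          public void process() {
            // The region is open; remove its unassigned znode from ZK.
            boolean deleted = deleteOpenedZNode(regionInfo.getEncodedName());
            if (!deleted) {
              // A second handler for the same OPENED event lands here: the
              // znode was already removed by the first handler, so we abort.
              master.abort("Error deleting OPENED node in ZK for " +
                  regionInfo.getRegionNameAsString(), null);
            }
          }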

      Attachments

      1. TestRollingRestart-v4.patch
        25 kB
        Jonathan Gray
      2. rs_death_on_meta_open_no_root.txt
        14 kB
        Jonathan Gray
      3. master-root-assign-abort.log
        15 kB
        Jonathan Gray
      4. hbase-meta-dupe-opened-master-only.txt
        8 kB
        Jonathan Gray
      5. hbase-meta-dupe-opened.txt
        15 kB
        Jonathan Gray
      6. HBASE-3159-FINAL.patch
        30 kB
        Jonathan Gray

        Activity

        Jonathan Gray added a comment -

        Committed to trunk. Thanks stack for working with me on this, reviewing, and being awesome at breaking stuff.

        Jonathan Gray added a comment -

        Patch committed.

        HBase Review Board added a comment -

        Message from: stack@duboce.net

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.cloudera.org/r/1108/
        -----------------------------------------------------------

        Review request for hbase and Jonathan Gray.

        Summary
        -------

        Here is the patch I've been testing with up on the cluster. It adds debugging and two fixes – one, setting state to OPEN on receipt of an RS OPENED event, and two, resetting the watcher when getting data over in ZKUtil (as its documentation claims it does).

        This addresses bug hbase-3159.
        http://issues.apache.org/jira/browse/hbase-3159
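
        The second fix – resetting the watcher when getting data in ZKUtil – might look roughly like the sketch below (illustrative, not the committed diff). ZooKeeper data watches are one-shot, so the watcher has to be passed again on each read:

          // Sketch only: re-arm the (one-shot) data watch on every read,
          // as the method's javadoc already claims it does.
          public static byte[] getDataAndWatch(ZooKeeperWatcher zkw, String znode)
          throws KeeperException {
            try {
              // Passing zkw as the Watcher re-registers the watch on this znode.
              return zkw.getZooKeeper().getData(znode, zkw, null);
            } catch (KeeperException.NoNodeException e) {
              return null; // Znode disappeared between listing and reading it.
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt(); // Restore interrupt status.
              return null;
            }
          }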

        Diffs


        src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 30e49c8
        src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java 1a88700
        src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java d4fa82b
        src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 81661ef
        src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 21a4256

        Diff: http://review.cloudera.org/r/1108/diff

        Testing
        -------

        Running on cluster

        Thanks,

        stack

        Jonathan Gray added a comment -

        Another issue during TestRollingRestart.

        When the RS is opening META, it goes to update the ROOT location.

        In the code, it seems like it's supposed to wait indefinitely for a ROOT location before proceeding with the edit for the new META location. But in the log it hardly waits at all, and it seemingly returns a null location, which it should not do if there's no ROOT location.

        In CT.getCachedConnection(HServerAddress) it looks like we will return null in many different instances (and not even log that a connection exception happened). Somehow we're returning true from the blocking calls waiting on ROOT, but then when we go to verify, it doesn't work?
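
        A hypothetical sketch of that suspect pattern (method names are illustrative, not the real CatalogTracker API):

          // Illustrative only -- these names do not match the actual code.
          HServerAddress rootLocation = waitForRootLocation();  // blocks; reports success
          HRegionInterface connection = getCachedConnection(rootLocation);
          if (connection == null) {
            // Reachable on connection refusal, socket timeout, etc., and
            // nothing is logged, so the caller proceeds as if ROOT were gone.
          }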

        Jonathan Gray added a comment -

        Interesting failure w/ ROOT assignment. Stack, I think you're better placed to look at this one.

        Jonathan Gray added a comment -

        Probably not related, but I just uncovered a small race condition in AssignmentManager, around line 806 in assign(RegionState):

              // Send OPEN RPC. This can fail if the server on the other end is not up.
              serverManager.sendRegionOpen(plan.getDestination(), state.getRegion());
              // Transition RegionState to PENDING_OPEN
              state.update(RegionState.State.PENDING_OPEN);
        

        We need to update the state to PENDING_OPEN before we send the RPC. Otherwise, the OPENING event could come in while we are still in the OFFLINE state locally, and we would reject the transition.
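
        A sketch of the reordering (the same two statements, swapped):

              // Transition RegionState to PENDING_OPEN *before* the RPC, so an
              // OPENING event arriving right after sendRegionOpen() finds the
              // region in an expected state.
              state.update(RegionState.State.PENDING_OPEN);
              // Send OPEN RPC. This can fail if the server on the other end is not up.
              serverManager.sendRegionOpen(plan.getDestination(), state.getRegion());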

        Jonathan Gray added a comment -

        This is my latest patch, doing unit tests of rolling restarts. It has some logging changes and a couple of small fixes.

        It also includes the non-fix fix I described earlier, but it's commented out so we can still detect this broken state.

        Jonathan Gray added a comment -

        Stack, when you rerun your tests, turn off the ZK client logging and ensure all of our ZK logging is set to pick up the DEBUG level. There's always a slight chance we would want the raw ZK client logs if something really crazy is happening, but there should be enough logging in our ZKW and ZKUtil as long as we pick up debug.

        One thing, though: change the method at the bottom of ZKUtil to the following:

          private static void logRetrievedMsg(final ZooKeeperWatcher zkw,
              final String znode, final byte [] data, final boolean watcherSet) {
            if (!LOG.isDebugEnabled()) return;
            LOG.debug(zkw.prefix("Retrieved " + ((data == null)? 0: data.length) +
              " byte(s) of data from znode " + znode +
              (watcherSet? " and set watcher; ": "; data=") +
              (data == null? "null": (
                  znode.startsWith(zkw.assignmentZNode) ?
                      RegionTransitionData.fromBytes(data).toString()
                      : StringUtils.abbreviate(Bytes.toString(data), 32)))));
          }
        

        The change is that we detect whether we're logging an unassigned znode, and if so, we print the region transition data. This will make debugging much simpler.

        Jonathan Gray added a comment -

        The previous files were a bit borked. These should not be missing any lines.

        Jonathan Gray added a comment -

        This is fairly easy to "fix" in a way that will not make the master abort. But this does not get to the underlying cause of two triggered OPENED handlers. There's something going on there that we need to keep digging into (I'm doing so now, but with the added logging it's not happening anymore).

        The fix to prevent the abort is to transition the in-memory RIT to the OPEN state in handleRegion(regionTransitionData). Exactly as we're already doing in the CLOSED handling in that method, we need to do this:

        + regionState.update(RegionState.State.OPEN, data.getStamp());

        When the next attempted handling of the OPENED state comes in, we won't process it because the region is not in the expected PENDING_OPEN or OPENING state, and so the second OpenedRegionHandler won't be executed.

        But yeah, let's not put that fix in yet. It seems like something screwy with ZK watches being fired when we don't expect a watch to be set, or something. Digging more...
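
        For context, here is a sketch of where that line would land in handleRegion's OPENED case (structure approximated from the description above, not the committed patch):

          case RS_ZK_REGION_OPENED:
            // Only process if we were expecting an open (PENDING_OPEN or OPENING).
            if (!regionState.isPendingOpen() && !regionState.isOpening()) {
              LOG.warn("Received OPENED for region in unexpected state: " + regionState);
              break;
            }
            // Proposed line: mark the region OPEN in memory immediately, so a
            // duplicate OPENED event fails the state check above instead of
            // scheduling a second OpenedRegionHandler.
            regionState.update(RegionState.State.OPEN, data.getStamp());
            // ... schedule the OpenedRegionHandler as before ...
            break;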

        Jonathan Gray added a comment -

        Got a double OPENED of META on the master running my new TestRollingRestart test, which does lots of RS killing.

        Attaching two logs. One is the raw log from the section where the problem happened; the second is just the master part (this is a unit test, so RS and master are combined).


          People

          • Assignee: Jonathan Gray
          • Reporter: stack
          • Votes: 0
          • Watchers: 0
