HBASE-4124

ZK restarted while a region is being assigned, new active HM re-assigns it but the RS warns 'already online on this server'.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.5
    • Component/s: master
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      ZK restarted while a region was being assigned; the new active HM re-assigned it, but the RS warned 'already online on this server'.

      Issue:
      The RS fails because the region is 'already online on this server' and returns; the HM never receives the open confirmation and reports 'Regions in transition timed out'.

      1. log.txt
        16 kB
        fulin wang
      2. HBASE-4124_Branch90V1_trial.patch
        3 kB
        gaojinchao
      3. HBASE-4124_Branch90V2.patch
        6 kB
        gaojinchao
      4. HBASE-4124_Branch90V3.patch
        6 kB
        gaojinchao
      5. HBASE-4124_Branch90V4.patch
        6 kB
        gaojinchao
      6. HBASE-4124_TrunkV1.patch
        3 kB
        gaojinchao
      7. HBASE-4124_TrunkV2.patch
        3 kB
        gaojinchao

        Activity

        Hudson added a comment -

        Integrated in HBase-TRUNK #2158 (See https://builds.apache.org/job/HBase-TRUNK/2158/)
        HBASE-4124 ZK restarted while a region is being assigned, new active HM re-assigns it but the RS warns 'already online on this server'

        stack :
        Files :

        • /hbase/trunk/CHANGES.txt
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        stack added a comment -

        Committed to TRUNK. Thank you for the patch Gaojinchao.

        gaojinchao added a comment -

        All test cases passed. Thanks.

        gaojinchao added a comment -

        I am running all the test cases. My new modification is clearer.

        Ted Yu added a comment -

        That's right.
        Waiting for decision on refactoring before committing.

        gaojinchao added a comment -

        @Ted thanks for your work.
        sn has already been null-checked in the statement above:

        if (sn == null) {
          LOG.warn("Region in transition " + regionInfo.getEncodedName() +
            " references a null server; letting RIT timeout so will be " +
            "assigned elsewhere");
          break;
        }
        Ted Yu added a comment -

        Ran through all tests based on 4124-trunk.v2; there was only one test failure:

        Failed tests:   testWritesWhileGetting(org.apache.hadoop.hbase.regionserver.TestHRegion): expected:<\x00\x00\x00\xE0> but was:<\x00\x00\x00\xDE>
        

        I think the above failure is due to HBASE-3855.

        Ted Yu added a comment -

        This patch added a null check for sn.

        According to http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html, null cannot be used as a key.
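The constraint Ted refers to is easy to demonstrate in isolation. A minimal standalone sketch (the class name NullKeyDemo is made up for illustration; it is not part of the patch):

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> regions = new ConcurrentHashMap<>();
        regions.put("region-1", "rs-host-1");   // a normal entry is fine

        try {
            regions.put(null, "rs-host-2");     // null key is rejected
        } catch (NullPointerException e) {
            System.out.println("null key rejected");
        }

        try {
            regions.put("region-2", null);      // null value is rejected too
        } catch (NullPointerException e) {
            System.out.println("null value rejected");
        }
    }
}
```

This is why sn must be null-checked before it is used with any ConcurrentHashMap-backed structure, unlike with a plain HashMap, which tolerates a null key.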

        Ted Yu added a comment -

        Can we refactor the repeating code out to method and call that?

        This would produce a difference between the 0.90 branch and TRUNK, right ?
        Should we defer this refactoring to another JIRA ?

        I agree with the comment about null pointer check.

        gaojinchao added a comment -

        @Stack. Thanks for your review.
        I'm about to catch a plane; I will make the changes you suggest once I'm back in Shenzhen.

        stack added a comment -

        Patch is basically good but a bunch of code is repeated. Can we refactor the repeating code out to method and call that?

        For example:

        +        if (isOnDeadServer(regionInfo, deadServers)
        +            && (null == data.getOrigin() || !serverManager.isServerOnline(data
        +                .getOrigin()))) {
        

        This is repeated three times. One of the repeats does not check for null and it seems like it should.

        Also, I'd write the above as follows to make it more readable (just saying...):

          if (isOnDeadServer(regionInfo, deadServers) &&
            (data.getOrigin() == null ||
              !serverManager.isServerOnline(data.getOrigin()))) {
        

        Otherwise patch makes sense. Nice one Gao
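stack's suggestion (pull the thrice-repeated condition into a single helper) can be sketched in isolation. The types below are simplified stand-ins for the HBase classes, and the helper name originDeadOrOffline is hypothetical, not the name used in any committed patch:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the refactoring: the condition repeated three times in the patch
// moves into one method, so the null check cannot be forgotten at any call site.
public class RefactorSketch {

    // Stand-ins for org.apache.hadoop.hbase.ServerName / RegionTransitionData.
    record ServerName(String host) {}
    record TransitionData(ServerName origin) {}

    final Set<ServerName> onlineServers = new HashSet<>();

    boolean isServerOnline(ServerName sn) {
        return sn != null && onlineServers.contains(sn);
    }

    // The repeated condition, now in one place; callers just invoke this.
    boolean originDeadOrOffline(boolean onDeadServer, TransitionData data) {
        return onDeadServer &&
            (data.origin() == null || !isServerOnline(data.origin()));
    }

    public static void main(String[] args) {
        RefactorSketch s = new RefactorSketch();
        ServerName live = new ServerName("rs1");
        s.onlineServers.add(live);

        System.out.println(s.originDeadOrOffline(true, new TransitionData(null)));  // true
        System.out.println(s.originDeadOrOffline(true, new TransitionData(live)));  // false
        System.out.println(s.originDeadOrOffline(false, new TransitionData(null))); // false
    }
}
```

The payoff is exactly the bug stack notices: with the condition inlined three times, one copy silently dropped the null check; with a helper there is only one place to get it right.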

        gaojinchao added a comment -

        Thanks, Ted.
        The test case has passed.

        -------------------------------------------------------
        T E S T S
        -------------------------------------------------------
        Running org.apache.hadoop.hbase.client.TestAdmin

        Tests run: 28, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 358.001 sec

        Results :

        Tests run: 28, Failures: 0, Errors: 0, Skipped: 0

        gaojinchao added a comment -

        I have made a patch. I found that two test cases (TestAdmin and RollLoging) do not pass; they also fail on the raw trunk.

        Ted Yu added a comment -

        The code you looked at came from HBASE-4083
        I don't think it is a bug.

        Please provide the patch for TRUNK.

        gaojinchao added a comment -

        @Ted
        I am making a patch for TRUNK, but I have some questions about TRUNK.
        It seems like a bug.
        In the assign function, when we get the return value ALREADY_OPENED,
        should we update the meta table, or is that done on the region server?

        hmaster code:

        RegionOpeningState regionOpenState =
          serverManager.sendRegionOpen(plan.getDestination(), state.getRegion());
        if (regionOpenState == RegionOpeningState.ALREADY_OPENED) {

        region server code (if we don't update the meta, the client may access the old server):

        HRegion onlineRegion = this.getFromOnlineRegions(region.getEncodedName());
        if (null != onlineRegion) {
          LOG.warn("Attempted open of " + region.getEncodedName() +
            " but already online on this server");
          return RegionOpeningState.ALREADY_OPENED;
        }
        Ted Yu added a comment -

        Patch integrated to 0.90 branch after wrapping long lines.

        Thanks for the patch Jinchao.

        Ted Yu added a comment -

        @Jinchao:
        Can you prepare a patch for TRUNK as well?

        Ted Yu added a comment -

        +1 on patch v4.
        Minor comment:
        A few lines such as the following are longer than 80 characters:

        +            (null == data.getServerName() || !serverManager.isServerOnline(data.getServerName()))) {
        
        gaojinchao added a comment -

        Modified the comments according to the review.
        Thanks for Ted's careful review.

        ramkrishna.s.vasudevan added a comment -
        bq. sorry. step 3: startup master again.

        This statement confused me a bit.
        Thanks for your explanation.

        Ted Yu added a comment -

        +1 on patch version 3.

        gaojinchao added a comment -

        @ram
        bq. How come we have a dead RS if we dont kill the RS

        gao: If you stop the cluster, the meta keeps the old server information.

        bq. if the master is also killed how can the regions be assigned to some other RS

        gao: When the master starts up, it collects the regions on the same region server and
        calls sendRegionOpen(destination, regions).
        If the number of regions is relatively large, the region server needs a long time to open them.
        If the master crashes at that point, the new master may reopen the regions on another region server.

        gaojinchao added a comment -

        @Ted
        I have run all the tests. Thanks for your work.

        Ted Yu added a comment -

        All tests passed with patch v3 for 0.90 branch.

        ramkrishna.s.vasudevan added a comment -

        @Gao
        Correct me if I am wrong. I can understand the intention behind the logic.

        +          RegionTransitionData data = ZKAssign.getData(watcher, regionInfo.getEncodedName()); 
        +          
        +          //When zk node has been updated by a living server, we consider that this region server is handling it. 
        +          //So we should skip it and process it in processRegionsInTransition.
        +          if (data != null && data.getServerName() != null &&
        +            serverManager.isServerOnline(data.getServerName())){
        +              LOG.info("The region " + regionInfo.getEncodedName() +
        +                "is processing by " + data.getServerName());
        +            continue;
        +          }
        

        But if, as part of rebuildUserRegions(), the master finds a server to be dead and adds that RS to the dead servers, and you also said the master was killed:
        how come we have a dead RS if we don't kill the RS, and if the master is also killed, how can the regions be assigned to some other RS (how can the state change in ZK for that region node)?
        Maybe I am not understanding something. If you can explain this it will help me with the Timeoutmonitor.
        The rest looks fine.

        Ted Yu added a comment -

        Once patch v3 receives +1 vote, a patch for TRUNK should be made.
        Thanks for the effort.

        gaojinchao added a comment -

        @Ted
        Does it need a patch for TRUNK?
        TRUNK has changed a lot; I need some time to study it.

        Ted Yu added a comment -

        HBASE-4124_Branch90V2.patch makes sense.
        Please correct grammar in javadocs.

        gaojinchao added a comment -

        The RS isn't dead. I can reproduce and verify it.

        The ZK status is changed before the region is added to the RIT set; see the function processDeadServers.
        That is the reason why a region is assigned twice.

        // If region was in transition (was in zk) force it offline for reassign
        try {
          // Process with existing RS shutdown code
          boolean assign = ServerShutdownHandler.processDeadRegion(regionInfo,
            result, this, this.catalogTracker);
          if (assign) {
            ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
              master.getServerName());
          }
        ramkrishna.s.vasudevan added a comment -

        @Gao

        bq. step 3: startup master again.

        As per the scenario you have described, when the master restarted, had the RS opened the region? I think the scenario here is that the RS is also dead.
        If so, the assignment manager will try assigning it to a new RS. Do you see any problem here?
        If the RS is alive, then the znode status will be in the OPENED state and processRIT will take care of clearing the node as it is already opened. Could you be more clear on the state of the RS after you killed the master, and also on the state of the znode in zookeeper for that region?

        gaojinchao added a comment -

        I have added a test case for opening a region.

        gaojinchao added a comment -

        Sorry, step 3 should be: startup master again.

        gaojinchao added a comment -

        I have finished the test. I will describe the scenario:
        step 1: startup cluster
        step 2: abort the master right after it calls "sendRegionOpen(destination, regions)"
        step 3: startup cluster again.

        The above steps reproduce the issue.
        When the master fails over, the meta records the dead server, but the region is being processed by a living region server.

        gaojinchao added a comment -

        I have tried to make a patch that fixes this issue.
        But I have only run the UT tests. Please review it first and give me some suggestions. I will test it tomorrow. Thanks.

        fulin wang added a comment -

        gaojinchao, please fix the issue. Thanks.

        fulin wang added a comment -

        I can't find where getRegionsInTransitionInRS().add() is called, so I do not understand why this function was added.
        Regarding the 'already online on this server' error, I think the region should be closed or reassigned. I am trying to make a patch.

        stack added a comment -

        hbase-3741 changes the behavior here: now we notice when we are asked to open a region that is already open, and we throw an exception back to the master. I think the master will then reassign it elsewhere, which is not what we want if it's a RegionAlreadyInTransitionException. This will make it so we don't keep retrying, but I think there is more to do.

        fulin wang added a comment -

        The error log.


          People

          • Assignee:
            gaojinchao
            Reporter:
            fulin wang
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 0.4h
              0.4h
              Remaining:
              Remaining Estimate - 0.4h
              0.4h
              Logged:
              Time Spent - Not Specified
              Not Specified
