
Failure in region split after PONR could cause region hole

    Details

    • Hadoop Flags:
      Reviewed

      Description

      If a region split fails after the point of no return (PONR), it relies on the master's ServerShutdownHandler to fix it. However, if the master doesn't get a chance to fix it, there will be a hole in the region chain.
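To make "a hole in the region chain" concrete, here is a minimal sketch, not HBase code. It assumes regions are non-overlapping half-open key ranges, that an empty string marks the start or end of the table, and that `KeyRange` and `findHoles` are hypothetical names invented for illustration. A hole is any stretch of the key space that no online region covers, for example when a split parent has gone offline but its daughters were never added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RegionChainCheck {
    /** Hypothetical key range [startKey, endKey); "" means the start or end of the table. */
    static final class KeyRange {
        final String startKey;
        final String endKey;

        KeyRange(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    /** Returns the key ranges that no online region covers (assumes regions do not overlap). */
    static List<KeyRange> findHoles(List<KeyRange> onlineRegions) {
        List<KeyRange> sorted = new ArrayList<>(onlineRegions);
        sorted.sort(Comparator.comparing((KeyRange r) -> r.startKey));

        List<KeyRange> holes = new ArrayList<>();
        String expectedStart = "";   // the chain must begin at the start of the table
        boolean reachedEnd = false;  // true once a region ends at the end of the table
        for (KeyRange r : sorted) {
            if (r.startKey.compareTo(expectedStart) > 0) {
                // The previous region ended before this one starts: a hole.
                holes.add(new KeyRange(expectedStart, r.startKey));
            }
            expectedStart = r.endKey;
            reachedEnd = r.endKey.isEmpty();
        }
        if (!reachedEnd) {
            holes.add(new KeyRange(expectedStart, ""));
        }
        return holes;
    }

    public static void main(String[] args) {
        // Parent [b, d) was split and taken offline, but its daughters never came online.
        List<KeyRange> online = Arrays.asList(new KeyRange("", "b"), new KeyRange("d", ""));
        System.out.println("Holes: " + findHoles(online));
    }
}
```

In this example the only gap reported is the range the offline parent used to serve, which is the situation this issue describes.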

      1. 5196-v2.txt
        6 kB
        Ted Yu
      2. hbase-5196_0.90.txt
        6 kB
        Jimmy Xiang

        Activity

        Jimmy Xiang added a comment -

        I have a simple fix. When the master starts up, fix up all the missing daughters as the ServerShutdown handler does.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3488/
        -----------------------------------------------------------

        Review request for hbase.

        Summary
        -------

        When the master starts up, this patch tries to scan all offline split parents and fix up missing daughters as the ServerShutdownHandler does.

        This addresses bug HBASE-5196.
        https://issues.apache.org/jira/browse/HBASE-5196

        Diffs


        src/main/java/org/apache/hadoop/hbase/master/HMaster.java cb2f084
        src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 8f4f4b8
        src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java 41f5dff

        Diff: https://reviews.apache.org/r/3488/diff

        Testing
        -------

        I tested the fix on my real cluster and it does fix the problem.

        I am working on a unit test now.

        Thanks,

        Jimmy
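The idea in the summary above (scan offline split parents at master startup and add any daughters that are missing) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: the catalog is modelled as an in-memory map, and `RegionRow`, `offlineSplitParent`, and `fixupMissingDaughters` are hypothetical names that do not come from HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitParentFixupSketch {
    /** Hypothetical catalog row; a stand-in for illustration, not the real region metadata. */
    static final class RegionRow {
        final String name;
        final boolean offlineSplitParent;
        final List<String> daughters;

        RegionRow(String name, boolean offlineSplitParent, List<String> daughters) {
            this.name = name;
            this.offlineSplitParent = offlineSplitParent;
            this.daughters = daughters;
        }
    }

    /** Adds every daughter that a split parent references but the catalog lacks; returns the count added. */
    static int fixupMissingDaughters(Map<String, RegionRow> catalog) {
        // Collect the parents first so the map can be modified safely afterwards.
        List<RegionRow> parents = new ArrayList<>();
        for (RegionRow row : catalog.values()) {
            if (row.offlineSplitParent) {
                parents.add(row);
            }
        }
        int fixed = 0;
        for (RegionRow parent : parents) {
            for (String daughter : parent.daughters) {
                if (!catalog.containsKey(daughter)) {
                    // In a real system this is where the daughter would be added and assigned.
                    catalog.put(daughter, new RegionRow(daughter, false, Collections.<String>emptyList()));
                    fixed++;
                }
            }
        }
        return fixed;
    }

    public static void main(String[] args) {
        Map<String, RegionRow> catalog = new LinkedHashMap<>();
        catalog.put("parent", new RegionRow("parent", true, Arrays.asList("daughterA", "daughterB")));
        catalog.put("daughterA", new RegionRow("daughterA", false, Collections.<String>emptyList()));
        System.out.println("Daughters fixed: " + fixupMissingDaughters(catalog));
    }
}
```

Running the fix-up a second time finds nothing to do, so in this sketch it is safe to repeat on every startup.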

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3488/#review4363
        -----------------------------------------------------------

        +1 on patch so far. In issue when you say 'if master does not get a chance to fix it', when is that? Doesn't master do it when it comes on line? Good stuff Jimmy.

        • Michael

        jiraposter@reviews.apache.org added a comment -

        On 2012-01-13 19:18:26, Michael Stack wrote:

        > +1 on patch so far. In issue when you say 'if master does not get a chance to fix it', when is that? Doesn't master do it when it comes on line? Good stuff Jimmy.

        There are only 3 threads to do the cleanup. If lots of region servers (most of the cluster) have died, the shutdown handler may be stuck in log splitting for quite some time. During this period,
        if the master dies somehow, it won't be able to finish the cleanup. In my case, I ran testLoadAndVerify and it brought HDFS to its knees. So I restarted the cluster and
        ended up with lots of holes in the region chain.

        • Jimmy
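The bottleneck described in this reply can be illustrated with a small, self-contained sketch. It is not master code: the pool size of 3 comes from the comment above, while the class name, the method name, and the use of a latch to stand in for slow log splitting are assumptions made for illustration. With every worker blocked, the remaining handlers sit in the queue, and their cleanup is lost if the process dies before they run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ShutdownPoolSketch {
    /**
     * Submits one long-running handler per dead server to a fixed pool and returns
     * how many handlers are still queued once every worker thread is busy.
     */
    static int queuedWhileWorkersBusy(int poolSize, int deadServers) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch workersBusy = new CountDownLatch(Math.min(poolSize, deadServers));
        CountDownLatch logSplitDone = new CountDownLatch(1);
        try {
            for (int i = 0; i < deadServers; i++) {
                pool.execute(() -> {
                    workersBusy.countDown();   // a worker thread has picked up this handler
                    try {
                        logSplitDone.await();  // stands in for slow log splitting
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    // Any cleanup that follows log splitting would only run here.
                });
            }
            workersBusy.await();               // all workers are now blocked
            return pool.getQueue().size();     // handlers that have not even started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            logSplitDone.countDown();          // let the handlers finish
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Handlers still queued: " + queuedWhileWorkersBusy(3, 5));
    }
}
```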

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3488/#review4364
        -----------------------------------------------------------

        src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        <https://reviews.apache.org/r/3488/#comment9813>

        Should read 'parents found. See if we can fix any'

        src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        <https://reviews.apache.org/r/3488/#comment9814>

        If an enum is returned, we can get three counters which would be used in the log statement below.

        src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        <https://reviews.apache.org/r/3488/#comment9812>

        I prefer an enum here:
        daughter not missing,
        daughter missing and fixed,
        daughter missing but not fixed

        • Ted
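A rough sketch of the enum suggested in this review, with the three counters it would feed into a log statement. The constant names, the `tally` helper, and the log wording are hypothetical; only the three states themselves come from the comment above.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class DaughterFixupTally {
    /** The three outcomes named in the review; the constant names are made up for this sketch. */
    enum FixupResult { NOT_MISSING, MISSING_FIXED, MISSING_NOT_FIXED }

    /** Counts how many daughters ended up in each state, with zero for states that never occurred. */
    static Map<FixupResult, Integer> tally(List<FixupResult> results) {
        Map<FixupResult, Integer> counts = new EnumMap<>(FixupResult.class);
        for (FixupResult state : FixupResult.values()) {
            counts.put(state, 0);
        }
        for (FixupResult result : results) {
            counts.merge(result, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<FixupResult, Integer> counts = tally(Arrays.asList(
                FixupResult.NOT_MISSING, FixupResult.MISSING_FIXED,
                FixupResult.NOT_MISSING, FixupResult.MISSING_NOT_FIXED));
        System.out.println("Daughters present: " + counts.get(FixupResult.NOT_MISSING)
                + ", fixed: " + counts.get(FixupResult.MISSING_FIXED)
                + ", not fixed: " + counts.get(FixupResult.MISSING_NOT_FIXED));
    }
}
```

Compared with a boolean return value, an enum lets the caller distinguish "nothing to do" from "tried and failed", at the cost of a slightly larger change.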

        jiraposter@reviews.apache.org added a comment -

        On 2012-01-13 19:18:26, Michael Stack wrote:

        > +1 on patch so far. In issue when you say 'if master does not get a chance to fix it', when is that? Doesn't master do it when it comes on line? Good stuff Jimmy.

        Jimmy Xiang wrote:

        There are only 3 threads to do the cleanup. If lots of region servers (most of the cluster) have died, the shutdown handler may be stuck in log splitting for quite some time. During this period,
        if the master dies somehow, it won't be able to finish the cleanup. In my case, I ran testLoadAndVerify and it brought HDFS to its knees. So I restarted the cluster and
        ended up with lots of holes in the region chain.

        Makes sense.

        • Michael

        jiraposter@reviews.apache.org added a comment -

        On 2012-01-13 19:26:42, Ted Yu wrote:

        > src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java, line 366
        > <https://reviews.apache.org/r/3488/diff/1/?file=68852#file68852line366>
        >
        > I prefer an enum here:
        > daughter not missing,
        > daughter missing and fixed,
        > daughter missing but not fixed

        I'd say that if you are interested, look in logs?

        I think we should get the basic patch in first. Can do the fancy stuff in another issue?

        • Michael

        Ted Yu added a comment -

        Please allow me to attach a patch with the enum.

        Ted Yu added a comment -

        Pardon me for the somewhat misleading comments.

        I believe this simple patch conveys the correct number of daughter regions fixed.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12510518/5196-v2.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -146 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.mapreduce.TestImportTsv
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
        org.apache.hadoop.hbase.master.TestSplitLogManager

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/754//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/754//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/754//console

        This message is automatically generated.

        stack added a comment -

        Ted's version is fine by me. Jimmy?

        Jimmy Xiang added a comment -

        Yes, it is good. Thanks Ted.

        These failed tests passed on my box.

        Ted Yu added a comment -

        Integrated to 0.92 and TRUNK.

        Thanks for the patch, Jimmy.

        Thanks for the review, Stack.

        Hudson added a comment -

        Integrated in HBase-TRUNK #2632 (See https://builds.apache.org/job/HBase-TRUNK/2632/)
        HBASE-5196 Failure in region split after PONR could cause region hole (Jimmy Xiang)

        tedyu :
        Files :

        • /hbase/trunk/CHANGES.txt
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java
        Hudson added a comment -

        Integrated in HBase-0.92 #243 (See https://builds.apache.org/job/HBase-0.92/243/)
        HBASE-5196 Failure in region split after PONR could cause region hole (Jimmy Xiang)

        tedyu :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3488/#review4383
        -----------------------------------------------------------

        Looks good to me

        • Lars

        Hudson added a comment -

        Integrated in HBase-TRUNK-security #78 (See https://builds.apache.org/job/HBase-TRUNK-security/78/)
        HBASE-5196 Failure in region split after PONR could cause region hole (Jimmy Xiang)

        tedyu :
        Files :

        • /hbase/trunk/CHANGES.txt
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java
        Hudson added a comment -

        Integrated in HBase-0.92-security #76 (See https://builds.apache.org/job/HBase-0.92-security/76/)
        HBASE-5196 Failure in region split after PONR could cause region hole (Jimmy Xiang)

        tedyu :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java
        Jonathan Hsieh added a comment -

        This was committed by Ted. Thanks Jimmy!

        Todd Lipcon added a comment -

        Should this also be committed to the 0.90 branch?

        Jimmy Xiang added a comment -

        I attached a patch for 0.90 branch: hbase-5196_0.90.txt

        Could anyone please check it in?

        Ted Yu added a comment -

        @Jimmy:
        Have you run the 0.90 test suite over the new patch?

        Jimmy Xiang added a comment -

        @Ted, I ran the test suite, and verified the fix on CDH3u3.
        Let me run the test suite on 0.90 now.

        Jimmy Xiang added a comment -

        Yes, the test suite on 0.90 with the patch passed.

        Ted Yu added a comment -

        Integrated to 0.90 branch.

        Thanks for the patch, Jimmy.


          People

          • Assignee:
            Jimmy Xiang
            Reporter:
            Jimmy Xiang