HBase
HBASE-6228

Fixing up daughters twice causes the daughter region to be assigned twice

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 0.95.0
    • Component/s: master
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      First, how does fixing up daughters happen twice?
      1. We fixupDaughters at the end of HMaster#finishInitialization.
      2. ServerShutdownHandler also fixes up daughters when reassigning regions through ServerShutdownHandler#processDeadRegion.

      When fixing up daughters, we add the daughters to .META., but that cannot prevent the case above, because of FindDaughterVisitor.

      The detail is as follows.
      Suppose region A is a split parent region, and its daughter region B is missing.

      1. First, the ServerShutdownHandler thread fixes up the daughter: it adds daughter region B to .META. with serverName=null and assigns the daughter.

      2. Then, the master's initialization thread also finds that daughter region B is missing and assigns it again. This is because FindDaughterVisitor considers a daughter missing if its serverName=null.
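
      A minimal sketch of the race, using hypothetical helper names modeled on the code paths above (only MetaEditor.addDaughter and FindDaughterVisitor are real names from this discussion); it is illustrative, not the actual HBase source:

          // Thread 1: ServerShutdownHandler#processDeadRegion
          if (isDaughterMissing(catalogTracker, daughterB)) {        // true: no row yet
            MetaEditor.addDaughter(catalogTracker, daughterB, null); // serverName == null
            assign(daughterB);                                       // first assignment
          }

          // Thread 2: HMaster#finishInitialization -> fixupDaughters
          // FindDaughterVisitor treats serverName == null as "missing", so the
          // row just written by Thread 1 still looks missing here.
          if (isDaughterMissing(catalogTracker, daughterB)) {        // still true!
            MetaEditor.addDaughter(catalogTracker, daughterB, null);
            assign(daughterB);                                       // second assignment
          }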

      1. HBASE-6228v4.patch
        9 kB
        ramkrishna.s.vasudevan
      2. HBASE-6228v3.patch
        7 kB
        chunhui shen
      3. HBASE-6228v2.patch
        0.9 kB
        chunhui shen
      4. HBASE-6228v2.patch
        1.0 kB
        chunhui shen
      5. HBASE-6228.patch
        0.8 kB
        chunhui shen


          Activity

          stack added a comment -

          Marking closed.

          Jimmy Xiang added a comment (edited) -

          Yes, I think the issue is not there any more.

          stack added a comment -

          Jimmy Xiang and chunhui shen: what about this issue? Is it fixed by HBASE-6381?

          Jimmy Xiang added a comment -

          @Chunhui, I posted a patch for HBASE-6381 at RB: https://reviews.apache.org/r/6535/
          I could not block SSH till the master fixed missing daughters. However, I tried to
          make sure the master doesn't compete with SSH. Please take a look. Thanks.

          Jimmy Xiang added a comment -

          I'd like to fix this in HBASE-6381 by making sure SSH blocks on AM.processServerShutdown until the master has joined the cluster and fixed missing daughters.
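
          A minimal sketch of that ordering, assuming a hypothetical latch that the master counts down once it has joined the cluster and fixed missing daughters (the actual HBASE-6381 patch may coordinate differently):

              import java.util.concurrent.CountDownLatch;

              class AssignmentManagerSketch {
                // Counted down at the end of master initialization, after fixupDaughters.
                private final CountDownLatch masterJoined = new CountDownLatch(1);

                void markMasterJoined() {
                  masterJoined.countDown();
                }

                void processServerShutdown(String deadServerName) throws InterruptedException {
                  // SSH blocks here until the master has joined the cluster and fixed
                  // any missing daughters, so the two never race on fixup.
                  masterJoined.await();
                  // ... proceed with normal dead-server region processing ...
                }
              }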

          Jimmy Xiang added a comment -

          @Chunhui, this is a very rare case. Is it better to synchronize SSH.isDaughterMissing()? This method is not in the critical path, so there should be no performance issue.

          stack added a comment -

          For example, on study:

          + RegionState is unreliable for figuring out the state of a region in the master's memory; you cannot rely on it to answer the bigger question of who a region belongs to: master or regionserver.
          + In assign, we are careful with retries. We actually need to be more careful, especially around things like socket timeouts (see Maryann's recent issue). Bulk assign does none of these checks. Bulk assign was originally introduced to do assigns on cluster start; if anything failed, the contract was we'd just crash out and restart the cluster. That was how it was originally. Now bulk assign is used all over – e.g. in SSH – in spite of its being loosey-goosey around failures.

          stack added a comment -

          I've been looking at hbase-6060 as a background task (sorry, it's taking me a while, Ram and Rajesh, to get back to you lot). When I put together multiple threads (SSH, HMaster joining cluster, single vs bulk assign/unassign, timeout monitor, zk callbacks, etc.) and then try to trace state changes not only across multiple state keepers (RegionState, RegionInTransition, AM#this.regions and AM#this.servers) in the master process but then also cross-process, master -> regionserver -> via zk, I want to throw out what we have and start over (smile). That ain't going to happen though.

          Meantime I think we need to identify patterns or practices and broadcast them so all can sign on. For example, I appreciate stuff like Jimmy's small win simplifying AM breaking out RegionStates into a standalone class apart from AM. This at least collects a bunch of in-memory state in the one place.

          We also need to have more tests I'd say so we can have some confidence stuff still works when we shift things around.

          ramkrishna.s.vasudevan added a comment -

          @Jon
          Thanks for sharing the list.

          (possibly a ZK "regionlock")

          Yes, Jon, I agree. We still keep getting issues in AM, SSH, master restart, etc.

          Jonathan Hsieh added a comment (edited) -

          @Ram, @Chunhui

          I think stack may have started looking at some of these things, but let me give a list of issues I've been looking at which I think are related (there are probably more):

          HBASE-6012, HBASE-6060, HBASE-5914, HBASE-5882, HBASE-6147, HBASE-5916, HBASE-5546, HBASE-5816, HBASE-6160, HBASE-5918, HBASE-6016, HBASE-5927.

          I've been gathering the list with the intent of seeing if there is a common pattern. At the highest level, I think that we need something to protect meta from changes coming from multiple uncoordinated sources like RS's and the HM. My gut is that the long-term solution is something like a region lock (possibly a ZK "regionlock") that is used to isolate and protect hbase metadata modifications. Something like that would reduce the space of possible problems and hopefully make testing easier along the way.
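
          A rough sketch of what such a ZK "regionlock" might look like, using an ephemeral znode as a per-region mutex (purely illustrative: the class and the /hbase/regionlock path are made up, and the parent znode is assumed to exist):

              import org.apache.zookeeper.CreateMode;
              import org.apache.zookeeper.KeeperException;
              import org.apache.zookeeper.ZooDefs;
              import org.apache.zookeeper.ZooKeeper;

              class ZkRegionLock {
                private final ZooKeeper zk;

                ZkRegionLock(ZooKeeper zk) { this.zk = zk; }

                /** @return true if we now hold the metadata lock for the region. */
                boolean tryLock(String encodedRegionName)
                    throws KeeperException, InterruptedException {
                  try {
                    // Ephemeral: released automatically if the holder's session dies.
                    zk.create("/hbase/regionlock/" + encodedRegionName, new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                    return true;
                  } catch (KeeperException.NodeExistsException e) {
                    return false; // someone else is editing this region's metadata
                  }
                }

                void unlock(String encodedRegionName)
                    throws KeeperException, InterruptedException {
                  zk.delete("/hbase/regionlock/" + encodedRegionName, -1); // -1: any version
                }
              }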

          ramkrishna.s.vasudevan added a comment -

          @Chunhui
          We are able to reproduce the double-assignment scenario between SSH and master restart.
          So I think we should have a synchronized block, or, as Jon says, some lock mechanism, but that may be a bigger task.

          chunhui shen added a comment -

          @ram
          Could you talk about the sync mechanism?
          Based on the current patch, I think we won't assign twice. If needed, we could add a synchronized block around the following:

          if (isDaughterMissing(catalogTracker, daughter)) {
            LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
            MetaEditor.addDaughter(catalogTracker, daughter, null);
          }

          What do you think?
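
          A sketch of how that synchronized block could look, assuming a shared monitor guarding the check-and-write (everything beyond the lines quoted above is hypothetical; note the lock only fully closes the race if isDaughterMissing no longer treats serverName==null as missing, e.g. once the daughter is added with the parent's serverName as in patch v2):

              // Make the "is it missing?" check and the META write atomic, so SSH and
              // master initialization cannot both observe the daughter as missing.
              private final Object fixupLock = new Object(); // hypothetical shared monitor

              void fixupDaughter(CatalogTracker catalogTracker, HRegionInfo daughter)
                  throws IOException {
                synchronized (fixupLock) {
                  if (isDaughterMissing(catalogTracker, daughter)) {
                    LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
                    MetaEditor.addDaughter(catalogTracker, daughter, null);
                  }
                }
                // Assignment of the daughter can then proceed outside the lock.
              }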

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12533642/HBASE-6228v4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2273//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2273//console

          This message is automatically generated.

          ramkrishna.s.vasudevan added a comment -

          @Jon
          I have added a testcase to show the problem.

          but my general impression is that I cannot easily tell if this patch (and several similar to it) are just pushing a race from one place to another.

          Could you let us know which issues are like that, if you have a list with you?

          @Chunhui
          What if the master initialization and SSH are both in fixupDaughters at the same time? Then we are bound to get a double assignment, right?
          The scenario I am describing: the master has just completed joinCluster(). At that moment an RS went down while splitting. SSH starts and reaches processDeadRegion.
          Later the master also sees there are daughters to fix up. Now we may end up with the same problem? I think we need some sync mechanism here. What do you feel, Chunhui?

          chunhui shen added a comment -

          @ram
          Did you run the case on trunk? Maybe the test case is not perfect, but I ran it and it always failed without the patch.

          update the region plan we cannot assure that double assignment can be done

          If we restart the master after fixing up the daughter, the master won't contain any plans, so I just simulate that and ensure the case can happen.

          Another thing I think I must say:
          Even if we send the open-region RPC twice for one region, it may not be assigned twice, because we have a lot of protection through versions. But we still have a data-correctness risk.
          Why?
          When a region is opening, it replays log edits and flushes; if two regionservers open the same region together, can we ensure data correctness?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12533624/HBASE-6228v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.replication.TestReplication

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2272//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2272//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2272//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2272//console

          This message is automatically generated.

          ramkrishna.s.vasudevan added a comment -

          @Chunhui
          Thanks for the test case. I ran the test case without the patch; it passes many times even without the fix.

          RegionTransition rt = RegionTransition.parseFrom(data);
          if (rt.getEventType() == EventHandler.EventType.RS_ZK_REGION_OPENED) {
            // Make a new region plan
            master.getAssignmentManager().getRegionPlan(daughterARegionState,
                daughterARegionState.getServerName(), true);
            return;
          }

          This part of the code may not work reliably, because the node state can change before that Thread.sleep is done.
          And so if we don't update the region plan, we cannot guarantee that the double assignment occurs. I was also trying to write a test case and got caught up in a similar state. Good on you, Chunhui.

          chunhui shen added a comment -

          Adding a test case for this issue in patch v3.

          Jonathan Hsieh added a comment -

          I'm -0. (Not going to block if others are ok with it, but am just uncomfortable since there are no tests).

          I've only recently started spending time looking at the recovery/bugs/races, but my general impression is that I cannot easily tell if this patch (and several similar to it) are just pushing a race from one place to another. If we add tests, then we can at least detect whether this change re-introduces a previously solved problem.

          I haven't thought out the locking idea yet, but it seems that if we have state races, locks could simplify the reasoning and may eliminate classes of subtle bugs.

          ramkrishna.s.vasudevan added a comment -

          @Jon
          So you're not OK with this fix, Jon?

          Jonathan Hsieh added a comment -

          Is there any way we can add tests for these subtle recovery fixes?

          Part of me says we should just take something like a lock on the region (in zk, possibly moving it into RIT) before we start fixing them up like this, to make this obvious and to eliminate these classes of races.

          ramkrishna.s.vasudevan added a comment -

          +1 Chunhui. Your explanation is right. Sorry for making noise here.
          Thanks.

          chunhui shen added a comment -

          @ram

          Now when trying to fix the daughter it will find the parent server name associated with it and it will not assign it, right.

          I think it won't happen.

          If the master goes down after fixing up daughters but before the assignment is done, then when the master restarts it will assign the daughter through rebuildUserRegions, because the parent server name that was added belongs to a dead server.

          ramkrishna.s.vasudevan added a comment -

          @Chunhui
          Going through the patch again.
          Now you are writing the parent region's server name into the daughter. Assume a region that was split but whose daughters have not yet been added to META.
          Now the RS goes down. SSH starts and finds the missing daughter. You try to add the parent server name to the daughter. Before the assignment is done, the master goes down.
          Now when trying to fix the daughter it will find the parent server name associated with it and it will not assign it, right.
          So I think what we should try to do here is avoid two assignments.
          What do you feel, Chunhui?
          In the previous patch, because the check for whether the server was null is commented out, the problem is almost the same, just in a different way. Correct me if I am wrong, Chunhui.

          Ted Yu added a comment -

          +1 on patch v2.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12532668/HBASE-6228v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2200//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2200//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2200//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2200//console

          This message is automatically generated.

          chunhui shen added a comment -

          Reattaching patch v2

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12532666/HBASE-6228v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2199//console

          This message is automatically generated.

          chunhui shen added a comment -

          Updating patch: add the daughter with the parent region's serverName when fixing up daughters.

          chunhui shen added a comment -

          So if only SSH is going on and there is no master restart scenario how this patch will work?

          In the current logic, we add the daughter to META with a serverName in the SplitTransaction, and we never add a daughter with serverName==null except when fixing up daughters.
          However, the master may die after adding the daughter but before assigning it, and then the master won't assign it on restart.

          So I think we should add the daughter with the parent's serverName.

          Thanks, ram, for finding the hole.
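
          In code terms, a hedged sketch of that idea (the exact code in the patch may differ):

              // Record the (now dead) parent server's name instead of null, so
              // FindDaughterVisitor no longer mistakes an already-fixed-up daughter
              // for a missing one. On master restart, rebuildUserRegions still
              // assigns the daughter because its serverName belongs to a dead server.
              if (isDaughterMissing(catalogTracker, daughter)) {
                LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
                MetaEditor.addDaughter(catalogTracker, daughter, parentServerName); // was: null
                assign(daughter);
              }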

          ramkrishna.s.vasudevan added a comment -

          As I see the code, in the normal case where SSH tries to fix up, if SSH sees there is no server name, i.e. the serverName is null, it will try to fix up the daughter and assign it.
          As I read the patch, we are now removing the code path where we get the ServerName as null. So if only SSH is going on and there is no master-restart scenario, how will this patch work? Sorry if my question is dumb.

          chunhui shen added a comment -

          @ram
          So does this happen only when a master restart and SSH go in parallel?
          Yes; in the current logic, we only fix up daughters in those two places.

          ramkrishna.s.vasudevan added a comment -

          @Chunhui
          So does this happen only when a master restart and SSH go in parallel?

          Ted Yu added a comment -

          Patch looks good.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12532372/HBASE-6228.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2185//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2185//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2185//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2185//console

          This message is automatically generated.

          ramkrishna.s.vasudevan added a comment -

          Submitting to HadoopQA.


            People

            • Assignee:
              chunhui shen
              Reporter:
              chunhui shen
            • Votes:
              0
              Watchers:
              10

              Dates

              • Created:
                Updated:
                Resolved:
