HBASE-5875

Process RIT and Master restart may remove an online server considering it as a dead server

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.92.1
    • Fix Version/s: 0.94.1, 0.95.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      If, on restart, the master finds ROOT/META in the RIT state, it tries to assign the ROOT region through processRIT.

      The master triggers the assignment and then tries to verify the root region location.
      Root region location verification checks whether the RS has the region in its online-regions list.
      If the master-triggered assignment has not yet completed on the RS, the root region location verification will fail.
      Because it failed,

      splitLogAndExpireIfOnline(currentRootServer);
      

      we split the log and also remove the server from the online-server list. Ideally there is nothing for splitlog to do here, since no region server was restarted.

      So even though the server is online, the master invalidates the region server.
      In the special case where I have only one RS, my cluster becomes inoperative.
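      The race described above can be sketched as a small, self-contained model (all class and method names below are hypothetical stand-ins for the HBase internals, not the real API): the master verifies ROOT against the RS's online-region set, and on a failed check drops the server from its online list even though the server is alive.

      ```java
      import java.util.HashSet;
      import java.util.Set;

      // Hypothetical model of the race: the RS has transitioned the ROOT znode
      // but has not yet added ROOT to its in-memory online-region set.
      public class RootExpiryRace {
          // Simulated region server state.
          static class RegionServer {
              final String name;
              final Set<String> onlineRegions = new HashSet<>();
              RegionServer(String name) { this.name = name; }
          }

          // Simulated master state: the set of servers it considers online.
          static class Master {
              final Set<String> onlineServers = new HashSet<>();

              // Mirrors verifyRootRegionLocation: ask the RS whether it is
              // serving ROOT by checking its online-region list.
              boolean verifyRootRegionLocation(RegionServer rs) {
                  return rs.onlineRegions.contains("-ROOT-");
              }

              // Mirrors splitLogAndExpireIfOnline: on failed verification the
              // master drops the server from its online list, even if alive.
              void assignRootAndMeta(RegionServer rs) {
                  if (!verifyRootRegionLocation(rs)) {
                      onlineServers.remove(rs.name);
                  }
              }
          }

          // Returns true if the master still considers the RS online afterwards.
          public static boolean run(boolean rsFinishedOpening) {
              RegionServer rs = new RegionServer("rs1");
              Master master = new Master();
              master.onlineServers.add(rs.name);
              if (rsFinishedOpening) {
                  rs.onlineRegions.add("-ROOT-");
              }
              master.assignRootAndMeta(rs);
              return master.onlineServers.contains(rs.name);
          }
      }
      ```

      With rsFinishedOpening=false the model reproduces the bug: verification fails and the live server is removed from the master's online list.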

      1. HBASE-5875_0.94_1.patch
        17 kB
        ramkrishna.s.vasudevan
      2. HBASE-5875_0.94_2.patch
        4 kB
        ramkrishna.s.vasudevan
      3. HBASE-5875_0.94.patch
        17 kB
        ramkrishna.s.vasudevan
      4. HBASE-5875_trunk_1.patch
        3 kB
        ramkrishna.s.vasudevan
      5. HBASE-5875_trunk.patch
        3 kB
        rajeshbabu
      6. HBASE-5875_trunk.patch
        3 kB
        rajeshbabu
      7. HBASE-5875.patch
        2 kB
        ramkrishna.s.vasudevan
      8. HBASE-5875v2.patch
        3 kB
        chunhui shen

        Activity

        Lars Hofhansl added a comment -

        Can we move this to 0.94.1?

        ramkrishna.s.vasudevan added a comment - - edited

        Updated to 0.94.1.

        (Edit: I will come up with a patch in another couple of days.)
        ramkrishna.s.vasudevan added a comment -

        I would like to get some suggestions on this:

            boolean rit = this.assignmentManager.
              processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
            ServerName currentRootServer = null;
            if (!catalogTracker.verifyRootRegionLocation(timeout)) {
              currentRootServer = this.catalogTracker.getRootLocation();
        

        Consider the case where my ROOT node is found in RIT. Hence processRIT will trigger the assignment.

        It so happened that when I tried verifyRootRegionLocation, the root znode had been created but the OpenRegionHandler had not yet added the ROOT region to its in-memory online list (a very narrow corner case; it happened once while testing). So verifyRootRegionLocation returned false, and hence the master considered the server one to be expired. A normal, active RS was removed from the master's memory on the assumption that it was dead, so I lose an RS from the master's list of online servers. How can we handle this scenario?

        Can we retry verifyRootRegionLocation if it returns false and the boolean variable 'rit' is true?
        Or can we update the root region node on the RS side after updating the online server list? Suggestions welcome...
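        The retry idea above could look roughly like the following sketch (names and the attempt count are illustrative assumptions, not taken from any patch):

        ```java
        // Hypothetical sketch of the suggestion: when the region just came out
        // of RIT ('rit' is true), retry verification a bounded number of times
        // instead of expiring the server on the first failed check.
        public class VerifyRetry {
            // Stand-in for catalogTracker.verifyRootRegionLocation(timeout).
            public interface Check { boolean verify(); }

            // With rit false the original single-shot behaviour is kept; with
            // rit true the check is polled up to maxAttempts times (a real
            // version would also sleep between attempts).
            public static boolean verifyRootWithRetry(Check check, boolean rit, int maxAttempts) {
                int attempts = rit ? maxAttempts : 1;
                for (int i = 0; i < attempts; i++) {
                    if (check.verify()) {
                        return true;
                    }
                }
                return false;
            }
        }
        ```

        This is the approach Ted pushes back on below: it introduces a retry count and sleep interval that have to be tuned.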

        Ted Yu added a comment -

        Or can we update the root region node in the RS side after updating the online server list?

        Let's try this approach first.

        The other approach would involve retry count, sleep interval, etc.

        Jieshan Bean added a comment -

        Looking into the method CatalogTracker#verifyRootRegionLocation:

        public boolean verifyRootRegionLocation(final long timeout)
          throws InterruptedException, IOException {
            AdminProtocol connection = null;
            try {
              connection = waitForRootServerConnection(timeout);
            } catch (NotAllMetaRegionsOnlineException e) {
              // Pass
            } catch (ServerNotRunningYetException e) {
              // Pass -- remote server is not up so can't be carrying root
            } catch (UnknownHostException e) {
              // Pass -- server name doesn't resolve so it can't be assigned anything.
            }
            return (connection == null)? false:
              verifyRegionLocation(connection,
                this.rootRegionTracker.getRootRegionLocation(), ROOT_REGION_NAME);
          }
        

        I'm thinking about an approach that handles this issue according to the exception type:
        e.g. if we get a ServerNotRunningYetException, we can proceed with splitLogAndExpireIfOnline,
        but if we get a NotServingRegionException, we should not do that.
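        A sketch of this exception-based dispatch (the exception classes here are simple stand-ins for the HBase ones, and the decision table is only what the comment above proposes):

        ```java
        // Hypothetical sketch: only expire the server for verification
        // failures that imply the RS itself is gone, not for failures that
        // mean the region is simply not online on it yet.
        public class ExceptionDispatch {
            static class ServerNotRunningYetException extends Exception {}
            static class NotServingRegionException extends Exception {}

            // Returns true when splitLogAndExpireIfOnline should run.
            public static boolean shouldExpire(Exception cause) {
                if (cause instanceof ServerNotRunningYetException) {
                    return true;  // remote server is not up: treat as dead
                }
                if (cause instanceof NotServingRegionException) {
                    return false; // RS is alive, region just not online yet
                }
                return true;      // default: keep the existing behaviour
            }
        }
        ```

        This keeps splitLogAndExpireIfOnline for genuinely dead servers while tolerating the window where the RS is up but the region is not yet in its online list.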

        ramkrishna.s.vasudevan added a comment -

        @Jieshan
        As Ted also suggested, if we go by the exception type we would need to add unnecessary retry logic and sleep intervals, and also modify the verifyRootRegionLocation API, which is used in many places.

        ramkrishna.s.vasudevan added a comment -

        Patch for trunk. Test cases passed.

        ramkrishna.s.vasudevan added a comment -

        @Chunhui
        Can you take a look at this? It is related to HBASE-4880. Please provide your thoughts.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12525060/HBASE-5875.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1689//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1689//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1689//console

        This message is automatically generated.

        ramkrishna.s.vasudevan added a comment -

        Testcase failure seems unrelated to this fix.

        chunhui shen added a comment -

        If the master triggered assignment has not yet been completed in RS then the verify root region location will fail.

        Why does it happen?
        In assignRootAndMeta:

        boolean rit = this.assignmentManager.
              processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
        

        We block until the master has completed the assignment.

        ramkrishna.s.vasudevan added a comment -

        @Chunhui
        Correct, but at that point only the RIT map has been cleared. The actual assignment completes after the region is added to the online map on the RS side, and that is where this problem happened.
        As you know, in verifyRootRegionLocation() we mainly check whether the region is in the region server's online list.
        So there is still a small time gap for this to happen, and it happened in our test cluster.
        Thanks for your time.

        chunhui shen added a comment -

        @ram
        I'm clear now about the time gap.

        What about do the following check

        if (assignmentManager.getRegionServerOfRegion(HRegionInfo.ROOT_REGIONINFO) == null) {
              ServerName currentRootServer = null;
              if (!catalogTracker.verifyRootRegionLocation(timeout)) {
                currentRootServer = this.catalogTracker.getRootLocation();
                splitLogAndExpireIfOnline(currentRootServer);
                this.assignmentManager.assignRoot();
                // Make sure a -ROOT- location is set.
                if (!isRootLocation())
                  return false;
                // This guarantees that the transition assigning -ROOT- has completed
                this.assignmentManager.waitForAssignment(HRegionInfo.ROOT_REGIONINFO);
                assigned++;
              } else {
                // Region already assigned. We didn't assign it. Add to in-memory state.
                this.assignmentManager.regionOnline(HRegionInfo.ROOT_REGIONINFO,
                    this.catalogTracker.getRootLocation());
              }
            } else {
              // Root region has been assigned through processRegionInTransition
            }
        
        ramkrishna.s.vasudevan added a comment -

        @Chunhui
        So verifyRootRegionLocation() need not be done at all? What if the RS went down just after processing the znode to OPENED? So only SSH will come and try to assign root?

        I am not sure that accepting ROOT as assigned without verifyRootRegionLocation() is OK, but your approach is simple.

        My idea behind the patch was that without ROOT and META the cluster is inoperative; hence I went with that approach. Appreciate your time.

        chunhui shen added a comment -

        What if the RS went down just after processing the znode to OPENED? So only SSH will come and try to assign root?

        Yes, SSH will assign root. It also reminds me of the bug HBASE-5918; would you take a look?

        With the current patch, I think there is a possibility of the data loss mentioned in HBASE-4880.

        My approach is just a thought: since the ROOT region is onlined in the AssignmentManager during initialization, it must have been assigned.
        However, there is also a hole where the hregioninfo is removed from RIT but the region has not yet been added to AssignmentManager.regions in AssignmentManager#regionOnline().

        ramkrishna.s.vasudevan added a comment -

        @Chunhui
        Yes, I took a look at HBASE-5819.
        HBASE-5816 is also due to the serverShutdownHandlerEnabled variable. I think the usage of 'serverShutdownHandlerEnabled' should be made clearer.

        I think the problem of HBASE-4880 applies to user regions, but for ROOT and META the updates are done by the region servers themselves, so the HBASE-4880 problem should not arise there. Even if transitioning to OPENED fails after the META entry is written to ROOT, and even if closing the region also fails, the system is not going to function until META is available anyway. Correct me if I am wrong, Chunhui.

        chunhui shen added a comment -

        @ram
        Thanks for the explanation; I think the patch is OK for this issue.

        Ted Yu added a comment -

        +1 on patch.

        Before integrating into the 0.92 branch, please run the test suite.

        stack added a comment -

        The patch looks dodgy – saying a region is online if it is root or meta seems incorrect.

        Consider the case where my ROOT node is found in RIT. Hence the processRIT will trigger the assignment.

        What is the above referring to? Which part of the code?

        It so happened that when i try to verifyRootRegionLocation the root node is created but the OpenRegionHandler has not added the ROOT region in its memory(very very corner case and this happened once while testing). So the verifyRootRegionLocation returns false and hence the master thinks it an server to be expired.

        Can the master not detect this corner case just by looking at what's in zk?

        ramkrishna.s.vasudevan added a comment - - edited

        What is the above referring to? Which part of the code?

        In assignRootAndMeta()

        boolean rit = this.assignmentManager.
              processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
        
        

        Can the master not detect this corner case just by looking at whats in zk?

        By zk, do you mean the RS node or the ROOT region node?

        ramkrishna.s.vasudevan added a comment -

        I have reproduced the scenario described in the title of the JIRA with a testcase.
        I have tried to follow the approach that Bijieshan suggested in
        https://issues.apache.org/jira/browse/HBASE-5875?focusedCommentId=13264874&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13264874
        to solve the problem. Tomorrow I can upload the testcase.

        ramkrishna.s.vasudevan added a comment -

        The attached patch is for 0.94.
        Trunk has some protobuf changes, so the test case needs to be updated.
        Again, this is just another way of trying to address the problem. Please provide your feedback.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12525640/HBASE-5875_0.94.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1765//console

        This message is automatically generated.

        Ted Yu added a comment -

        The following change is for debugging, right ? If so, please change log level accordingly:

        +    }catch(NotServingRegionException nsre){
        +      LOG.info("Failed verification of " + Bytes.toStringBinary(regionName) +
        +          " at address=" + address + "; " + t);
        +      throw nsre;
        
        +    } catch (NotServingRegionException nsre) {
        +      if(rit == true){
        +        // the root region location is available.
        

        People unfamiliar with processRegionInTransitionAndBlockUntilAssigned() may get confused by the code above. rit actually means root region has come out of transition. So rit should be named accordingly.

        +  public void setServerShutdownHandlerEnabled(boolean setServerShutDownEnabled) {
        

        The above method should be made package-private. Append 'ForTest' to the end of method name would help clarify its purpose.

        ramkrishna.s.vasudevan added a comment -

        I have addressed the third comment in the recent patch.

        LOG.info("Failed verification of " + Bytes.toStringBinary(regionName) +
        +          " at address=" + address + "; " + t);
        

        This log is the same as the one below it; it is an existing one.

        if(rit == true){
        

        Here rit means region in transition, and it applies to META as well if META is in RIT. So I think renaming it would make it less generic. Again, this is a patch for 0.94.

        chunhui shen added a comment -

        I think we could change a litter to fix the issue.

        What about checking whether region in regionsInTransitionInRS when call getRegionInfo for verifyReionLocation? If so, it must not in other regionserver, we could wait.

        Another solution:
        We could skip verifyReionLocation if we found it in assignment map if processRegionInTransitionAndBlockUntilAssigned return true, could we ?(To be sure, we should change a little in AssignmentManager#regionOnline)
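        The second solution above could be sketched like this (hypothetical names; the real code would consult AssignmentManager's in-memory regions map, as the later v2 patch does):

        ```java
        import java.util.HashMap;
        import java.util.Map;

        // Hypothetical sketch of the second solution: if processRIT already
        // recorded a server for ROOT in the assignment map, skip
        // verifyRootRegionLocation (and thus the erroneous expiry) entirely.
        public class SkipVerifyIfAssigned {
            // Simulated AssignmentManager#regions map: region name -> server name.
            static final Map<String, String> regions = new HashMap<>();

            // Only fall back to RPC verification when the in-memory
            // assignment map has no owner for the region.
            public static boolean shouldVerify(String region) {
                return regions.get(region) == null;
            }
        }
        ```

        The trade-off debated below is that skipping verification trusts the assignment map, which has its own window where a region is removed from RIT before being added to the map.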

        ramkrishna.s.vasudevan added a comment -

        I think we could change a litter to fix the issue.

        Did you mean little?
        Can you come up with a patch based on your second solution?

        What about checking whether region in regionsInTransitionInRS when call getRegionInfo for verifyReionLocation? If so, it must not in other regionserver, we could wait.

        Here I am not sure, again, how long to wait and how many times to retry.

        chunhui shen added a comment -

        @ram
        The v2 patch is based on the second solution; could you take a look?

        ramkrishna.s.vasudevan added a comment -

        @Chunhui
        Thanks for the patch; I saw it. Is any race possible between regionOnline() and processServerShutdown()? Any corner case? I thought about the scenario where two OpenedRegionHandler calls come for the same region; I think it should be OK.
        Are all the test cases passing? Good job.
        Let's see what Stack has to say about this.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12525969/HBASE-5875v2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1795//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1795//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1795//console

        This message is automatically generated.

        Ted Yu added a comment -

        Chunhui's patch looks good.
        Minor comments:

        +      LOG.info("-ROOT- is already onlined after process RIT");
        +    }else{
             if (!catalogTracker.verifyRootRegionLocation(timeout)) {
        

        'process RIT' -> 'processing RIT'
        Please insert spaces around else.
        Indentation for the following statements should be increased.

        Similar comments apply to the handling of FIRST_META_REGIONINFO

        stack added a comment -

        On the patch, what Ted says.

        Plus, I am not sure why we avoid verifying root and meta locations? If they are online, why not do the verify?

        In AM, why move the sync block?

        I like that this patch is much smaller. Much easier to reason about (smile). Thanks lads.

        Oh, where is the test? Is it possible to make it into a unit test and include it along w/ this patch?

        Good stuff

        rajeshbabu added a comment -

        Attached patch for trunk. Please review and provide comments/suggestions.

        ramkrishna.s.vasudevan added a comment -

        @Devs

        +      // Make sure a -ROOT- location is set.
        +      if (!isRootLocation()) return false;
        +      // This guarantees that the transition assigning -ROOT- has completed
        +      this.assignmentManager.waitForAssignment(HRegionInfo.ROOT_REGIONINFO);
        +      assigned++;
        

        and

        +      // Wait until META region added to region server onlineRegions. See HBASE-5875.
        +      enableSSHandWaitForMeta();
        +      assigned++;
        

        This will ensure that we wait for ROOT and META. Now that HBASE-5918 has gone in, if any RS goes down in between the ROOT and META assignments, SSH will also be triggered.
        The main intention of this patch is to avoid

        splitLogAndExpireIfOnline(currentRootServer);
        ....
        splitLogAndExpireIfOnline(currentMetaServer);
        

        because, when ROOT or META was in RIT, the above code removed the current active server, treating it as dead, whenever ROOT or META was not yet online on the RS.
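        The race described above can be sketched as a toy model (all class and method names below, such as RegionServerStub and waitForAssignment, are illustrative stand-ins, not the real HBase APIs): a verification done immediately after triggering the open fails because the RS has not finished the open yet, which pre-fix wrongly expired a live server; waiting for the in-flight assignment first, as the patch does, lets the verification succeed.

        ```java
        import java.util.Set;
        import java.util.concurrent.ConcurrentHashMap;
        import java.util.concurrent.CountDownLatch;
        import java.util.concurrent.TimeUnit;

        // Toy model of the race fixed here: the master triggers an open, the RS
        // completes it asynchronously, and the master must wait for the assignment
        // instead of expiring a live server on the first failed verification.
        public class AssignWaitDemo {

            static class RegionServerStub {
                final Set<String> onlineRegions = ConcurrentHashMap.newKeySet();
                final CountDownLatch opened = new CountDownLatch(1);

                // The RS adds the region to its online list only after a delay,
                // mimicking a master-triggered open that has not yet completed.
                void openRegionAsync(String region, long delayMs) {
                    new Thread(() -> {
                        try {
                            Thread.sleep(delayMs);
                        } catch (InterruptedException ignored) {
                        }
                        onlineRegions.add(region);
                        opened.countDown();
                    }).start();
                }
            }

            // Pre-fix behavior: a single check; false here used to trigger
            // splitLogAndExpireIfOnline() against a perfectly healthy server.
            static boolean verifyOnce(RegionServerStub rs, String region) {
                return rs.onlineRegions.contains(region);
            }

            // Post-fix behavior: block until the in-flight assignment finishes,
            // analogous to assignmentManager.waitForAssignment(ROOT_REGIONINFO).
            static boolean waitForAssignment(RegionServerStub rs, long timeoutMs)
                    throws InterruptedException {
                return rs.opened.await(timeoutMs, TimeUnit.MILLISECONDS);
            }

            public static void main(String[] args) throws Exception {
                RegionServerStub rs = new RegionServerStub();
                rs.openRegionAsync("-ROOT-", 300);
                System.out.println("immediate verify: " + verifyOnce(rs, "-ROOT-"));
                System.out.println("after wait: "
                        + (waitForAssignment(rs, 5000) && verifyOnce(rs, "-ROOT-")));
            }
        }
        ```

        The point of the sketch is only the ordering: a one-shot check is not evidence that the server is dead while an assignment it was just handed is still in flight.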

        Ted Yu added a comment -

        @Rajesh:
        Hadoop QA is not functioning.
        Please report back the test suite result.

        rajeshbabu added a comment -

        @Ted,
        I will run the test suite locally and publish the results.

        Ted Yu added a comment -

        Patch looks good.

        ramkrishna.s.vasudevan added a comment -

        I will integrate this tomorrow if there are no objections/comments.

        rajeshbabu added a comment -

        Test suite result:

        Results :
        
        Failed tests:   testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort): The put should have failed, as the coprocessor is buggy
          testDrainingServerOffloading(org.apache.hadoop.hbase.TestDrainingServer): expected:<1> but was:<0>
          testTaskResigned(org.apache.hadoop.hbase.master.TestSplitLogManager): version1=2, version=2
          testNullReturn(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): Results should contain region test,bbb,1340328821040.9fe2c292d7f212976859364f8aef27a3. for row 'bbb'
          testRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation): expected:<0> but was:<3>
          testPermMask(org.apache.hadoop.hbase.util.TestFSUtils): expected:<rwx------> but was:<rwxrwxrwx>
        
        Tests in error: 
          testWholesomeSplit(org.apache.hadoop.hbase.regionserver.TestSplitTransaction): Failed delete of /mnt/F/hbaseTrunkNew/hbase-server/target/test-data/a9504511-b767-40bb-8c4b-4550baa22da2/org.apache.hadoop.hbase.regionserver.TestSplitTransaction/table/7fcde0d5873845498b313524c3416091
          testRollback(org.apache.hadoop.hbase.regionserver.TestSplitTransaction): Failed delete of /mnt/F/hbaseTrunkNew/hbase-server/target/test-data/74d5334b-a9d3-4213-b568-8315e066df68/org.apache.hadoop.hbase.regionserver.TestSplitTransaction/table/9d8fa21602ce5ba40d1fa704094c8e25
          testOffPeakCompactionRatio(org.apache.hadoop.hbase.regionserver.TestCompactSelection): Target HLog directory already exists: /mnt/F/hbaseTrunkNew/hbase-server/target/test-data/89a77fb2-2048-414c-8f94-6b9a43a51937/TestCompactSelection/logs
          testMultiRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation): java.io.FileNotFoundException: /mnt/F/hbaseTrunkNew/hbase-server/target/classes/hbase-default.xml (Too many open files)
          testCacheOnWriteInSchema[1](org.apache.hadoop.hbase.regionserver.TestCacheOnWriteInSchema): Target HLog directory already exists: /mnt/F/hbaseTrunkNew/hbase-server/target/test-data/1480ac68-4774-454e-9127-e9bfd20864f6/TestCacheOnWriteInSchema/logs
          testCacheOnWriteInSchema[2](org.apache.hadoop.hbase.regionserver.TestCacheOnWriteInSchema): Target HLog directory already exists: /mnt/F/hbaseTrunkNew/hbase-server/target/test-data/1480ac68-4774-454e-9127-e9bfd20864f6/TestCacheOnWriteInSchema/logs
          loadTest[0](org.apache.hadoop.hbase.util.TestMiniClusterLoadParallel): test timed out after 120000 milliseconds
          loadTest[1](org.apache.hadoop.hbase.util.TestMiniClusterLoadParallel): test timed out after 120000 milliseconds
        
        Tests run: 1577, Failures: 6, Errors: 8, Skipped: 9
        

        I ran the failed test cases individually and they pass.

        Running org.apache.hadoop.hbase.TestDrainingServer
        Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.338 sec
        
        Running org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
        Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.07 sec
        
        Running org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
        Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.353 sec
        
        Running org.apache.hadoop.hbase.master.TestSplitLogManager
        Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.243 sec
        

        Tests in the below test classes are failing, but they are not related to this issue. I will check them.

        TestMiniClusterLoadParallel, TestAtomicOperation, TestCacheOnWriteInSchema, TestCompactSelection, TestFSUtils

        Ted Yu added a comment -

        @Rajesh:
        Once Hadoop QA runs through a patch, the attachment itself is marked.
        You need to attach (the same) patch again.

        rajeshbabu added a comment -

        @Ted,
        Thanks for the information. I will upload the same patch again and submit it for Hadoop QA.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12533223/HBASE-5875_trunk.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 11 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2242//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2242//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2242//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2242//console

        This message is automatically generated.

        ramkrishna.s.vasudevan added a comment -

        TestMiniClusterLoadParallel, TestAtomicOperation, TestCacheOnWriteInSchema, TestCompactSelection, TestFSUtils

        All these test cases are running fine in the latest precommit build.
        Is that ok, Ted?

        Ted Yu added a comment -

        I think so.

        ramkrishna.s.vasudevan added a comment -

        Patch for 0.94, ready for commit.

        ramkrishna.s.vasudevan added a comment -

        Patch for trunk, ready for commit.

        ramkrishna.s.vasudevan added a comment -

        Committed to 0.94.1 and 0.96.
        Thanks to Rajesh for the patch.
        Thanks Ted, Chunhui and Stack for the review.

        Hudson added a comment -

        Integrated in HBase-0.94 #280 (See https://builds.apache.org/job/HBase-0.94/280/)
        HBASE-5875 Process RIT and Master restart may remove an online server considering it as a dead server

        Submitted by: Rajesh
        Reviewed by: Ram, Ted, Stack (Revision 1353690)

        Result = FAILURE
        ramkrishna :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #3070 (See https://builds.apache.org/job/HBase-TRUNK/3070/)
        HBASE-5875 Process RIT and Master restart may remove an online server considering it as a dead server (Rajesh)

        Submitted by: Rajesh
        Reviewed by: Ram, Ted, Stack (Revision 1353688)

        Result = FAILURE
        ramkrishna :
        Files :

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        Hudson added a comment -

        Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #68 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/68/)
        HBASE-5875 Process RIT and Master restart may remove an online server considering it as a dead server (Rajesh)

        Submitted by: Rajesh
        Reviewed by: Ram, Ted, Stack (Revision 1353688)

        Result = FAILURE
        ramkrishna :
        Files :

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        Hudson added a comment -

        Integrated in HBase-0.94-security #38 (See https://builds.apache.org/job/HBase-0.94-security/38/)
        HBASE-5875 Process RIT and Master restart may remove an online server considering it as a dead server

        Submitted by: Rajesh
        Reviewed by: Ram, Ted, Stack (Revision 1353690)

        Result = FAILURE
        ramkrishna :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java

          People

          • Assignee:
            ramkrishna.s.vasudevan
            Reporter:
            ramkrishna.s.vasudevan
          • Votes:
            0
            Watchers:
            12
