HBase
HBASE-4402

Retaining locality after restart broken

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.92.0
    • Fix Version/s: 0.92.0
    • Component/s: master
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In DefaultLoadBalancer, we implement the "retain assignment" function like so:

      if (sn != null && servers.contains(sn)) {
        assignments.get(sn).add(region.getKey());
      }

      but this will never work: after a cluster restart, every server registers with a new ServerName containing a new startcode, so servers.contains(sn) never matches.
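      The fix attached here matches on hostname instead of on the full ServerName. A minimal sketch of that idea follows; it assumes the surrounding retainAssignment() variables (regions, servers, assignments) and reuses the helper names visible in the patch fragments quoted below (serversByHostname, RANDOM), and it is not the committed code verbatim:

      // Sketch only (not the committed patch): index live servers by hostname,
      // because after a restart every server re-registers with a new startcode,
      // so an exact ServerName comparison never finds the old location.
      Map<String, List<ServerName>> serversByHostname =
          new HashMap<String, List<ServerName>>();
      for (ServerName sn : servers) {
        List<ServerName> hostServers = serversByHostname.get(sn.getHostname());
        if (hostServers == null) {
          hostServers = new ArrayList<ServerName>();
          serversByHostname.put(sn.getHostname(), hostServers);
        }
        hostServers.add(sn);
      }

      for (Map.Entry<HRegionInfo, ServerName> region : regions.entrySet()) {
        ServerName oldServerName = region.getValue();
        List<ServerName> localServers = (oldServerName == null)
            ? null : serversByHostname.get(oldServerName.getHostname());
        if (localServers == null || localServers.isEmpty()) {
          // Old host is gone, or the region had no previous assignment:
          // fall back to a random live server.
          ServerName randomServer = servers.get(RANDOM.nextInt(servers.size()));
          assignments.get(randomServer).add(region.getKey());
        } else {
          // Old host is still in the cluster under a new ServerName: keep locality.
          ServerName target = localServers.get(RANDOM.nextInt(localServers.size()));
          assignments.get(target).add(region.getKey());
        }
      }

      The null check on oldServerName matters for regions that have no previous assignment, the case discussed in the comments below.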

      Attachments

      1. 4402-v3.txt
        8 kB
        stack
      2. hbase-4402.txt
        8 kB
        Todd Lipcon
      3. hbase-4402.txt
        7 kB
        Todd Lipcon

        Activity

        Hudson added a comment -

        Integrated in HBase-0.92 #48 (See https://builds.apache.org/job/HBase-0.92/48/)
        HBASE-4402 Retaining locality after restart broken

        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/DefaultLoadBalancer.java
        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestDefaultLoadBalancer.java
        stack added a comment -

        Applied to 0.92 branch and trunk. Thanks Todd.

        stack added a comment -

        Here is what I committed (Todd's work plus the little amendment).

        stack added a comment -

        Let me apply the updated patch. The below failures seem unrelated:

        
        Failed tests:   testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin)
          testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
          testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
        

        I saw them in a clean 0.92 run. I'm working on fixing these elsewhere.

        Lars Hofhansl added a comment -

        We just ran into this yesterday in our 0.92-based (small) test cluster.
        Nice patch!

        Todd Lipcon added a comment -

        Your fix looks good. I hadn't considered that there might be nulls, either from regions that were added while the server was offline, or from a table that was disabled at shutdown but whose ZK state was wiped before the cluster was brought back up.

        stack added a comment -

        Looks like the testMergeTable failure is caused by this patch. Adding the change below gets us past it (NPEs were shutting down the master, and the test would then hang looking for the absconded master). The test is odd anyway, hand-creating regions in the fs and then bringing them all online; it needs to be refactored, but in the meantime:

        68,69c140,143
        < +      List<ServerName> localServers = serversByHostname.get(
        < +          oldServerName.getHostname());
        ---
        > +      List<ServerName> localServers = new ArrayList<ServerName>();
        > +      if (oldServerName != null) {
        > +        localServers = serversByHostname.get(oldServerName.getHostname());
        > +      }
        76c150
        < +        oldHostsNoLongerPresent.add(oldServerName.getHostname());
        ---
        > +        if (oldServerName != null) oldHostsNoLongerPresent.add(oldServerName.getHostname());
        
        stack added a comment -

        I ran it again and got this:

        
        Results :
        
        Failed tests:   testSingleMethod(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): Results should contain region test,ccc,1317795815964.d181ee3cf99a2f9e33d6b4561f52a72f. for row 'ccc'
          testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin)
          testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
        
        Tests in error:
          testMergeTable(org.apache.hadoop.hbase.util.TestMergeTable): test timed out after 300000 milliseconds
        
        Tests run: 1017, Failures: 3, Errors: 1, Skipped: 21
        

        Similar machine did clean build of 0.92. Will keep at it.

        stack added a comment -

        Ran the test suite without the patch on 0.92 and all tests passed; with the patch, these failed:

        Failed tests:   testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin)
          testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
          testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
        
        Tests in error:
          testMergeTable(org.apache.hadoop.hbase.util.TestMergeTable): test timed out after 300000 milliseconds
        

        I'm taking a look.

        Ted Yu added a comment -

        Sure.

        stack added a comment -

        OK if I commit this? Ted?

        Todd Lipcon added a comment -

        Since we now have locality metrics on regions, it seems like we can already see that information elsewhere. The log output here is helpful since you can quickly see which servers didn't properly rejoin the cluster.

        Ted Yu added a comment -

        Thanks Todd for the quick response, appreciate it.

        If I were Ops, I would be more interested in knowing the regions that were randomly assigned.

        Todd Lipcon added a comment -

        New patch addresses Ted's comments. The log messages are as follows:

        For the case where all are retained:

        2011-09-15 11:22:57,592 INFO  [main] master.DefaultLoadBalancer(643): Reassigned 100 regions. 100 retained the pre-restart assignment.
        

        For the case where some were reassigned:

         
        2011-09-15 11:22:57,604 INFO  [main] master.DefaultLoadBalancer(643): Reassigned 100 regions. 80 retained the pre-restart assignment. 20 regions were assigned to random hosts, since the old hosts for these regions are no longer present in the cluster. These hosts were:
          server29568
          server80916
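
        Roughly, the bookkeeping behind these messages looks like the sketch below. The counter and collection names (numRandomAssignments, oldHostsNoLongerPresent) are taken from the patch fragments quoted in this issue; LOG and the remaining names are assumed, and the committed code may format the host list differently:

        // Sketch: tallied while assigning each region in retainAssignment().
        int numRetainedAssignments = 0;   // old host still present in the cluster
        int numRandomAssignments = 0;     // old host gone; a random server was used
        Set<String> oldHostsNoLongerPresent = new TreeSet<String>();
        // (the per-region loop increments one of the two counters and, in the
        //  random case, records the missing hostname in oldHostsNoLongerPresent)
        String randomAssignMsg = "";
        if (numRandomAssignments > 0) {
          randomAssignMsg = numRandomAssignments + " regions were assigned to random hosts,"
              + " since the old hosts for these regions are no longer present in the"
              + " cluster. These hosts were: " + oldHostsNoLongerPresent;
        }
        LOG.info("Reassigned " + regions.size() + " regions. " + numRetainedAssignments
            + " retained the pre-restart assignment. " + randomAssignMsg);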
        
        stack added a comment -

        +1 on patch. I like the new log message on how much of the assignment retained the old locations.

        Ted Yu added a comment -
        +        ServerName randomServer = servers.get(RANDOM.nextInt(servers.size()));
        +        assignments.get(randomServer).add(region);
        +        numRandomAssignments++;
        

        I think we should log the (region, server) pairs for the above case.

        Todd Lipcon added a comment -

        Turns out the test basically duplicated the same faulty logic. Fixed both the test and the code. Haven't tried on a cluster yet.
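        For reference, the kind of check the corrected test has to make can be sketched as follows. This is a hypothetical helper, not the committed TestDefaultLoadBalancer code; it assumes the 0.92-era ServerName(hostname, port, startcode) constructor, the retainAssignment(Map, List) signature, and JUnit's assertEquals:

        // Hypothetical sketch: simulate a restart by reusing each old hostname and
        // port with a fresh startcode, then verify every region stays on its host.
        void assertRetainedAcrossRestart(DefaultLoadBalancer balancer,
            Map<HRegionInfo, ServerName> existing) {
          List<ServerName> restarted = new ArrayList<ServerName>();
          for (ServerName old : new HashSet<ServerName>(existing.values())) {
            // Same host and port, new startcode: equals()/contains() no longer match.
            restarted.add(new ServerName(old.getHostname(), old.getPort(),
                System.currentTimeMillis()));
          }
          Map<ServerName, List<HRegionInfo>> plan =
              balancer.retainAssignment(existing, restarted);
          for (Map.Entry<ServerName, List<HRegionInfo>> entry : plan.entrySet()) {
            for (HRegionInfo region : entry.getValue()) {
              assertEquals(existing.get(region).getHostname(),
                  entry.getKey().getHostname());
            }
          }
        }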


          People

          • Assignee:
            Todd Lipcon
          • Reporter:
            Todd Lipcon
          • Votes:
            0
          • Watchers:
            1
