HBASE-3243

Disable Table closed region on wrong host

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.90.0
    • Fix Version/s: None
    • Component/s: master
    • Labels:
      None

      Description

      I ran some YCSB benchmarks which resulted in about 150 regions' worth of data overnight. Then I disabled the table, and the master for some reason sent the close for one region to the wrong server. That server ignored the request, but the region remained open on the server actually hosting it, which later flipped out when it tried to flush due to HLog accumulation.

      Attachments

      1. 3243-v2.patch
        3 kB
        stack
      2. hbase-3243-logs.tar.bz2
        479 kB
        Todd Lipcon
      3. HBASE-3243-v1.patch
        2 kB
        Jonathan Gray
      4. hri.diff
        0.5 kB
        stack

        Activity

        Todd Lipcon added a comment -

        Here are some relevant logs... the region in question is user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1

        Here's the birth of the region on haus01:
        2010-11-16 02:28:10,471 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1.
        2010-11-16 02:28:11,432 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Region split, META updated, and report to master. Parent=usertable,user1057689679,1289901563182.24d8aec47045a86b97b3508ed16ced29., new regions: usertable,user1057689679,1289903283192.b313a3864e1eb1aaf2a72dcf2c2814b3., usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1.. Split took 8sec

        Master gets split:
        2010-11-16 02:28:11,450 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: usertable,user1057689679,1289901563182.24d8aec47045a86b97b3508ed16ced29.: Daughters; usertable,user1057689679,1289903283192.b313a3864e1eb1aaf2a72dcf2c2814b3., usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1. from haus01.sf.cloudera.com,60020,1289890926766

        ... then plenty of successful flushes/compactions on haus01, and no mention of it on any other region server. The benchmark finished around 3:20am, and there is no mention of this region in any logs from that point until I disabled the table this evening:

        Master says:
        2010-11-16 18:16:17,223 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1. (offlining)

        Region server haus02 (NOTE: this is the wrong regionserver!!) says:
        2010-11-16 18:16:17,238 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1.
        2010-11-16 18:16:17,239 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Received close for region we are not serving; 8a2c525797cad9fd9882cc6304362bd1

        Master says:
        2010-11-16 18:16:17,240 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to send CLOSE to serverName=haus02.sf.cloudera.com,60020,1289890927217, load=(requests=1029, regions=28, usedHeap=4772, maxHeap=8185) for region usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1. but failed, setting region as OFFLINE and reassigning
        ...
        2010-11-16 18:16:17,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1. state=PENDING_CLOSE, ts=1289960177227
        2010-11-16 18:16:17,248 INFO org.apache.hadoop.hbase.master.AssignmentManager: Table usertable disabling; skipping assign of usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1.
        2010-11-16 18:16:17,248 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x12c537d84e10000 Deleting existing unassigned node for 8a2c525797cad9fd9882cc6304362bd1 that is in expected state RS_ZK_REGION_CLOSED
        2010-11-16 18:16:17,248 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x12c537d84e10000 Unable to get data of znode /hbase/unassigned/8a2c525797cad9fd9882cc6304362bd1 because node does not exist (not necessarily an error)

        So basically, the master randomly got the wrong regionserver for this region, and there's little indication why.

        Todd Lipcon added a comment -

        No idea if it's related, but it seems like AssignmentManager's regions member is sometimes accessed without synchronization... we could end up reading some incorrect data because of this.
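
        For illustration only, with hypothetical names rather than the real AssignmentManager fields: a minimal sketch of the hazard being described, where one thread mutates a shared TreeMap under its monitor while another thread reads it without taking the lock.

            import java.util.TreeMap;

            // Sketch only: the writer always mutates under the map's monitor, but the
            // unsynchronized reader may observe a half-rebalanced tree and return a
            // stale or wrong value -- the kind of incorrect data suspected above.
            public class UnsyncReadSketch {
              private final TreeMap<String, String> regions = new TreeMap<String, String>();

              void assign(String region, String server) {
                synchronized (regions) {
                  regions.put(region, server);
                }
              }

              // Racy read, mirroring the suspected bug: no lock taken.
              String lookupUnsafe(String region) {
                return regions.get(region);
              }

              // Safe read: takes the same lock as the writer.
              String lookupSafe(String region) {
                synchronized (regions) {
                  return regions.get(region);
                }
              }
            }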

        Todd Lipcon added a comment -

        Just grasping at straws here... but another thought is this:
        The keys of the regions map are HRegionInfo, whose compareTo() calls down to compareTo() on HTableDescriptor. HTableDescriptor's compareTo() eventually delegates to hashCode() on the values instance... do we end up changing the values instance when a table gets disabled? Perhaps that then changes the sort order in the TreeMap and confuses things?
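
        A sketch of that failure mode, using a hypothetical Key class instead of HRegionInfo/HTableDescriptor: if a TreeMap key's compareTo() depends on mutable state, changing that state after insertion strands the entry at its old sort position, so an equivalent lookup key can miss it.

            import java.util.TreeMap;

            // Sketch only: the `table` field stands in for the mutable state that
            // HTD's compareTo()/hashCode() would read.
            public class MutableKeySketch {
              static final class Key implements Comparable<Key> {
                String table;           // mutable, but the comparator depends on it
                final String startKey;
                Key(String table, String startKey) {
                  this.table = table;
                  this.startKey = startKey;
                }
                public int compareTo(Key o) {
                  int c = table.compareTo(o.table);
                  return c != 0 ? c : startKey.compareTo(o.startKey);
                }
              }

              public static void main(String[] args) {
                TreeMap<Key, String> regions = new TreeMap<Key, String>();
                Key k = new Key("usertable", "user1072135606");
                regions.put(k, "haus01");

                // Mutate state the comparator uses (the analogue of the values
                // instance changing when the table is disabled).
                k.table = "usertable.disabled";

                // An equivalent-looking key now misses the entry: prints null,
                // because the tree still has k filed under its original order.
                System.out.println(regions.get(new Key("usertable", "user1072135606")));
              }
            }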

        Jonathan Gray added a comment -

        Dug into the code around compareTo() and the values instance. Pretty sure we are not touching that while a cluster is running, and definitely not during disabling. We've actually even taken the 'offline' flag out of META as well; it's now a node in ZK that signals the state of a table (enabling/disabling/disabled).

        It is interesting that the HRI comparator uses the full HTD comparator. It should probably just compare on the tableName itself, though since HTD is a member, it might make sense in some circumstances to do the full HTD.compareTo().
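
        For illustration (a sketch of the idea, not the attached hri.diff): a comparator that orders on the table name, start key and region id only, so nothing mutable in the table descriptor can influence the sort.

            // Sketch only; hypothetical class, not HRegionInfo itself.
            final class RegionKeySketch implements Comparable<RegionKeySketch> {
              final byte[] tableName;
              final byte[] startKey;
              final long regionId;

              RegionKeySketch(byte[] tableName, byte[] startKey, long regionId) {
                this.tableName = tableName;
                this.startKey = startKey;
                this.regionId = regionId;
              }

              public int compareTo(RegionKeySketch o) {
                int c = compareBytes(tableName, o.tableName);   // table name, not full HTD
                if (c != 0) return c;
                c = compareBytes(startKey, o.startKey);
                if (c != 0) return c;
                return regionId < o.regionId ? -1 : (regionId > o.regionId ? 1 : 0);
              }

              // Unsigned lexicographic byte comparison.
              private static int compareBytes(byte[] a, byte[] b) {
                int n = Math.min(a.length, b.length);
                for (int i = 0; i < n; i++) {
                  int d = (a[i] & 0xff) - (b[i] & 0xff);
                  if (d != 0) return d;
                }
                return a.length - b.length;
              }
            }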

        Looking around the code, it does look like there are multiple places we're using regions w/o a lock in the disable table path. Pretty sure this is the cause. Patch soon.

        Jonathan Gray added a comment -

        Fixes to regions synchronization, including one misuse where servers was modified under its own lock rather than the regions lock.

        Looking at this more, I'm not sure synchronization is the issue here, because TreeMap only appears to be unsafe when there are concurrent mutations. The two critical pieces of code where a conflict could happen are where we read the server a region is assigned to, and where we set the server a region is assigned to.

        AssignmentManager:525 in regionOnline()

            synchronized (this.regions) {
              // Add check
              HServerInfo hsi = this.regions.get(regionInfo);
              if (hsi != null) LOG.warn("Overwriting " + regionInfo.getEncodedName() +
                " on " + hsi);
              this.regions.put(regionInfo, serverInfo);
              addToServers(serverInfo, regionInfo);
            }
        

        AssignmentManager:1008 in unassign()

            synchronized (this.regions) {
              server = regions.get(region);
            }
        

        Seems like this is okay as-is. We should definitely still apply this patch to fix the synchronization issues but I'm not sure it is the cause.

        Digging more.

        Jonathan Gray added a comment -

        This is very weird. Can you put up the full logs somewhere?

        Todd Lipcon added a comment -

        Looking at this more, I'm not sure synchronization is the issue here, because TreeMap only appears to be unsafe when there are concurrent mutations. The two critical pieces of code where a conflict could happen are where we read the server a region is assigned to, and where we set the server a region is assigned to

        What about removals? I thought I saw a couple places with remove() that were unsynchronized. Will take a look at your patch momentarily.
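
        For illustration only (hypothetical names, not the actual AssignmentManager code): the pattern being asked about. Even if every get() takes the lock, a remove() that skips it still races with those reads, because a structural change can rebalance the tree mid-lookup.

            import java.util.TreeMap;

            public class UnsyncRemoveSketch {
              private final TreeMap<String, String> regions = new TreeMap<String, String>();

              String getServer(String region) {
                synchronized (regions) {       // reader holds the lock...
                  return regions.get(region);
                }
              }

              void regionOfflineRacy(String region) {
                regions.remove(region);        // ...but this writer does not: racy
              }

              void regionOfflineFixed(String region) {
                synchronized (regions) {       // fix: structural changes take the same lock
                  regions.remove(region);
                }
              }
            }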

        This is very weird. Can you put up the full logs somewhere

        Yep, will upload them here, it's just fake data, nothing secret.

        Todd Lipcon added a comment -

        Here are all the logs that contain that region ID.

        stack added a comment -

        This is an odd one. I don't see anything jumping out at me. Need to dig more.

        Jonathan Gray added a comment -

        Been through this several times on the code paths and have found nothing other than what is in my patch.

        There is a weird potential race condition around the master's handling of a split (we offline the parent and online the daughter regions one after the other, with no locking between the operations). But that does not appear to be the issue here.
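
        A sketch of that window, with hypothetical names rather than the real master code: because the parent is taken offline and the daughters brought online as two separately locked steps, a concurrent reader can observe the moment in between, when neither the parent nor the daughters are registered.

            import java.util.TreeMap;

            public class SplitWindowSketch {
              private final TreeMap<String, String> regions = new TreeMap<String, String>();

              void handleSplitRacy(String parent, String daughterA, String daughterB, String server) {
                synchronized (regions) {
                  regions.remove(parent);           // step 1: offline the parent
                }
                // <-- window: a reader here sees no entry covering the parent's key range
                synchronized (regions) {
                  regions.put(daughterA, server);   // step 2: online the daughters
                  regions.put(daughterB, server);
                }
              }

              void handleSplitAtomically(String parent, String daughterA, String daughterB, String server) {
                synchronized (regions) {            // both steps under one lock: no window
                  regions.remove(parent);
                  regions.put(daughterA, server);
                  regions.put(daughterB, server);
                }
              }
            }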

        I've yet to really dig through the logs, though, and will do that next.

        Todd Lipcon added a comment -

        Thanks for looking at this, Jon. I can run some tests with my "crazy config" which generates lots and lots of regions, and see if it happens again. Any good log messages you can suggest adding for this test?

        Jonathan Gray added a comment -

        Well, try running again with my patch. Or you could even run it again without it, to see if it happens again, and we could get another set of logs. I guess run it with the patch, and then if it doesn't ever happen again we can punt the issue or resolve it until we see it again.

        stack added a comment -

        This little patch makes the HRI compare use the tableName only, rather than the full compareTo. The full compareTo is OTT and looks error-prone.

        I've spent some time looking at this issue and I can't figure out how it came about. I think we should apply the two patches that are in this issue, lower the issue's priority, and then cut a new RC. We then test the new RC and see if we can reproduce. If we do, recast this issue as a blocker. How's that sound?

        Jonathan Gray added a comment -

        +1 to your proposal, stack.

        I've also gone through this several times and have not been able to come up with anything besides what is contained in these patches.

        Todd, do your best to break the new RC.

        stack added a comment -

        Did you intend to do this in your patch, Jon?

        @@ -1359,11 +1361,6 @@
             }
             synchronized (this.regions) {
               this.regions.remove(hri);
        -    }
        -    synchronized (this.regionPlans) {
        -      this.regionPlans.remove(hri.getEncodedName());
        -    }
        -    synchronized (this.servers) {
               for (List<HRegionInfo> regions : this.servers.values()) {
                 for (int i=0;i<regions.size();i++) {
                   if (regions.get(i).equals(hri)) {
        
        Jonathan Gray added a comment -

        Yes. That was wrong; we never use this.servers as a lock. We always manipulate this.regions and this.servers under the this.regions lock.

        The removal from regionPlans was just moved underneath it; I didn't take that part out. It's in the next chunk of the diff.
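
        For illustration, a sketch of that convention with hypothetical field names: both maps are only ever mutated while holding the monitor of regions, which keeps the region-to-server and server-to-regions views consistent with each other.

            import java.util.ArrayList;
            import java.util.List;
            import java.util.TreeMap;

            public class RegionsLockConventionSketch {
              private final TreeMap<String, String> regions =
                new TreeMap<String, String>();                       // region -> server
              private final TreeMap<String, List<String>> servers =
                new TreeMap<String, List<String>>();                 // server -> regions

              void regionOnline(String region, String server) {
                synchronized (regions) {          // one lock guards both structures
                  regions.put(region, server);
                  List<String> hosted = servers.get(server);
                  if (hosted == null) {
                    hosted = new ArrayList<String>();
                    servers.put(server, hosted);
                  }
                  hosted.add(region);
                }
              }

              void regionOffline(String region) {
                synchronized (regions) {
                  String server = regions.remove(region);
                  if (server != null) {
                    List<String> hosted = servers.get(server);
                    if (hosted != null) hosted.remove(region);
                  }
                }
              }
            }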

        stack added a comment -

        Here's what I applied.

        stack added a comment -

        Moving out of 0.90... to 0.90.1 and made it major for now until we see it again.

        Todd Lipcon added a comment -

        Sounds good to me.

        stack added a comment -

        Moving out of 0.92.0. Pull it back in if you think different.


          People

          • Assignee: Unassigned
          • Reporter: Todd Lipcon
          • Votes: 0
          • Watchers: 0
