HBASE-4064: Two concurrent unassigns of the same region caused an endless loop of "Region has been PENDING_CLOSE for too long..."

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.90.3
    • Fix Version/s: 0.90.7
    • Component/s: master
    • Labels: None

      Description

      1. If a stale ("rubbish") RegionState object in PENDING_CLOSE is left behind in regionsInTransition (left over from some exception path that should have removed it, which is why I call it a "rubbish" object), but the region is not currently assigned anywhere, the TimeoutMonitor falls into an endless loop:

      2011-06-27 10:32:21,326 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. state=PENDING_CLOSE, ts=1309141555301
      2011-06-27 10:32:21,326 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f.
      2011-06-27 10:32:21,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. (offlining)
      2011-06-27 10:32:21,441 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. but it is not currently assigned anywhere
      2011-06-27 10:32:31,207 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. state=PENDING_CLOSE, ts=1309141555301
      2011-06-27 10:32:31,207 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f.
      2011-06-27 10:32:31,215 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. (offlining)
      2011-06-27 10:32:31,215 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. but it is not currently assigned anywhere
      2011-06-27 10:32:41,164 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. state=PENDING_CLOSE, ts=1309141555301
      2011-06-27 10:32:41,164 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f.
      2011-06-27 10:32:41,172 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. (offlining)
      2011-06-27 10:32:41,172 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. but it is not currently assigned anywhere
      .....

      2. In the following scenario, two concurrent unassign calls for the same region can lead to the problem above:
      The first unassign call sends its RPC successfully; the master watches for the "RS_ZK_REGION_CLOSED" event and, when processing it, creates a ClosedRegionHandler to remove the region's state on the master.
      While the ClosedRegionHandler is running in an "hbase.master.executor.closeregion.threads" thread (A), a second unassign call for the same region runs in another thread (B).
      When thread B evaluates "if (!regions.containsKey(region))", this.regions still contains the region; the CPU then switches to thread A.
      Thread A removes the region from both "this.regions" and "regionsInTransition", then control switches back to thread B. Thread B continues and throws an exception with the message "Server null returned java.lang.NullPointerException: Passed server is null for 9a6e26d40293663a79523c58315b930f", but without removing the newly added RegionState from "regionsInTransition", where it then stays forever.

      public void unassign(HRegionInfo region, boolean force) {
        LOG.debug("Starting unassignment of region " +
          region.getRegionNameAsString() + " (offlining)");
        synchronized (this.regions) {
          // Check if this region is currently assigned
          if (!regions.containsKey(region)) {
            LOG.debug("Attempted to unassign region " +
              region.getRegionNameAsString() + " but it is not " +
              "currently assigned anywhere");
            return;
          }
        }
        String encodedName = region.getEncodedName();
        // Grab the state of this region and synchronize on it
        RegionState state;
        long stamp = -1;
        synchronized (regionsInTransition) {
          state = regionsInTransition.get(encodedName);
          if (state == null) {
            state = new RegionState(region, RegionState.State.PENDING_CLOSE);
            stamp = state.getStamp();
            regionsInTransition.put(encodedName, state);
          } else if (force && state.isPendingClose()) {
            LOG.debug("Attempting to unassign region " +
              region.getRegionNameAsString() + " which is already pending close " +
              "but forcing an additional close");
            state.update(RegionState.State.PENDING_CLOSE);
          } else {
            LOG.debug("Attempting to unassign region " +
              region.getRegionNameAsString() + " but it is " +
              "already in transition (" + state.getState() + ")");
            return;
          }
        }

        // Send CLOSE RPC
        HServerInfo server = null;
        synchronized (this.regions) {
          server = regions.get(region);
        }
        // ... (rest of the method omitted in this excerpt)
      }
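
      The root cause is that the check of this.regions and the insertion into regionsInTransition happen under two different locks, so a ClosedRegionHandler can run between them. A minimal, self-contained sketch of one way to close that window, using a single lock for both structures (a hypothetical simplification over plain maps, not the actual HBase fix):

      import java.util.HashMap;
      import java.util.Map;

      // Hypothetical sketch: do the "is it assigned?" check and the RIT
      // insertion under one lock, so a concurrent ClosedRegionHandler cannot
      // remove the region in between.
      public class UnassignRaceSketch {
        private final Map<String, String> regions = new HashMap<>();             // region -> server
        private final Map<String, String> regionsInTransition = new HashMap<>(); // region -> state
        private final Object assignLock = new Object(); // one lock guards both maps

        public void unassign(String region) {
          synchronized (assignLock) {
            if (!regions.containsKey(region)) {
              System.out.println("Region " + region + " is not assigned anywhere; skipping");
              return;
            }
            // Safe: the region is still assigned, and no handler can remove it
            // before this PENDING_CLOSE entry is recorded.
            regionsInTransition.put(region, "PENDING_CLOSE");
          }
          // ... send the CLOSE RPC outside the lock ...
        }

        // What ClosedRegionHandler does, under the same lock.
        public void regionOffline(String region) {
          synchronized (assignLock) {
            regions.remove(region);
            regionsInTransition.remove(region);
          }
        }
      }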

      2011-06-27 10:20:59,583 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. (offlining)
      2011-06-27 10:20:59,585 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=158-1-91-101,20020,1308983865292, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f.
      2011-06-27 10:22:15,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/9a6e26d40293663a79523c58315b930f (region=test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f., server=158-1-91-101,20020,1308983865292, state=RS_ZK_REGION_CLOSED)
      2011-06-27 10:22:15,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=158-1-91-101,20020,1308983865292, region=9a6e26d40293663a79523c58315b930f
      2011-06-27 10:25:22,636 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. state=CLOSED, ts=1309141335247
      2011-06-27 10:25:22,636 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region 9a6e26d40293663a79523c58315b930f has been CLOSED for too long, waiting on queued ClosedRegionHandler to run or server shutdown
      2011-06-27 10:25:55,022 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 9a6e26d40293663a79523c58315b930f
      2011-06-27 10:25:55,022 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so deleting ZK node and removing from regions in transition, skipping assignment of region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f.
      2011-06-27 10:25:55,022 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x230ba8b85230036 Deleting existing unassigned node for 9a6e26d40293663a79523c58315b930f that is in expected state RS_ZK_REGION_CLOSED
      2011-06-27 10:25:55,101 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x230ba8b85230036 Successfully deleted unassigned node for region 9a6e26d40293663a79523c58315b930f in expected state RS_ZK_REGION_CLOSED
      2011-06-27 10:25:55,301 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. (offlining)
      2011-06-27 10:25:55,302 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server null returned java.lang.NullPointerException: Passed server is null for 9a6e26d40293663a79523c58315b930f
      2011-06-27 10:32:21,326 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. state=PENDING_CLOSE, ts=1309141555301
      2011-06-27 10:32:21,326 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f.
      2011-06-27 10:32:21,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. (offlining)
      2011-06-27 10:32:21,441 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. but it is not currently assigned anywhere

      3. The following scenario shows how problem 2 above can happen:

      (1) A table has a lot of regions; more than 70,000 in my test.
      (2) Disable the table. If 'BulkDisabler.waitUntilDone' times out, 'DisableTableHandler.process' creates another BulkDisabler object and starts its thread pool. Each region that is still online is passed to AssignmentManager.unassign again, so AssignmentManager.unassign can be called concurrently more than once for the same region (see the sketch below).
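
      To make the retry structure in (2) concrete, here is a small runnable simulation (hypothetical names; the real DisableTableHandler/BulkDisabler signatures differ): each timed-out round spawns a fresh pool and submits an unassign task for every region that still looks online, so a region whose first close is merely slow receives a second, concurrent unassign call.

      import java.util.List;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.TimeUnit;

      // Hypothetical simulation of the disable-retry loop described above.
      public class DisableRetrySketch {
        static void unassign(String region) {
          System.out.println(Thread.currentThread().getName() + " unassigning " + region);
        }

        public static void main(String[] args) throws InterruptedException {
          List<String> stillOnline = List.of("regionA", "regionB");
          for (int round = 1; round <= 2; round++) { // round 2 models the retry
            ExecutorService pool = Executors.newFixedThreadPool(2);
            for (String r : stillOnline) {
              pool.submit(() -> unassign(r));
            }
            pool.shutdown();
            // Stand-in for BulkDisabler.waitUntilDone timing out: tasks from
            // round 1 may still be running when round 2 submits its own.
            pool.awaitTermination(10, TimeUnit.MILLISECONDS);
          }
        }
      }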

    Attachments

  1. HBASE-4064-v1.patch
        3 kB
        Jieshan Bean
      2. HBASE-4064_branch90V2.patch
        1 kB
        gaojinchao
      3. disableflow.png
        29 kB
        gaojinchao

        Activity

        Jonathan Hsieh added a comment -

        closing, ancient.

        Jonathan Hsieh added a comment -

        There was some recent activity here; anyone planning on finishing this guy? (It's been bumped a few times; considering bumping it for the 0.90.7 release.)

        steven zhuang added a comment -

        Hi Ramkrishna, sorry for the late response.

        This came about because we wanted to update the HBase jar to enable our table balancer without turning the whole cluster off.

        Here is what I did:
        1. Turned off the first half of the cluster's RSs and updated the jars.
        2. Turned the first half of the RSs back on (used "hbase-daemon.sh stop" on the RS holding the META region because it wouldn't go down).
        3. Turned off the second half of the cluster's RSs and updated the jars.
        4. Turned the second half of the RSs back on.
        At this point I found the ROOT region in transition with status CLOSING, although the RS holding it was down. So I tried to restart the whole cluster, but it could not start: the RS that the master believed held the ROOT region was serving no regions, so the master could only get "NotServingRegionException". I stopped all the RSs and started the master again, only to find the master trying to reach the same "ROOT-holding" RS, which was down, giving "ServerNotRunningYetException" after dozens of retries.

        I guess I restarted the "half cluster" too soon after I closed it, so the ROOT region was still in transition when the original RS restarted; later I force-closed the whole cluster, but the state in ZK stayed. So when I restarted the cluster, the master tried to close the ROOT region instead of opening it, but by then the region was not on any RS, and the master fell into a (dead?) loop.

        The log files are all messed up with splitlog and other noisy exception output, and are way too big (even just the grepped part) to upload, so I hope my description helps.

        ramkrishna.s.vasudevan added a comment -

        @Steven
        Can you provide us some logs so we can see the problem? It's important that we solve such issues. Maybe it exists in other versions also?
        Thanks, Steve.

        steven zhuang added a comment -

        Hi Ramkrishna, thanks for your reply. I am now using HBase 0.92, and no, I cannot upgrade the cluster at present.
        Actually my problem is slightly different from this one. I sort of closed the ROOT region under certain conditions, and later the cluster could not be restarted, with the master reporting that it was unassigning the ROOT region because it was not assigned anywhere.
        We fixed it by clearing the ROOT region node from ZooKeeper's unassigned nodes list.
        Thanks anyway.

        ramkrishna.s.vasudevan added a comment -

        @Steven
        Which version of HBase are you using?
        In the latest versions, HBase 0.94 and above, we are trying to solve these types of issues. Can you try with those versions?
        Thanks.

        steven zhuang added a comment -

        Is anyone still working on this issue? Please tell me how to fix it. Thanks!

        gaojinchao added a comment -

        I found the enable-table issue was fixed in HBASE-4395.
        I think this issue should be fixed in the same way.

        gaojinchao added a comment -

        I will study the trunk code and confirm whether it has been fixed there.

        ramkrishna.s.vasudevan added a comment -

        As J-D said, the timeout monitor has to be refactored; but additionally, before changing the state to CLOSING again, we could check the current state of the node.
        Correct me if I am wrong.

        gaojinchao added a comment -

        Do we need to fix this issue? If so, I will test it; otherwise, should I close the issue?

        gaojinchao added a comment -

        The master may crash because the pool shutdown is asynchronous.

        The master log shows:
        2011-07-22 13:33:27,806 INFO org.apache.hadoop.hbase.master.handler.EnableTableHandler: Table has 2156 regions of which 2156 are online.

        2011-07-22 13:34:28,646 INFO org.apache.hadoop.hbase.master.handler.EnableTableHandler: Table has 2156 regions of which 982 are online.
        2011-07-22 13:34:31,079 WARN org.apache.hadoop.hbase.master.AssignmentManager: gjc:xxx ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229.
        2011-07-22 13:34:31,080 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x31502ef4f00000 Creating (or updating) unassigned node for c9b1c97ac6c00033ceb1890e45e66229 with OFFLINE state
        2011-07-22 13:34:31,104 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229. state=OFFLINE, ts=1311312871080
        2011-07-22 13:34:31,121 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229. so generated a random one; hri=ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229., src=, dest=C4C2.site,60020,1311310281335; 3 (online=3, exclude=null) available servers
        2011-07-22 13:34:31,121 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229. to C4C2.site,60020,1311310281335
        2011-07-22 13:34:31,122 WARN org.apache.hadoop.hbase.master.AssignmentManager: gjc:xxx ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229.
        2011-07-22 13:34:31,123 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229. state=PENDING_OPEN, ts=1311312871121
        java.lang.IllegalStateException
        at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1081)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1036)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:864)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:844)
        at java.lang.Thread.run(Thread.java:662)
        2011-07-22 13:34:31,125 INFO org.apache.hadoop.hbase.master.HMaster: Aborting

        Ted Yu added a comment -

        How did you come up with the duration below?

        +          Thread.sleep(5 * 1000);

        The above code is inside a while loop. Why is it needed?

        gaojinchao added a comment -

        I have made a patch, but I haven't verified it yet. I want to review whether it is reasonable first, then verify it.

        In my cluster I changed the parameter (hbase.bulk.assignment.waiton.empty.rit) to avoid this issue.

        gaojinchao added a comment -

        The patch can't solve J-D's issue, but it is an improvement for disable table.

        I made a flow chart (A -> B -> C -> D). We can see there is a window between removing a region from RIT and removing it from the regions collection, so my patch changes the order of those operations.

        stack added a comment -

        4. When the master receives the watcher event, it removes the region from RIT and then removes it from the regions collection. There is a short window when disable table can't finish in a period(). The region may be unassigned again. My patch tries to fix the above case: remove from the regions collection first, so the disable thread can't get a region that is being processed.

        What is 'period()' in the above, Gao? Where is it? Where is the code that does the 'unassign again'? I'm trying to understand what in particular your patch addresses. Do you think your patch solves the problem in spite of the race windows described above by J-D?

        gaojinchao added a comment -

        @J-D Thanks for your reply.

        I got it. In my case, the race is between the disable threads and the ClosedRegionHandler threads.

        1. A disable thread gets regions from the regions collection (see getRegionsOfTable).

        2. The thread pool gets a region and sends the close request to the region server; at the same time it puts the region into RIT (regionsInTransition), indicating that the region is being processed.

        3. The region server finishes closing the region, changes the ZK state, and notifies the master.

        4. When the master receives the watcher event, it removes the region from RIT and then removes it from the regions collection.
        There is a short window here in which the disable table cannot finish within its period; the region may be unassigned again.

        My patch tries to fix the above case: remove the region from the regions collection first, so the disable thread cannot get a region that is being processed.

        I found another issue yesterday: the enable threads also have a race condition.
        (I changed the period to 1 minute in order to reproduce the issue.) It seems the pool couldn't finish before a new enable process started; we need a sleep when an enable period finishes.

        The master logs:
        2011-07-22 13:33:27,806 INFO org.apache.hadoop.hbase.master.handler.EnableTableHandler: Table has 2156 regions of which 2156 are online.

        2011-07-22 13:34:28,646 INFO org.apache.hadoop.hbase.master.handler.EnableTableHandler: Table has 2156 regions of which 982 are online.
        2011-07-22 13:34:31,079 WARN org.apache.hadoop.hbase.master.AssignmentManager: gjc:xxx ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229.
        2011-07-22 13:34:31,080 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x31502ef4f00000 Creating (or updating) unassigned node for c9b1c97ac6c00033ceb1890e45e66229 with OFFLINE state
        2011-07-22 13:34:31,104 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229. state=OFFLINE, ts=1311312871080
        2011-07-22 13:34:31,121 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229. so generated a random one; hri=ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229., src=, dest=C4C2.site,60020,1311310281335; 3 (online=3, exclude=null) available servers
        2011-07-22 13:34:31,121 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229. to C4C2.site,60020,1311310281335
        2011-07-22 13:34:31,122 WARN org.apache.hadoop.hbase.master.AssignmentManager: gjc:xxx ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229.
        2011-07-22 13:34:31,123 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; ufdr5,0590386138,1311057525896.c9b1c97ac6c00033ceb1890e45e66229. state=PENDING_OPEN, ts=1311312871121
        java.lang.IllegalStateException
        at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1081)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1036)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:864)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:844)
        at org.apache.hadoop.hbase.master.handler.EnableTableHandler$BulkEnabler$1.run(EnableTableHandler.java:154)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
        2011-07-22 13:34:31,125 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
        2011-07-22 13:34:31,482 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x31502ef4f00000 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/c9b1c97ac6c00033ceb1890e45e66229
        2011-07-22 13:34:31,482 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x31502ef4f00000 Unable to get data of znode /hbase/unassigned/c9b1c97ac6c00033ceb1890e45e66229

        stack added a comment -

        Yah! J-D is back!

        Jean-Daniel Cryans added a comment -

        This work should really be done for 0.92, not 0.90, because in its current state the master is bound to have race conditions.

        As I have commented in other jiras, the TimeoutMonitor acts out of band with regards to the other threads. While most of the code in the master assumes that every operation is atomic on a single region, the TM doesn't. It can figure out that a RIT expired and then only process it minutes later because of how that code was written. Here's how it works:

        1. Iterate over all RIT:
           1. If it timed out:
              1. Put the region in the assign/unassign list (depends on last state).
              2. Also, depending on state, do some ZK operation (which can take a few tens of ms).
        2. Process all the regions to unassign, one by one.
        3. Process all the regions to assign, one by one.

        Here the issue is that at any time a region can change state while the TM runs. Let's say the TM saw region x timing out on PENDING_CLOSE, but a millisecond later the master receives the notification that the region was finally closed. What happens is that the region will be forced unassigned in the loop at the end (while the region might already be reopened elsewhere). All of this is made even worse by the fact that ZK calls, in aggregate, really take a lot of time. Plus, during the assign and unassign loops you need to talk to region servers.

        The window for races is so large basically anything can happen.

        One reason those operations are slow is that the master is suffering for a lot of lock contention, for reference see HBASE-3789. This is only in 0.92, and that's one more reason to focus our work there.

        Something we did late in 0.90 to make it less prone to races is to bump the timeout itself, as I was mentioning in HBASE-3669.

        If anything else should be done in 0.90, it would be to disable the TimeoutMonitor as it does more harm than good.

        In 0.92 it should be completely redone to work atomically on regions, instead of doing its own thing.
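
        A small self-contained sketch of the atomic re-check being called for here (a hypothetical illustration with simplified types, not the 0.92 refactor itself): the monitor re-validates the region's state under the RIT lock immediately before acting on it, instead of acting on a snapshot taken earlier.

        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        // Hypothetical sketch: re-check a timed-out region's state right
        // before forcing the unassign, and skip if anything changed.
        public class TimeoutMonitorSketch {
          enum State { PENDING_CLOSE, CLOSED, OFFLINE }

          static class RegionState {
            volatile State state;
            volatile long stamp;
            RegionState(State s, long ts) { state = s; stamp = ts; }
          }

          private final Map<String, RegionState> regionsInTransition = new ConcurrentHashMap<>();

          void maybeForceUnassign(String region, State observedState, long observedStamp) {
            synchronized (regionsInTransition) {
              RegionState current = regionsInTransition.get(region);
              // The region may have transitioned, or left RIT entirely,
              // since the monitor's scan; act only if nothing changed.
              if (current == null || current.state != observedState
                  || current.stamp != observedStamp) {
                return; // stale observation: skip the forced unassign
              }
              // ... safe to force the unassign here ...
            }
          }
        }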

        gaojinchao added a comment -

        Please don't merge the patch; I found another issue and need to dig into whether it is related to the patch. Thanks.

        gaojinchao added a comment -

        I reproduced the issue with the code above; after modifying it to the code below, the issue could not be reproduced (about 2000 regions).
        So I think V2 is good.
        public void regionOffline(final HRegionInfo regionInfo) {
          // remove the region plan as well just in case.
          clearRegionPlan(regionInfo);
          setOffline(regionInfo);

          try {
            Thread.sleep(1000);
          } catch (Throwable e) {
            ;
          }

          synchronized (this.regionsInTransition) {
            if (this.regionsInTransition.remove(regionInfo.getEncodedName()) != null) {
              this.regionsInTransition.notifyAll();
            }
          }
        }
        I didn't collect the logs the first time, so I reproduced it again. The logs:

        //disable table start.
        hbase(main):002:0> disable 'ufdr5'
        11/07/21 15:07:25 DEBUG zookeeper.ZKUtil: hconnection-0x314b823b0a0005 Set watcher on existing znode /hbase/root-region-server

        hmaster logs:
        2011-07-21 15:10:42,300 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: ufdr5,0670286138,1311057525904.32cb070f97e3b7b0d261f427a0fc110a. state=CLOSED, ts=1311232060203
        2011-07-21 15:10:42,300 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region 32cb070f97e3b7b0d261f427a0fc110a has been CLOSED for too long, waiting on queued ClosedRegionHandler to run or server shutdown
        2011-07-21 15:12:25,672 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 32cb070f97e3b7b0d261f427a0fc110a
        2011-07-21 15:12:25,672 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so deleting ZK node and removing from regions in transition, skipping assignment of region ufdr5,0670286138,1311057525904.32cb070f97e3b7b0d261f427a0fc110a.
        2011-07-21 15:12:25,672 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x314b823b0a0000 Deleting existing unassigned node for 32cb070f97e3b7b0d261f427a0fc110a that is in expected state RS_ZK_REGION_CLOSED
        2011-07-21 15:12:25,673 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x314b823b0a0000 Retrieved 107 byte(s) of data from znode /hbase/unassigned/32cb070f97e3b7b0d261f427a0fc110a; data=region=ufdr5,0670286138,1311057525904.32cb070f97e3b7b0d261f427a0fc110a., server=C4C2.site,60020,1311231851946, state=RS_ZK_REGION_CLOSED
        2011-07-21 15:12:25,688 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x314b823b0a0000 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/unassigned/32cb070f97e3b7b0d261f427a0fc110a
        2011-07-21 15:12:25,688 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x314b823b0a0000 Successfully deleted unassigned node for region 32cb070f97e3b7b0d261f427a0fc110a in expected state RS_ZK_REGION_CLOSED
        2011-07-21 15:12:26,383 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr5,0670286138,1311057525904.32cb070f97e3b7b0d261f427a0fc110a. (offlining)
        2011-07-21 15:12:26,391 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=C4C2.site,60020,1311231851946, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for ufdr5,0670286138,1311057525904.32cb070f97e3b7b0d261f427a0fc110a. but we are not serving it for 32cb070f97e3b7b0d261f427a0fc110a
        2011-07-21 15:15:32,300 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: ufdr5,0670286138,1311057525904.32cb070f97e3b7b0d261f427a0fc110a. state=PENDING_CLOSE, ts=1311232346385
        2011-07-21 15:15:32,301 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=ufdr5,0670286138,1311057525904.32cb070f97e3b7b0d261f427a0fc110a.
        2011-07-21 15:15:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x314b823b0a0000 Set watcher on existing znode /hbase/unassigned/32cb070f97e3b7b0d261f427a0fc110a
        2011-07-21 15:15:32,303 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr5,0670286138,1311057525904.32cb070f97e3b7b0d261f427a0fc110a. (offlining)
        2011-07-21 15:15:32,303 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region ufdr5,0670286138,1311057525904.32cb070f97e3b7b0d261f427a0fc110a. but it is not currently assigned anywhere
        2011-07-21 15:15:42,300 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: ufdr5,0670286138,1311057525904.32cb070f97e3b7b0d261f427a0fc110a. state=PENDING_CLOSE, ts=1311232346385
        2011-07-21 15:15:42,300 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=ufdr5,0670286138,1311057525904.32cb070f97e3b7b0d261f427a0fc110a.
        2011-07-21 15:15:42,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x314b823b0a0000 Set watcher on existing znode /hbase/unassigned/32cb070f97e3b7b0d261f427a0fc110a
        2011-07-21 15:15:42,303 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr5,0670286138,1311057525904.32cb070f97e3b7b0d261f427a0fc110a. (offlining)

        The log after the modified code:

        //I do this several times
        hbase(main):004:0> disable 'ufdr5'
        11/07/21 15:42:52 DEBUG zookeeper.ZKUtil: hconnection-0x314b967b920003 Set watcher on existing znode /hbase/root-region-server

        2011-07-21 15:50:21,443 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x1314b9678830000 Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/unassigned
        2011-07-21 15:50:21,571 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table is done=true
        2011-07-21 15:51:26,908 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: C4C3.site,60020,1311233178997=0, C4C2.site,60020,1311233178981=1, C4C1.site,60020,1311233179039=1
        2011-07-21 15:51:26,908 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing. servers=3 regions=2 average=0.6666667 mostloaded=1 leastloaded=0
        2011-07-21 15:51:27,402 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 2156 catalog row(s) and gc'd 0 unreferenced parent region(s)

        stack added a comment -

        And what did you see? Do you have log snippets that demonstrate the fix working? Thanks Gao.

        gaojinchao added a comment -

        Hi, I verified the issue by adding a sleep in regionOffline. I think V2 is OK.

        Below is the code:
        public void regionOffline(final HRegionInfo regionInfo) {
          synchronized (this.regionsInTransition) {
            if (this.regionsInTransition.remove(regionInfo.getEncodedName()) != null) {
              this.regionsInTransition.notifyAll();
            }
          }

          try {
            Thread.sleep(1000);
          } catch (Throwable e) {
            ;
          }

          // remove the region plan as well just in case.
          clearRegionPlan(regionInfo);
          setOffline(regionInfo);
        }

        gaojinchao added a comment -

        @Stack:
        I will reproduce and verify it after the review finishes, because it may take a lot of time.

        stack added a comment -

        Gao, v2 looks good to me too; do all tests pass, and how do we know it works? How did you test it?

        Ted Yu added a comment -

        +1 on the patch v2.

        Ted Yu added a comment -

        @Jinchao:
        Please generate the patch at the root level, not under the src directory.

        gaojinchao added a comment -

        I tried to make a patch: if the region is in RIT, it shouldn't be unassigned again. So it seems that changing the position of the code can solve this issue.
        All tests passed. Please review and give some suggestions.

        stack added a comment -

        And it makes sense to me. (Isn't there one already integral to ZK?)

        Ted Yu added a comment -

        It makes sense to me.

        Jieshan Bean added a comment -

        Yes. Is that OK?

        Ted Yu added a comment -

        @Jieshan:
        I think you're suggesting adding a sequence id to the region open/close process so that an out-of-order process can abort its operation.

        Jieshan Bean added a comment -

        I have run into several problems caused by two concurrent open/close processes executing at the same time. The earlier thread changes the ZK node state, but the later one finds the ZK node is not in the expected state, so its operation fails.
        If we use a timestamp to identify an open/close operation flow, then when the earlier flow times out, the TimeoutMonitor will restart a new flow and update the timestamp. If the earlier flow later executes unexpectedly, it should check the timestamp first, and if its own timestamp is outdated, skip the operation.
        Does that make any sense?
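
        A self-contained sketch of the timestamp guard being proposed (hypothetical names and simplified types; the real change would live in AssignmentManager and the ZK transition code): each open/close flow records a stamp, a retry bumps it, and a stale flow that wakes up later sees it is outdated and skips its work.

        import java.util.concurrent.ConcurrentHashMap;
        import java.util.concurrent.ConcurrentMap;
        import java.util.concurrent.atomic.AtomicLong;

        // Hypothetical sketch of stamp-guarded open/close flows.
        public class StampGuardSketch {
          private final AtomicLong stampSource = new AtomicLong();
          private final ConcurrentMap<String, Long> currentStamp = new ConcurrentHashMap<>();

          // Called when a flow (initial attempt or TimeoutMonitor retry) starts.
          long beginFlow(String region) {
            long stamp = stampSource.incrementAndGet();
            currentStamp.put(region, stamp);
            return stamp;
          }

          // Called by a flow just before it touches ZK; stale flows bail out.
          boolean isCurrent(String region, long myStamp) {
            Long latest = currentStamp.get(region);
            return latest != null && latest == myStamp;
          }
        }

        A flow would keep the stamp returned by beginFlow and call isCurrent before each ZK transition, skipping the operation whenever a newer retry has taken over the region.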

        stack added a comment -

        Moving it out of 0.90.4 in the meantime.

        Jieshan Bean added a comment -

        I'm trying to make the change to the patch, but I think it needs more consideration.
        I'll try my best to commit the new patch today.
        Thanks, stack.

        As for the code format, sorry, I made the same mistake again; our internal code format is not exactly the same.

        stack added a comment -

        Any progress on this one Jieshan?

        stack added a comment -

        Jieshan. Please follow the conventions used by the code around you. See how methods in the class put the opening parenthesis on the same line as the function name rather than on the line that follows, that we use two spaces for tabs, and that we have a space around operators, on both sides, not just on one.

        So, how is this 'fix' working? If timestamp is not -1, we'll remove the region from the in-memory regions in transition? It could be not -1 if we get a NotServingRegionException when we go to close the region. That seems good that we cleanup the in-memory RIT if we get an exception (though something else must be going on if we went to close a region on a server that was not carrying it). Same for the case of a generic Throwable. Though, are we sure we want to give up on the closing completely? What if a transient error? Are there cases where we'd want to retry the close?

        When would the state be -1?

        Thanks Jieshan.


          People

          • Assignee: Unassigned
          • Reporter: Jieshan Bean
          • Votes: 1
          • Watchers: 6
