HBASE-5816

Balancer and ServerShutdownHandler concurrently reassign the same region

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.90.6
    • Fix Version/s: None
    • Component/s: master
    • Labels: None

      Description

      The first assign thread exits successfully after updating the RegionState to PENDING_OPEN, while the second assign thread follows immediately into assign() and fails the RegionState check in setOfflineInZooKeeper(). This causes the master to abort.

      In the case below, the two concurrent assigns occurred when the AM tried to assign a region to a dying/dead RS, while the ServerShutdownHandler tried to assign this region (from the region plan) on its own.
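
      For illustration only, here is a minimal, self-contained Java sketch (simplified, assumed names; not HBase code) of the interleaving described above: the first assign() moves the region to PENDING_OPEN, so the later assign() fails the OFFLINE check and the master-side abort fires.

        // Minimal sketch of the race described in this issue (assumed names, not HBase code).
        public class ConcurrentAssignSketch {
          enum RegionState { OFFLINE, PENDING_OPEN }

          private RegionState state = RegionState.OFFLINE;

          // Stands in for AssignmentManager.setOfflineInZooKeeper(): only a region that is
          // still OFFLINE may be (re)offlined; anything else is treated as fatal.
          private void setOfflineInZooKeeper() {
            if (state != RegionState.OFFLINE) {
              throw new IllegalStateException("Unexpected state trying to OFFLINE; state=" + state);
            }
          }

          private void assign(String caller) {
            setOfflineInZooKeeper();           // the later caller fails here
            state = RegionState.PENDING_OPEN;  // the first caller "wins" and sends openRegion
            System.out.println(caller + ": sent openRegion");
          }

          public static void main(String[] args) {
            ConcurrentAssignSketch region = new ConcurrentAssignSketch();
            region.assign("balancer/ClosedRegionHandler");  // succeeds
            region.assign("ServerShutdownHandler");         // throws IllegalStateException
          }
        }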

      2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
      2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. (offlining)
      2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=hadoop05.sh.intel.com,60020,1334544902186, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
      2012-04-17 05:44:57,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b (region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING)
      2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=CLOSED, ts=1334612697672, server=hadoop05.sh.intel.com,60020,1334544902186
      2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x236b912e9b3000e Creating (or updating) unassigned node for fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state
      2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.; plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
      2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to xmlqa-clv16.sh.intel.com,60020,1334612497253
      2012-04-17 05:54:19,159 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=PENDING_OPEN, ts=1334613179096, server=xmlqa-clv16.sh.intel.com,60020,1334612497253
      2012-04-17 05:54:59,033 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253, load=(requests=0, regions=0, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
      java.net.SocketTimeoutException: Call to /10.239.47.87:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302 remote=/10.239.47.87:60020]
              at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:805)
              at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:778)
              at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:283)
              at $Proxy7.openRegion(Unknown Source)
              at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:573)
              at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1127)
              at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:912)
              at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:892)
              at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
              at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:162)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:662)
      Caused by: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302 remote=/10.239.47.87:60020]
              at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
              at java.io.FilterInputStream.read(FilterInputStream.java:116)
              at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:301)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
              at java.io.DataInputStream.readInt(DataInputStream.java:370)
              at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:541)
              at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:479)
      2012-04-17 05:54:59,035 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. so generated a random one; hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=, dest=hadoop06.sh.intel.com,60020,1334544901894; 7 (online=7, exclude=serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)) available servers
      2012-04-17 05:54:59,035 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x236b912e9b3000e Creating (or updating) unassigned node for fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state
      2012-04-17 05:54:59,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.; plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=, dest=hadoop06.sh.intel.com,60020,1334544901894
      2012-04-17 05:54:59,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to hadoop06.sh.intel.com,60020,1334544901894
      2012-04-17 05:54:59,046 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=PENDING_OPEN, ts=1334613299045, server=hadoop06.sh.intel.com,60020,1334544901894
      java.lang.IllegalStateException
              at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1167)
              at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1107)
              at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:912)
              at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:892)
              at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:259)
              at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:162)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:662)
      2012-04-17 05:54:59,047 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
      
      Attachments

      1. HBASE-5816.patch (0.8 kB, Maryann Xue)

          Activity

          Jean-Daniel Cryans added a comment -

          Aborting Patch Available status; the conversation is stale.

          Jimmy Xiang added a comment -

          @Stack, sure. In 0.94, assigning the same region concurrently can easily lead to this issue. The assign method is synchronized on the region state. Before entering the synchronized assign method, the region is moved to the OFFLINE state and added to RIT if it is not already there. If the region is already in transition, the region state is not changed (unless hijack is set, which is used by the timeout monitor only). Once in the synchronized assign method, AM tries to set the region offline in ZK. However, the region state is PENDING_OPEN/OPENING instead of OFFLINE in this case, so the master aborts.

          In trunk, it is different:

                RegionState state = forceRegionStateToOffline(region, forceNewPlan);
                if (state != null) {
                  assign(state, setOfflineInZK, forceNewPlan);
                }
          

          forceRegionStateToOffline returns null if the region is already in transition, so we won't assign it again and won't run into this problem.

          As for assigning the region to a dead server: a new plan will be used during the assign attempts.
          As for assigning by SSH with forceNewPlan = true: forceRegionStateToOffline aborts the previous assignment if it is still in progress, and closes the region if it is already assigned.
          All these assignment calls are synchronized on the region.
          Region state changes by the ZK event thread are also synchronized on the region.

          That's why I think we are good with the trunk branch.
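
          For illustration, a minimal Java sketch (assumed, simplified names; not the actual trunk code) of the guard described above: a region that is already in transition is not forced offline, so the second caller skips the duplicate assign instead of aborting the master.

            import java.util.Set;
            import java.util.concurrent.ConcurrentHashMap;

            // Sketch of the "force offline or bail out" guard (simplified, not HBase trunk code).
            public class ForceOfflineGuardSketch {
              enum RegionState { OFFLINE }

              private final Set<String> regionsInTransition = ConcurrentHashMap.newKeySet();

              // Returns OFFLINE for a region we may assign, or null if it is already in transition.
              RegionState forceRegionStateToOffline(String region) {
                if (!regionsInTransition.add(region)) {
                  return null;  // another assign is already driving this region
                }
                return RegionState.OFFLINE;
              }

              void assign(String region) {
                RegionState state = forceRegionStateToOffline(region);
                if (state != null) {
                  // ... set OFFLINE in ZK and send openRegion to the planned server ...
                  System.out.println("assigning " + region);
                } else {
                  System.out.println("skip: " + region + " is already in transition");
                }
              }

              public static void main(String[] args) {
                ForceOfflineGuardSketch am = new ForceOfflineGuardSketch();
                am.assign("fe38fe31caf40b6e607a3e6bbed6404b");  // assigns
                am.assign("fe38fe31caf40b6e607a3e6bbed6404b");  // skips, no abort
              }
            }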

          stack added a comment -

          Jimmy Xiang Can you say more about why you think it's not an issue, so the likes of myself (or Maryann) can be more confident closing this issue? Thanks boss.

          Jimmy Xiang added a comment -

          Looked into it. I think it is not an issue in the trunk branch any more. The root cause is that SSH uses bulk assigning which doesn't do proper locking, and this is fixed in trunk. I think we can close it now.

          Jimmy Xiang added a comment -

          In the trunk branch, it should be fine to reassign the same region concurrently. It may be hard to reproduce this issue. I'd like to give it a try.

          Maryann Xue added a comment -

          Since this is not fully addressed in HBASE-6060, how about test/reproduce it against Jimmy's fix?

          stack added a comment -

          Shall we close this as now invalid? Or, rather, we need to reproduce it on top of Jimmy's new refactor. Or, should we keep this open and try and make a unit test to reproduce what MaryAnn saw here?

          ramkrishna.s.vasudevan added a comment -

          @Jimmy
          Yes, you are correct. We are trying to solve such problems in HBASE-6060.

          Jimmy Xiang added a comment -

          I agree with Stack. This patch just moves the problem somewhere else.
          Each assign() starts by forcing the region state offline; if there is already an assign thread in progress behind it, the later assign thread could mess up the region state transition.

          Jimmy Xiang added a comment -

          Trunk has bulk assignment. 0.90.6 ServerShutdownHandler assigns region one by one.

          With bulk assignment, do we still need to pull regions to assign from a shared queue?

          Should we fix 0.90.6 ServerShutdownHandler to do something similar to bulk assignment?

          ramkrishna.s.vasudevan added a comment -

          @Maryann
          If you are planning to work on this, please go ahead.

          stack added a comment -

          @Ram I think the aim would be a simplification; one queue to assign from rather than multiple. Also, as is, I think state is a little distributed across multiple variables and maps. We should coalesce if possible. I think the Maryann suggestion of trying a double or triple concurrent assign in a unit test is a good start.

          ramkrishna.s.vasudevan added a comment -

          I thought more on this. There are already some state-management data structures like RIT and the regions and servers maps, so adding more may again be redundant (my thought), and handling them should be clean.

          Maryann Xue added a comment -

          @stack, @ramkrishna We should be able to reproduce problem 2 in trunk in a unit test by initiating concurrent assign() calls from two threads.

          stack added a comment -

          @Maryann Agree on your 1. and 2. above. It's possible to make a standalone AssignmentManager using mocks – see TestAssignmentManager. Maybe we should try some of your suppositions over in unit tests, Maryann, and find holes in AM by writing unit tests?

          ramkrishna.s.vasudevan added a comment - - edited

          In 0.94 I tried to reproduce this but was not able to. Note, though, that the HBASE-5396 fix is not in it. Will try tomorrow.

          ramkrishna.s.vasudevan added a comment -

          @Stack
          In our internal version of 0.90, which has changes as in 0.92 along with TimeoutMonitor-related changes, we did what Maryann did. It works, but we had the timeout monitor to the rescue.
          As you said, it's better to have some data structure like a set or queue so that we don't allow a new assign to happen if the region is already present in the data structure we have added.
          But we should be very careful about when to clear it, since there are also retry attempts made when assign fails.

          ramkrishna.s.vasudevan added a comment -

          I will try to dig into this more. The code that Maryann mentions is in 0.90 only. There are some subtle changes between the 0.90 and 0.92+ versions regarding assignments,
          so it's better we also investigate the bug in 0.92 separately.

          Maryann Xue added a comment -

          Yes, Zhihong, not in trunk now. It is only in the 0.90 branch.

          I think there are two different problems here.
          1. The ServerShutdownHandler should check more strictly before calling assign() for regions "planned" to move to it.
          2. Master should handle concurrent requests of assign() more properly. In the case I attached, the later assign() call by ServerShutdownHandler could actually be ignored, since AssignmentManager is already doing the assignment job for this region. There could be situations when client code calls assign() from the interface level and causes the same problem.

          True, doing the dead-server check is not safe, for the private assign() method does a number of retry attempts in its own loop, and the server could be found dead just after the first attempt fails.

          stack, I think maintaining a queue for assign() requests, whether from the balancer or the ServerShutdownHandler or client calls, can be a good solution. However, if the AssignmentManager is now good in its own logic among region states and regionsInTransition, it should be justified to assume that if assign gets a wrong state, it simply indicates that another assign has just succeeded. So it should be ok for an "invalid/later" assign to return.

          stack added a comment -

          Should we have the ServerShutdownHandler and the balancer feed a single queue that the assignment manager pulls from? If the region is already in the queue, then we'd favor the purposed assignment (the balancer's?) rather than the random one?

          stack added a comment -

          Great stuff Maryann. Where is the above bit of code from? I don't find it in trunk (could be me).

          It should be safe for the later thread to just return or get an exception if the region has already been assigned by an earlier thread.

          What are you thinking? When we go into the assign, we check if the region is in transition and, unless it's a force assign, just return? Or would you do this earlier? Maybe the balancer should be more deferential? It could check if the regionserver it's been asked to move a region from is on the deadservers list. This would still be racy though. Would doing the check in the assign method be enough? (I've not looked at the code).

          Thanks for the help on this stuff.

          Ted Yu added a comment -

          HBASE-5396 was only integrated into 0.90,
          so it shouldn't be the cause of the problem in trunk.

          Maryann Xue added a comment - - edited

          stack, your suggestion seems like the ultimate solution to the current HMaster workflow. Trunk has the same problem.
          The case I attached was actually introduced by HBASE-5396, which tried to let ServerShutdownHandler assign the region at an earlier stage instead of waiting for the TimeoutMonitor to do the job. But the isRegionOnline test seems too weak here.

                for (HRegionInfo hri : regionsFromRegionPlansForServer) {
                  if (!this.services.getAssignmentManager().isRegionOnline(hri)) {
                    this.services.getAssignmentManager().assign(hri, true);
                    reassignedPlans++;
                  }
                }
          

          However, I think any client call to HBaseAdmin.assign() that coincides at this point would cause the same problem. There is a lock guarding the private assign() method to deal with concurrent assigns, but the entire assign process is not atomic. It should be safe for the later thread to just return or get an exception if the region has already been assigned by an earlier thread.

          stack added a comment -

          Changing title and upping the priority. At a minimum, the balancer and shutdown handler should populate a queue that a single assignment process works on, rather than, as we have now, two threads doing independent assigns; or the balancer should not move regions that are currently in RIT or regions that were moved recently, etc.
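
          A rough Java sketch of that single-queue idea (the class and names below are assumed for illustration, not an actual HBase patch): both producers offer requests to one queue and a single worker drains it, so two threads never drive assign() for the same region at the same time.

            import java.util.concurrent.BlockingQueue;
            import java.util.concurrent.LinkedBlockingQueue;

            // Sketch only: a single assignment queue fed by the balancer and the shutdown handler.
            public class AssignQueueSketch {
              static final class AssignRequest {
                final String encodedRegionName;
                final String plannedDestination;  // null would mean "pick a random plan"
                AssignRequest(String encodedRegionName, String plannedDestination) {
                  this.encodedRegionName = encodedRegionName;
                  this.plannedDestination = plannedDestination;
                }
              }

              private final BlockingQueue<AssignRequest> queue = new LinkedBlockingQueue<>();

              // Called by the balancer, the ServerShutdownHandler, or HBaseAdmin.assign().
              public void submit(AssignRequest request) {
                queue.offer(request);
              }

              // The single consumer: the only place that actually drives region assignment.
              public void runAssignmentLoop() throws InterruptedException {
                while (!Thread.currentThread().isInterrupted()) {
                  AssignRequest request = queue.take();
                  // ... force OFFLINE, confirm or build a plan, send openRegion ...
                  System.out.println("assigning " + request.encodedRegionName
                      + " to " + request.plannedDestination);
                }
              }
            }

          De-duplicating by region, and favoring the planned destination over a random one as discussed elsewhere in this thread, could be layered on top of the same structure.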

          stack added a comment -

          Thanks for filing the issue Maryann. I think we need to address the root problem of two threads in the master both at the same time trying to assign the same region rather than do as is done here where we just stop the abort. The patch as is will only move the problem down the line (we'll likely end up w/ a single region double assigned?). Let me update the issue title. This log snippet is a really good find.

          Uma Maheswara Rao G added a comment -

           Maryann, this looks to be an issue in trunk also, right?

            if (!hijack && !state.isClosed() && !state.isOffline()) {
              String msg = "Unexpected state : " + state + " .. Cannot transit it to OFFLINE.";
              this.master.abort(msg, new IllegalStateException(msg));
              return -1;
            }

           Since the region was already assigned by the previous call, the state might have changed to an in-transition state. If we just log and return now, I think it will just skip this assignment. I think it may be ok.
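
           For illustration only (a guess at the shape of such a change, not the attached HBASE-5816.patch), logging and bailing out instead of aborting could look roughly like this:

             import java.util.logging.Logger;

             // Sketch: warn and skip the duplicate assign instead of aborting the master.
             public class SkipInsteadOfAbortSketch {
               private static final Logger LOG = Logger.getLogger("AssignmentManagerSketch");

               enum RegionState { CLOSED, OFFLINE, PENDING_OPEN }

               // Returns -1 to tell the caller to give up on this assign attempt.
               static int setOfflineInZooKeeper(RegionState state, boolean hijack) {
                 if (!hijack && state != RegionState.CLOSED && state != RegionState.OFFLINE) {
                   LOG.warning("Unexpected state " + state
                       + "; another assign is likely in flight, skipping this one");
                   return -1;  // previously: master.abort(msg, new IllegalStateException(msg))
                 }
                 // ... create/update the unassigned znode with OFFLINE state and return its version ...
                 return 0;
               }

               public static void main(String[] args) {
                 System.out.println(setOfflineInZooKeeper(RegionState.PENDING_OPEN, false));  // -1, no abort
                 System.out.println(setOfflineInZooKeeper(RegionState.CLOSED, false));        // 0
               }
             }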

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12523068/HBASE-5816.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1565//console

          This message is automatically generated.

          Maryann Xue added a comment -

          Replace abort with LOG.WARN.


            People

            • Assignee: ramkrishna.s.vasudevan
            • Reporter: Maryann Xue
            • Votes: 0
            • Watchers: 10
