Details
-
Sub-task
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
None
-
None
-
None
Description
- When a region server attempts to open a region and fails it takes the resp. znode to PENDING_OPEN followed by FAILED_OPEN in quick succession.
- The HMaster now gets two notifications from ZK.
- If the znode transitioned to FAILED_OPEN before the HMaster could react to PENDING_OPEN. There will be two ClosedRegionHandler running.
That races causes this condition:
java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE. at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662)