Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Duplicate
-
0.90.4
-
None
-
None
Description
It goes like this:
- Master balances region R from server A to server B
- B reports OPENED of R
- Master queues a OpenedRegionHandler for R on B
- B dies (sad)
- Master processes the splits and reassigns R to C
- C reports OPENED of R
- Master queues a OpenedRegionHandler for R on C
- OpenedRegionHandler for R on B is finally processed, but leaves the region in a weird state (log #1)
- OpenedRegionHandler for R on C is processed, fails when it tries to delete the znode, kills the master (log #2)
If the master didn't commit seppuku, it would have had a wrong view of the state of that region because it wouldn't link it to the region server that really opened it in the end.
I'm not sure how I would go fixing this though...
Log 1:
2011-09-15 01:57:47,430 INFO org.apache.hadoop.hbase.master.AssignmentManager: The server is not in online servers, ServerName=sv4r23s44,60020,1316076811761, region=6e22b45f4288ea4d73f612ccf111aea6
2011-09-15 01:57:47,430 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 6e22b45f4288ea4d73f612ccf111aea6 on sv4r23s44,60020,1316076811761
Log 2:
2011-09-15 01:58:10,171 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x231d7b21aba0480 Deleting existing unassigned node for 6e22b45f4288ea4d73f612ccf111aea6 that is in expected state RS_ZK_REGION_OPENED
2011-09-15 01:58:10,204 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x231d7b21aba0480 Unable to get data of znode /hbase/unassigned/6e22b45f4288ea4d73f612ccf111aea6 because node does not exist (not necessarily an error)
2011-09-15 01:58:10,204 FATAL org.apache.hadoop.hbase.master.HMaster: Error deleting OPENED node in ZK for transition ZK node (6e22b45f4288ea4d73f612ccf111aea6)