Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
2.2.0
-
None
Description
Two Alter table calls one of TTL and the other for setting the Region policies could result in a possible race condition on HBase side.
Here is some details:
The assignment operation on the region in question (2e13a2d264c50e27c32cd4d3cad6ebca) further indicates possible network partition - the region is CLOSED and OFFLINED successfully, and trying to OPEN again, but it found conflict information from zk node (the zk node had a stale information and still thought that it is still in the CLOSING state, which is not true based on the log) - the assignment succeeds, but updating the assignment information to zk failed due to incorrect version in the zk node.
2016-02-18 00:05:46,603 INFO [os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,52093,1455753905161-org.apache.hadoop.hbase.master.BulkReOpen-0] master.RegionStates: Transition {2e13a2d264c50e27c32cd4d3cad6ebca state=OPEN, ts=1455753935244, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723} to {2e13a2d264c50e27c32cd4d3cad6ebca state=PENDING_CLOSE, ts=1455753946603, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723} 2016-02-18 00:05:46,605 INFO [PriorityRpcServer.handler=16,queue=0,port=41497] regionserver.RSRpcServices: Close 2e13a2d264c50e27c32cd4d3cad6ebca, moving to null 2016-02-18 00:05:46,629 INFO [StoreCloserThread-METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.-1] regionserver.HStore: Closed 0 2016-02-18 00:05:46,635 INFO [RS_CLOSE_REGION-os-s11-3-zxgfju-ambari-hv-ser-ha-3-1:41497-1] regionserver.HRegion: Closed METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca. 2016-02-18 00:05:46,643 INFO [AM.ZK.Worker-pool3-t1] master.RegionStates: Transition {2e13a2d264c50e27c32cd4d3cad6ebca state=PENDING_CLOSE, ts=1455753946603, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723} to {2e13a2d264c50e27c32cd4d3cad6ebca state=CLOSED, ts=1455753946643, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723} 2016-02-18 00:05:46,644 INFO [AM.ZK.Worker-pool3-t1] master.AssignmentManager: Setting node as OFFLINED in ZooKeeper for region {ENCODED => 2e13a2d264c50e27c32cd4d3cad6ebca, NAME => 'METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.', STARTKEY => '', ENDKEY => ''} 2016-02-18 00:05:46,644 INFO [AM.ZK.Worker-pool3-t1] master.RegionStates: Transition {2e13a2d264c50e27c32cd4d3cad6ebca state=CLOSED, ts=1455753946643, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723} to {2e13a2d264c50e27c32cd4d3cad6ebca state=OFFLINE, ts=1455753946644, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723} 2016-02-18 00:05:46,646 INFO [AM.ZK.Worker-pool3-t1] master.AssignmentManager: Assigning METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca. to os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723 2016-02-18 00:05:46,646 INFO [AM.ZK.Worker-pool3-t1] master.RegionStates: Transition {2e13a2d264c50e27c32cd4d3cad6ebca state=OFFLINE, ts=1455753946644, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723} to {2e13a2d264c50e27c32cd4d3cad6ebca state=PENDING_OPEN, ts=1455753946646, server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723} 2016-02-18 00:05:46,647 INFO [PriorityRpcServer.handler=5,queue=1,port=41497] regionserver.RSRpcServices: Open METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca. 2016-02-18 00:05:46,647 INFO [PriorityRpcServer.handler=5,queue=1,port=41497] coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 2e13a2d264c50e27c32cd4d3cad6ebca, NAME => 'METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.', STARTKEY => '', ENDKEY => ''} failed, transitioning from OFFLINE to FAILED_OPEN in ZK, expecting version 2 2016-02-18 00:05:46,649 WARN [PriorityRpcServer.handler=5,queue=1,port=41497] regionserver.RSRpcServices: Failed opening region METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca. org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: Received OPEN for the region:METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca. , which we are already trying to CLOSE at org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1477) at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22239) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101) at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130) at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107) at java.lang.Thread.run(Thread.java:745) 2016-02-18 00:05:46,653 WARN [AM.ZK.Worker-pool3-t1] master.AssignmentManager: Failed assignment of METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca. to os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723, waiting a little before trying on the same region server try=1 of 10 ... 2016-02-18 00:05:46,771 WARN [RS_OPEN_REGION-os-s11-3-zxgfju-ambari-hv-ser-ha-3-1:41497-0] coordination.ZkOpenRegionCoordination: Unable to mark region {ENCODED => 2e13a2d264c50e27c32cd4d3cad6ebca, NAME => 'METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.', STARTKEY => '', ENDKEY => ''} as FAILED_OPEN. It's likely that the master already timed out this open attempt, and thus another RS already has the region. 016-02-18 00:05:47,038 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x152f1b1bbcb0014, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) 2016-02-18 00:05:47,039 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:51677 which had sessionid 0x152f1b1bbcb0014
Attachments
Attachments
Issue Links
- links to