Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: Impala 2.8.0
Description
HBase startup continues to be problematic on RHEL7. (We recently beefed up our error checking with IMPALA-4684, but this didn't address the underlying flakiness.)
An example from a recent failure shows both Zookeeper flakiness (note the connection loss/retry) and a failure to launch HBase:
21:31:20 Connecting to Zookeeper host(s).
21:31:27 Success: <kazoo.client.KazooClient object at 0x13e2590>
21:31:27 Waiting for HBase node: /hbase/master
21:31:33 No handlers could be found for logger "kazoo.client"
21:31:33 Zookeeper connection loss: retrying connection (1 of 3 attempts)
21:31:34 Stopping Zookeeper client
21:31:39 Connecting to Zookeeper host(s).
21:31:46 Success: <kazoo.client.KazooClient object at 0x7f49109b3210>
21:31:46 Waiting for HBase node: /hbase/master
21:31:51 Waiting for HBase node: /hbase/master
21:31:55 Waiting for HBase node: /hbase/master
21:31:56 Waiting for HBase node: /hbase/master
21:32:04 Waiting for HBase node: /hbase/master
21:32:09 Waiting for HBase node: /hbase/master
21:32:28 Waiting for HBase node: /hbase/master
21:32:28 Waiting for HBase node: /hbase/master
21:32:28 Failed while checking for HBase node: /hbase/master
21:32:28 Waiting for HBase node: /hbase/rs
21:32:28 Success: /hbase/rs
21:32:28 Stopping Zookeeper client
21:32:28 Could not get one or more nodes. Exiting with errors: 1
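For reference, the check that produced the log above is a kazoo-based poll for the HBase znodes. Below is a minimal sketch of that kind of check; it is not Impala's actual startup script, and the Zookeeper host, znode paths, timeout, and poll interval are illustrative assumptions:

#!/usr/bin/env python
# Minimal sketch of a kazoo-based znode check like the one that produced the
# log above -- NOT Impala's actual startup script. The host, znode paths,
# timeout, and poll interval are illustrative assumptions.
import logging
import sys
import time

from kazoo.client import KazooClient
from kazoo.exceptions import ConnectionLoss

# Give kazoo a log handler; without one, Python 2 prints "No handlers could
# be found for logger 'kazoo.client'" as seen in the log above.
logging.basicConfig()

def wait_for_node(zk, path, timeout_s=60, poll_s=5):
    """Poll Zookeeper until the znode at 'path' exists or the timeout expires."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if zk.exists(path):
                print("Success: %s" % path)
                return True
        except ConnectionLoss:
            pass  # transient connection loss; keep polling until the deadline
        print("Waiting for HBase node: %s" % path)
        time.sleep(poll_s)
    print("Failed while checking for HBase node: %s" % path)
    return False

def main():
    print("Connecting to Zookeeper host(s).")
    zk = KazooClient(hosts="localhost:2181")
    zk.start(timeout=30)
    print("Success: %s" % zk)
    errors = sum(1 for path in ("/hbase/master", "/hbase/rs")
                 if not wait_for_node(zk, path))
    print("Stopping Zookeeper client")
    zk.stop()
    if errors:
        print("Could not get one or more nodes. Exiting with errors: %d" % errors)
        sys.exit(1)

if __name__ == "__main__":
    main()

In the failing run, the poll on /hbase/master exhausts its timeout, and the master's own log below shows where startup stalls.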
When I look at the end of the HBase master's output, it ends here:
17/01/04 21:32:17 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:51485, server: localhost/127.0.0.1:2181
17/01/04 21:32:23 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1596d1c12ee0009, negotiated timeout = 90000
17/01/04 21:32:23 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
17/01/04 21:32:23 INFO master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/localhost,60000,1483594276304 from backup master directory
If we look at a successful HBase startup from another test run, the master gets past that same point (the ClusterId read) and goes on to create the version file and bootstrap hbase:meta:
16/12/29 20:27:00 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:59932, server: localhost/127.0.0.1:2181
16/12/29 20:27:00 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1594dfb07910002, negotiated timeout = 90000
16/12/29 20:27:00 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
16/12/29 20:27:01 INFO util.FSUtils: Created version file at hdfs://localhost:20500/hbase with version=8
16/12/29 20:27:02 INFO master.MasterFileSystem: BOOTSTRAP: creating hbase:meta region
16/12/29 20:27:02 INFO regionserver.HRegion: creating HRegion hbase:meta HTD == 'hbase:meta', {TABLE_ATTRIBUTES => {IS_META => 'true', coprocessor$1 => '|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|536870911|'}, {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '10', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'} RootDir = hdfs://localhost:20500/hbase Table name == hbase:meta
16/12/29 20:27:02 INFO hfile.CacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=6566880, freeSize=6396457504, maxSize=6403024384, heapSize=6566880, minSize=6082873344, minFactor=0.95, multiSize=3041436672, multiFactor=0.5, singleSize=1520718336, singleFactor=0.25}, cacheDataOnRead=false, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
[etc...]
Attachments
Issue Links
- relates to: IMPALA-4088 HDFS data nodes pick HTTP server ports at random, sometimes stealing HBase master's port (Resolved)