IMPALA-4733

HBase/Zookeeper continues to be flaky when starting the minicluster on RHEL7

    Details

      Description

      HBase startup continues to be problematic on RHEL7. (We recently beefed up our error checking with IMPALA-4684, but this didn't address the underlying flakiness.)

      An example from a recent failure shows both Zookeeper flakiness (note the connection loss/retry) and a failure to launch HBase:

      21:31:20 Connecting to Zookeeper host(s).
      21:31:27 Success: <kazoo.client.KazooClient object at 0x13e2590>
      21:31:27 Waiting for HBase node: /hbase/master
      21:31:33 No handlers could be found for logger "kazoo.client"
      21:31:33 Zookeeper connection loss: retrying connection (1 of 3 attempts)
      21:31:34 Stopping Zookeeper client
      21:31:39 Connecting to Zookeeper host(s).
      21:31:46 Success: <kazoo.client.KazooClient object at 0x7f49109b3210>
      21:31:46 Waiting for HBase node: /hbase/master
      21:31:51 Waiting for HBase node: /hbase/master
      21:31:55 Waiting for HBase node: /hbase/master
      21:31:56 Waiting for HBase node: /hbase/master
      21:32:04 Waiting for HBase node: /hbase/master
      21:32:09 Waiting for HBase node: /hbase/master
      21:32:28 Waiting for HBase node: /hbase/master
      21:32:28 Waiting for HBase node: /hbase/master
      21:32:28 Failed while checking for HBase node: /hbase/master
      21:32:28 Waiting for HBase node: /hbase/rs
      21:32:28 Success: /hbase/rs
      21:32:28 Stopping Zookeeper client
      21:32:28 Could not get one or more nodes. Exiting with errors: 1
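
The wait-and-give-up pattern visible in the log above can be sketched as a small polling helper. This is only an illustrative sketch, not the minicluster's actual check script: the function name `wait_for_node`, its parameters, and the logger name are assumptions. The `exists` argument is any callable that returns a truthy value once the node is present. The `logging.basicConfig` call is included because the "No handlers could be found for logger" line is what Python 2 prints when a library logs before any handler is configured.

```python
import logging
import time

# Configuring a root handler lets library loggers (e.g. "kazoo.client")
# emit their messages instead of the Python 2 "No handlers could be
# found" warning.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
LOG = logging.getLogger("check_nodes")  # hypothetical logger name


def wait_for_node(exists, path, timeout_s=60.0, interval_s=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll exists(path) until it is truthy or timeout_s has elapsed.

    `exists`, `clock` and `sleep` are injectable so the helper can be
    exercised without a live ZooKeeper. Returns True on success and
    False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        LOG.info("Waiting for HBase node: %s", path)
        if exists(path):
            LOG.info("Success: %s", path)
            return True
        if clock() >= deadline:
            LOG.error("Failed while checking for HBase node: %s", path)
            return False
        sleep(interval_s)
```

With kazoo, `exists` could be the bound `exists` method of a started `KazooClient`, which returns `None` while the znode is absent. A longer `timeout_s` would only mask a slow master startup, not explain it.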
      

      When I look at the end of the HBase master node's output, it ends here:

      17/01/04 21:32:17 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:51485, server: localhost/127.0.0.1:2181
      17/01/04 21:32:23 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1596d1c12ee0009, negotiated timeout = 90000
      17/01/04 21:32:23 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
      17/01/04 21:32:23 INFO master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/localhost,60000,1483594276304 from backup master directory
      

      If we look at a successful HBase startup from another test run, the output looks like this:

      16/12/29 20:27:00 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:59932, server: localhost/127.0.0.1:2181
      16/12/29 20:27:00 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1594dfb07910002, negotiated timeout = 90000
      16/12/29 20:27:00 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
      16/12/29 20:27:01 INFO util.FSUtils: Created version file at hdfs://localhost:20500/hbase with version=8
      16/12/29 20:27:02 INFO master.MasterFileSystem: BOOTSTRAP: creating hbase:meta region
      16/12/29 20:27:02 INFO regionserver.HRegion: creating HRegion hbase:meta HTD == 'hbase:meta', {TABLE_ATTRIBUTES => {IS_META => 'true', coprocessor$1 => '|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|536870911|'}, {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '10', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'} RootDir = hdfs://localhost:20500/hbase Table name == hbase:meta
      16/12/29 20:27:02 INFO hfile.CacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=6566880, freeSize=6396457504, maxSize=6403024384, heapSize=6566880, minSize=6082873344, minFactor=0.95, multiSize=3041436672, multiFactor=0.5, singleSize=1520718336, singleFactor=0.25}, cacheDataOnRead=false, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
      [etc...]
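
One visible difference between the two master logs is the time between "Socket connection established" and "Session establishment complete". The sketch below measures that gap from log lines. It is a rough aid for comparing runs, not a diagnosis: the helper name `session_setup_seconds` is hypothetical, the sample lines are trimmed copies of the excerpts above, and the timestamp layout is assumed to be `yy/MM/dd HH:mm:ss`.

```python
from datetime import datetime

STAMP_FORMAT = "%y/%m/%d %H:%M:%S"  # assumed layout of the log timestamps
STAMP_WIDTH = 17                    # len("17/01/04 21:32:17")


def session_setup_seconds(lines):
    """Return seconds between socket connect and session establishment.

    Returns None if either event is missing from `lines`.
    """
    start = None
    for raw in lines:
        line = raw.strip()
        try:
            stamp = datetime.strptime(line[:STAMP_WIDTH], STAMP_FORMAT)
        except ValueError:
            continue  # skip lines that do not begin with a timestamp
        if "Socket connection established" in line:
            start = stamp
        elif "Session establishment complete" in line and start is not None:
            return (stamp - start).total_seconds()
    return None


# Trimmed copies of the log lines quoted in this issue.
failed_run = [
    "17/01/04 21:32:17 INFO zookeeper.ClientCnxn: Socket connection established, initiating session",
    "17/01/04 21:32:23 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181",
]
successful_run = [
    "16/12/29 20:27:00 INFO zookeeper.ClientCnxn: Socket connection established, initiating session",
    "16/12/29 20:27:00 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181",
]

print(session_setup_seconds(failed_run))      # 6.0
print(session_setup_seconds(successful_run))  # 0.0
```

The six-second gap in the failed run, against less than one second in the successful run, is consistent with a slow ZooKeeper server, but two samples do not establish a cause.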
      

              People

              • Assignee:
                lv Lars Volker
                Reporter:
                dknupp David Knupp