IMPALA-4733

HBase/Zookeeper continues to be flaky when starting the minicluster on RHEL7


Details

    Description

      HBase startup continues to be problematic on RHEL7. (We recently beefed up our error checking with IMPALA-4684, but this didn't address the underlying flakiness.)

      An example from a recent failure shows both Zookeeper flakiness (note the connection loss/retry) and a failure to launch HBase:

      21:31:20 Connecting to Zookeeper host(s).
      21:31:27 Success: <kazoo.client.KazooClient object at 0x13e2590>
      21:31:27 Waiting for HBase node: /hbase/master
      21:31:33 No handlers could be found for logger "kazoo.client"
      21:31:33 Zookeeper connection loss: retrying connection (1 of 3 attempts)
      21:31:34 Stopping Zookeeper client
      21:31:39 Connecting to Zookeeper host(s).
      21:31:46 Success: <kazoo.client.KazooClient object at 0x7f49109b3210>
      21:31:46 Waiting for HBase node: /hbase/master
      21:31:51 Waiting for HBase node: /hbase/master
      21:31:55 Waiting for HBase node: /hbase/master
      21:31:56 Waiting for HBase node: /hbase/master
      21:32:04 Waiting for HBase node: /hbase/master
      21:32:09 Waiting for HBase node: /hbase/master
      21:32:28 Waiting for HBase node: /hbase/master
      21:32:28 Waiting for HBase node: /hbase/master
      21:32:28 Failed while checking for HBase node: /hbase/master
      21:32:28 Waiting for HBase node: /hbase/rs
      21:32:28 Success: /hbase/rs
      21:32:28 Stopping Zookeeper client
      21:32:28 Could not get one or more nodes. Exiting with errors: 1
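For reference, the check visible in this log (connect, poll each znode, reconnect on connection loss, then exit with the number of missing nodes) can be sketched roughly as follows. This is not the actual minicluster script: `ConnectionLoss`, `wait_for_nodes`, `check_with_retries`, and the `connect` factory (anything with `exists(path)` and `stop()` methods) are hypothetical names, and the timeout and poll interval are assumed values. The clock and sleep functions are injectable so the loop can be exercised without a live Zookeeper.

```python
import time
from typing import Callable, Iterable, List


class ConnectionLoss(Exception):
    """Hypothetical stand-in for a ZooKeeper client's connection-loss error."""


def wait_for_nodes(
    node_exists: Callable[[str], bool],
    paths: Iterable[str],
    timeout_s: float = 60.0,  # assumed value
    poll_s: float = 5.0,      # assumed value
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll each znode path until it exists or a shared deadline passes.

    Returns the paths that never appeared.
    """
    deadline = clock() + timeout_s
    missing = []
    for path in paths:
        while True:
            print(f"Waiting for HBase node: {path}")
            if node_exists(path):
                print(f"Success: {path}")
                break
            if clock() >= deadline:
                print(f"Failed while checking for HBase node: {path}")
                missing.append(path)
                break
            sleep(poll_s)
    return missing


def check_with_retries(connect, paths, attempts=3, **wait_kwargs) -> List[str]:
    """Run wait_for_nodes, reconnecting on connection loss up to `attempts` times.

    `connect` is a hypothetical factory returning an object with
    `exists(path)` and `stop()` methods. If every attempt loses its
    connection, all paths are reported as missing.
    """
    paths = list(paths)
    for attempt in range(1, attempts + 1):
        print("Connecting to Zookeeper host(s).")
        client = connect()
        try:
            return wait_for_nodes(client.exists, paths, **wait_kwargs)
        except ConnectionLoss:
            print(f"Zookeeper connection loss: retrying connection "
                  f"({attempt} of {attempts} attempts)")
        finally:
            # Runs on success and on connection loss alike.
            print("Stopping Zookeeper client")
            client.stop()
    return paths
```

Because the deadline is shared across nodes, once /hbase/master times out, /hbase/rs gets a single check, which is consistent with the log above, where /hbase/rs succeeds at the same timestamp as the master failure. A caller could then exit with `len(missing)` as the status, matching "Exiting with errors: 1".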
      

      When I look at the HBase master's log output, it ends here:

      17/01/04 21:32:17 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:51485, server: localhost/127.0.0.1:2181
      17/01/04 21:32:23 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1596d1c12ee0009, negotiated timeout = 90000
      17/01/04 21:32:23 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
      17/01/04 21:32:23 INFO master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/localhost,60000,1483594276304 from backup master directory
      

      If we look at a successful HBase startup from another test run, the output looks like this:

      16/12/29 20:27:00 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:59932, server: localhost/127.0.0.1:2181
      16/12/29 20:27:00 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1594dfb07910002, negotiated timeout = 90000
      16/12/29 20:27:00 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
      16/12/29 20:27:01 INFO util.FSUtils: Created version file at hdfs://localhost:20500/hbase with version=8
      16/12/29 20:27:02 INFO master.MasterFileSystem: BOOTSTRAP: creating hbase:meta region
      16/12/29 20:27:02 INFO regionserver.HRegion: creating HRegion hbase:meta HTD == 'hbase:meta', {TABLE_ATTRIBUTES => {IS_META => 'true', coprocessor$1 => '|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|536870911|'}, {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '10', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'} RootDir = hdfs://localhost:20500/hbase Table name == hbase:meta
      16/12/29 20:27:02 INFO hfile.CacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=6566880, freeSize=6396457504, maxSize=6403024384, heapSize=6566880, minSize=6082873344, minFactor=0.95, multiSize=3041436672, multiFactor=0.5, singleSize=1520718336, singleFactor=0.25}, cacheDataOnRead=false, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
      [etc...]
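To compare a failed master log against a good one mechanically, a small helper can report the first startup step that never appeared. This is a hypothetical sketch: `first_missing_milestone` and `STARTUP_MILESTONES` are illustrative names, the milestone strings are copied from the successful excerpt above, and treating them as a fixed, ordered checklist is an assumption rather than a documented HBase startup sequence.

```python
from typing import Optional, Sequence

# Messages taken from the successful HBase master log excerpt, in the order
# they appear there. Treating them as a fixed checklist is an assumption.
STARTUP_MILESTONES = (
    "Session establishment complete",
    "ClusterId read in ZooKeeper",
    "Created version file at",
    "BOOTSTRAP: creating hbase:meta region",
)


def first_missing_milestone(
    log_lines: Sequence[str],
    milestones: Sequence[str] = STARTUP_MILESTONES,
) -> Optional[str]:
    """Return the first milestone never reached (in order), or None if all were."""
    reached = 0
    for line in log_lines:
        # Only advance when the next expected milestone shows up.
        if reached < len(milestones) and milestones[reached] in line:
            reached += 1
    return milestones[reached] if reached < len(milestones) else None
```

Applied to the failing master log quoted earlier, this would point at "Created version file at", i.e. the master never got as far as writing the version file to HDFS.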
      

      Attachments

        1. hbase_logs.tgz (50 kB, David Knupp)


            People

              Lars Volker
              David Knupp
              Votes: 0
              Watchers: 5
