Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 2.8.0
-
ghx-label-5
Description
There were several failures on RHEL7 recently.
Starting cluster services... Stopping kudu Stopping kms Stopping yarn Stopping hdfs Starting hdfs (Web UI - http://localhost:5070) Namenode started Starting yarn (Web UI - http://localhost:8088) Starting kms (Web UI - http://localhost:9600) Starting kudu (Web UI - http://localhost:8051) The cluster is running --> Starting HBase HBase start attempt: 1/5 Killing any HBase processes possibly lingering from previous start attempts Stopping Hbase localhost: ------------------------------------------------------------------------------- localhost: # This image has been automatically created with ComodIT - http://comodit.com # localhost: ------------------------------------------------------------------------------- localhost: localhost: starting zookeeper, logging to /data/jenkins/workspace/impala-asf-master-exhaustive-rhel7/repos/Impala/logs/cluster/hbase/hbase-jenkins-zookeeper-impala-boost-static-burst-slave-el7-0a08.vpc.cloudera.com.out starting master, logging to /data/jenkins/workspace/impala-asf-master-exhaustive-rhel7/repos/Impala/logs/cluster/hbase/hbase-jenkins-master-impala-boost-static-burst-slave-el7-0a08.vpc.cloudera.com.out starting regionserver, logging to /data/jenkins/workspace/impala-asf-master-exhaustive-rhel7/repos/Impala/logs/cluster/hbase/hbase-jenkins-1-regionserver-impala-boost-static-burst-slave-el7-0a08.vpc.cloudera.com.out starting regionserver, logging to /data/jenkins/workspace/impala-asf-master-exhaustive-rhel7/repos/Impala/logs/cluster/hbase/hbase-jenkins-2-regionserver-impala-boost-static-burst-slave-el7-0a08.vpc.cloudera.com.out starting regionserver, logging to /data/jenkins/workspace/impala-asf-master-exhaustive-rhel7/repos/Impala/logs/cluster/hbase/hbase-jenkins-3-regionserver-impala-boost-static-burst-slave-el7-0a08.vpc.cloudera.com.out Contents of HDFS root: [] Connecting to Zookeeper host(s). Success: <kazoo.client.KazooClient object at 0x7f5aecf06e10> Waiting for HBase node: /hbase/master Waiting for HBase node: /hbase/master Waiting for HBase node: /hbase/master Waiting for HBase node: /hbase/master Waiting for HBase node: /hbase/master Waiting for HBase node: /hbase/master Waiting for HBase node: /hbase/master Waiting for HBase node: /hbase/master Failed while checking for HBase node: /hbase/master Waiting for HBase node: /hbase/rs Success: /hbase/rs Stopping Zookeeper client Could not get one or more nodes. Exiting with errors: 1 Error in /data/jenkins/workspace/impala-asf-master-exhaustive-rhel7/repos/Impala/testdata/bin/run-hbase.sh at line 129: ${CLUSTER_BIN}/check-hbase-nodes.py Error in /data/jenkins/workspace/impala-asf-master-exhaustive-rhel7/repos/Impala/testdata/bin/run-all.sh at line 51: tee ${IMPALA_CLUSTER_LOGS_DIR}/run-hbase.log + echo 'buildall.sh ' -format '-snapshot_file /data/jenkins/workspace/impala-asf-master-exhaustive-rhel7/testdata/test-warehouse-SNAPSHOT/test-warehouse-cdh5-304-SNAPSHOT.tar.gz' '-metastore_snapshot_file /data/jenkins/workspace/impala-asf-master-exhaustive-rhel7/testdata/hive_metastore_snapshot/hive_impala_dump_cdh5-304.txt failed.' buildall.sh -format -snapshot_file /data/jenkins/workspace/impala-asf-master-exhaustive-rhel7/testdata/test-warehouse-SNAPSHOT/test-warehouse-cdh5-304-SNAPSHOT.tar.gz -metastore_snapshot_file /data/jenkins/workspace/impala-asf-master-exhaustive-rhel7/testdata/hive_metastore_snapshot/hive_impala_dump_cdh5-304.txt failed. + RET_VAL=1
From the HBase master log:
17/04/17 21:32:40 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /0:0:0:0:0:0:0:1:48978, server: localhost/0:0:0:0:0:0:0:1:2181 17/04/17 21:32:46 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x15b7f5567b10000, negotiated timeout = 90000 17/04/17 21:33:34 INFO ipc.RpcServer: RpcServer.responder: starting 17/04/17 21:33:34 INFO ipc.RpcServer: RpcServer.listener,port=16000: starting 17/04/17 21:33:35 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 17/04/17 21:33:35 INFO http.HttpRequestLog: Http request log for http.requests.master is not defined 17/04/17 21:33:35 INFO http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter) 17/04/17 21:33:35 INFO http.HttpServer: Added global filter 'clickjackingprevention' (class=org.apache.hadoop.hbase.http.ClickjackingPreventionFilter) 17/04/17 21:33:35 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context master 17/04/17 21:33:35 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context static 17/04/17 21:33:35 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs 17/04/17 21:33:35 INFO http.HttpServer: Jetty bound to port 16010 17/04/17 21:33:35 INFO mortbay.log: jetty-6.1.26.cloudera.4 17/04/17 21:33:35 INFO mortbay.log: Extract jar:file:/data/jenkins/workspace/impala-asf-master-exhaustive-rhel7/Impala-Toolchain/cdh_components/hbase-1.2.0-cdh5.12.0-SNAPSHOT/lib/hbase-server-1.2.0-cdh5.12.0-SNAPSHOT.jar!/hbase-webapps/master to /tmp/Jetty_0_0_0_0_16010_master____.6nvknp/webapp 17/04/17 21:33:35 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:16010 17/04/17 21:33:35 INFO master.HMaster: hbase.rootdir=hdfs://localhost:20500/hbase, hbase.cluster.distributed=true 17/04/17 21:33:35 INFO master.HMaster: Adding backup master ZNode /hbase/backup-masters/localhost,16000,1492489958254
Here's what a successful HBase looks like:
17/04/16 21:32:26 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /0:0:0:0:0:0:0:1:52928, server: localhost/0:0:0:0:0:0:0:1:2181 17/04/16 21:32:28 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x15b7a2ed6860000, negotiated timeout = 90000 17/04/16 21:32:28 INFO ipc.RpcServer: RpcServer.responder: starting 17/04/16 21:32:28 INFO ipc.RpcServer: RpcServer.listener,port=16000: starting 17/04/16 21:32:28 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 17/04/16 21:32:28 INFO http.HttpRequestLog: Http request log for http.requests.master is not defined 17/04/16 21:32:28 INFO http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter) 17/04/16 21:32:28 INFO http.HttpServer: Added global filter 'clickjackingprevention' (class=org.apache.hadoop.hbase.http.ClickjackingPreventionFilter) 17/04/16 21:32:28 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context master 17/04/16 21:32:28 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context static 17/04/16 21:32:28 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs 17/04/16 21:32:28 INFO http.HttpServer: Jetty bound to port 16010 17/04/16 21:32:28 INFO mortbay.log: jetty-6.1.26.cloudera.4 17/04/16 21:32:28 INFO mortbay.log: Extract jar:file:/data/jenkins/workspace/impala-asf-master-exhaustive-rhel7/Impala-Toolchain/cdh_components/hbase-1.2.0-cdh5.12.0-SNAPSHOT/lib/hbase-server-1.2.0-cdh5.12.0-SNAPSHOT.jar!/hbase-webapps/master to /tmp/Jetty_0_0_0_0_16010_master____.6nvknp/webapp 17/04/16 21:32:29 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:16010 17/04/16 21:32:29 INFO master.HMaster: hbase.rootdir=hdfs://localhost:20500/hbase, hbase.cluster.distributed=true 17/04/16 21:32:29 INFO master.HMaster: Adding backup master ZNode /hbase/backup-masters/localhost,16000,1492403544599 17/04/16 21:32:29 INFO master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/localhost,16000,1492403544599 from backup master directory 17/04/16 21:32:29 INFO master.ActiveMasterManager: Registered Active Master=localhost,16000,1492403544599 17/04/16 21:32:29 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS 17/04/16 21:32:29 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x112a19e connecting to ZooKeeper ensemble=localhost:2181 17/04/16 21:32:29 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x112a19e0x0, quorum=localhost:2181, baseZNode=/hbase 17/04/16 21:32:29 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error) 17/04/16 21:32:29 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /0:0:0:0:0:0:0:1:52933, server: localhost/0:0:0:0:0:0:0:1:2181 17/04/16 21:32:29 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x15b7a2ed6860005, negotiated timeout = 90000 17/04/16 21:32:29 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null 17/04/16 21:32:30 INFO util.FSUtils: Created version file at hdfs://localhost:20500/hbase with version=8 17/04/16 21:32:31 INFO master.MasterFileSystem: BOOTSTRAP: creating hbase:meta region 17/04/16 21:32:31 INFO regionserver.HRegion: creating HRegion hbase:meta HTD == 'hbase:meta', {TABLE_ATTRIBUTES => {IS_META => 'true', coprocessor$1 => '|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|536870911|'}, {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '10', TTL => 'FOREVER', MIN_VERSIONS => '0', CACHE_DATA_IN_L1 => 'true', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'} RootDir = hdfs://localhost:20500/hbase Table name == hbase:meta 17/04/16 21:32:31 INFO hfile.CacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=6559152, freeSize=6388915792, maxSize=6395474944, heapSize=6559152, minSize=6075701248, minFactor=0.95, multiSize=3037850624, multiFactor=0.5, singleSize=1518925312, singleFactor=0.25}, cacheDataOnRead=false, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false 17/04/16 21:32:31 INFO compactions.CompactionConfiguration: size [134217728, 9223372036854775807, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; major period 604800000, major jitter 0.500000, min locality to compact 0.000000 17/04/16 21:32:31 INFO regionserver.HRegion: Onlined 1588230740; next sequenceid=2 17/04/16 21:32:31 INFO regionserver.HStore: Closed info 17/04/16 21:32:31 INFO regionserver.HRegion: Closed hbase:meta,,1.1588230740 17/04/16 21:32:31 INFO fs.HFileSystem: Added intercepting call to namenode#getBlockLocations so can do block reordering using class class org.apache.hadoop.hbase.fs.HFileSystem$ReorderWALBlocks 17/04/16 21:32:31 INFO coordination.SplitLogManagerCoordination: Found 0 orphan tasks and 0 rescan nodes 17/04/16 21:32:31 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x6c9e069e connecting to ZooKeeper ensemble=localhost:2181 17/04/16 21:32:31 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x6c9e069e0x0, quorum=localhost:2181, baseZNode=/hbase 17/04/16 21:32:31 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error) 17/04/16 21:32:31 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /0:0:0:0:0:0:0:1:52946, server: localhost/0:0:0:0:0:0:0:1:2181 17/04/16 21:32:31 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x15b7a2ed6860009, negotiated timeout = 90000 17/04/16 21:32:31 INFO balancer.StochasticLoadBalancer: loading config 17/04/16 21:32:32 INFO master.HMaster: Server active/primary master=localhost,16000,1492403544599, sessionid=0x15b7a2ed6860000, setting cluster-up flag (Was=false) 17/04/16 21:32:32 INFO regionserver.HRegionServer: ClusterId : 4e7d30ae-2e8f-4ede-8ec8-f62245b8c9f9 17/04/16 21:32:32 INFO procedure.ZKProcedureUtil: Clearing all procedure znodes: /hbase/online-snapshot/acquired /hbase/online-snapshot/reached /hbase/online-snapshot/abort 17/04/16 21:32:32 INFO procedure.ZKProcedureUtil: Clearing all procedure znodes: /hbase/flush-table-proc/acquired /hbase/flush-table-proc/reached /hbase/flush-table-proc/abort 17/04/16 21:32:32 INFO master.MasterCoprocessorHost: System coprocessor loading is enabled 17/04/16 21:32:32 INFO procedure2.ProcedureExecutor: Starting procedure executor threads=9 17/04/16 21:32:32 INFO wal.WALProcedureStore: Starting WAL Procedure Store lease recovery 17/04/16 21:32:32 WARN wal.WALProcedureStore: Log directory not found: File hdfs://localhost:20500/hbase/MasterProcWALs does not exist. 17/04/16 21:32:32 INFO wal.WALProcedureStore: Lease acquired for flushLogId: 1 17/04/16 21:32:32 INFO zookeeper.RecoverableZooKeeper: Process identifier=replicationLogCleaner connecting to ZooKeeper ensemble=localhost:2181 17/04/16 21:32:32 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=replicationLogCleaner0x0, quorum=localhost:2181, baseZNode=/hbase 17/04/16 21:32:32 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error) 17/04/16 21:32:32 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /0:0:0:0:0:0:0:1:52949, server: localhost/0:0:0:0:0:0:0:1:2181 17/04/16 21:32:32 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x15b7a2ed686000a, negotiated timeout = 90000 17/04/16 21:32:32 INFO master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms. 17/04/16 21:32:32 INFO hbase.Server: Auth successful for jenkins (auth:SIMPLE) 17/04/16 21:32:32 INFO hbase.Server: Auth successful for jenkins (auth:SIMPLE) 17/04/16 21:32:32 INFO hbase.Server: Auth successful for jenkins (auth:SIMPLE) 17/04/16 21:32:32 INFO hbase.Server: Connection from 127.0.0.1 port: 43413 with version info: version: "1.2.0-cdh5.12.0-SNAPSHOT" url: "file:///data/jenkins/workspace/generic-binary-tarball-and-maven-deploy/CDH5-Packaging-HBase-2017-03-02_15-47-59/hbase-1.2.0-cdh5.12.0-SNAPSHOT" revision: "Unknown" user: "jenkins" date: "Thu Mar 2 16:13:05 PST 2017" src_checksum: "9ea697f985b9a5740a14d1be418c1115" 17/04/16 21:32:32 INFO hbase.Server: Connection from 127.0.0.1 port: 35012 with version info: version: "1.2.0-cdh5.12.0-SNAPSHOT" url: "file:///data/jenkins/workspace/generic-binary-tarball-and-maven-deploy/CDH5-Packaging-HBase-2017-03-02_15-47-59/hbase-1.2.0-cdh5.12.0-SNAPSHOT" revision: "Unknown" user: "jenkins" date: "Thu Mar 2 16:13:05 PST 2017" src_checksum: "9ea697f985b9a5740a14d1be418c1115" 17/04/16 21:32:32 INFO hbase.Server: Connection from 127.0.0.1 port: 59011 with version info: version: "1.2.0-cdh5.12.0-SNAPSHOT" url: "file:///data/jenkins/workspace/generic-binary-tarball-and-maven-deploy/CDH5-Packaging-HBase-2017-03-02_15-47-59/hbase-1.2.0-cdh5.12.0-SNAPSHOT" revision: "Unknown" user: "jenkins" date: "Thu Mar 2 16:13:05 PST 2017" src_checksum: "9ea697f985b9a5740a14d1be418c1115" 17/04/16 21:32:32 INFO master.ServerManager: Registering server=localhost,16203,1492403547967 17/04/16 21:32:32 INFO master.ServerManager: Registering server=localhost,16201,1492403545540 17/04/16 21:32:32 INFO master.ServerManager: Registering server=localhost,16202,1492403546904 17/04/16 21:32:32 INFO master.ServerManager: Waiting for region servers count to settle; currently checked in 3, slept for 385 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms. 17/04/16 21:32:34 INFO master.ServerManager: Waiting for region servers count to settle; currently checked in 3, slept for 1895 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms. 17/04/16 21:32:35 INFO master.ServerManager: Waiting for region servers count to settle; currently checked in 3, slept for 3398 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms. 17/04/16 21:32:36 INFO master.ServerManager: Finished waiting for region servers count to settle; checked in 3, slept for 4501 ms, expecting minimum of 1, maximum of 2147483647, master is running 17/04/16 21:32:36 INFO master.MasterFileSystem: Log folder hdfs://localhost:20500/hbase/WALs/localhost,16201,1492403545540 belongs to an existing region server 17/04/16 21:32:36 INFO master.MasterFileSystem: Log folder hdfs://localhost:20500/hbase/WALs/localhost,16202,1492403546904 belongs to an existing region server 17/04/16 21:32:36 INFO master.MasterFileSystem: Log folder hdfs://localhost:20500/hbase/WALs/localhost,16203,1492403547967 belongs to an existing region server 17/04/16 21:32:37 INFO zookeeper.MetaTableLocator: Deleting hbase:meta region location in ZooKeeper 17/04/16 21:32:37 INFO master.AssignmentManager: Setting node as OFFLINED in ZooKeeper for region {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''} 17/04/16 21:32:37 INFO zookeeper.ZKTableStateManager: Moving table hbase:meta state from null to ENABLED 17/04/16 21:32:37 INFO master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to localhost,16201,1492403545540 17/04/16 21:32:37 INFO master.RegionStates: Transition {1588230740 state=OFFLINE, ts=1492403557593, server=null} to {1588230740 state=PENDING_OPEN, ts=1492403557616, server=localhost,16201,1492403545540}
This issue is similar to IMPALA-4733.