Description
After deployment completes, the Storm Supervisors often fail to start correctly. No data is ingested until the Supervisors are started manually.
The Supervisors appear to be unable to connect to ZooKeeper: the connection attempt times out and the Supervisor dies. ZooKeeper may simply not be ready by the time the Supervisors start. It is not clear whether this is something we can fix or whether it is an Ambari issue.
2016-06-25 12:48:16.448 o.a.s.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_40]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_40]
at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
2016-06-25 12:48:17.154 o.a.s.c.ConnectionState [ERROR] Connection timed out for connection string (ec2-52-41-178-50.us-west-2.compute.amazonaws.com:2181) and timeout (15000) / elapsed (15053)
org.apache.storm.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.storm.curator.ConnectionState.checkTimeouts(ConnectionState.java:195) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at org.apache.storm.curator.ConnectionState.getZooKeeper(ConnectionState.java:87) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at org.apache.storm.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at org.apache.storm.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:487) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:226) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:215) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForegroundStandard(ExistsBuilderImpl.java:212) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:205) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:168) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:39) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at backtype.storm.zookeeper$exists_node_QMARK_$fn__3211.invoke(zookeeper.clj:107) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:104) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:120) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at backtype.storm.cluster$mk_distributed_cluster_state.doInvoke(cluster.clj:60) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at clojure.lang.RestFn.invoke(RestFn.java:486) [clojure-1.6.0.jar:?]
at backtype.storm.cluster$mk_storm_cluster_state.doInvoke(cluster.clj:314) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at clojure.lang.RestFn.invoke(RestFn.java:439) [clojure-1.6.0.jar:?]
at backtype.storm.daemon.supervisor$supervisor_data.invoke(supervisor.clj:296) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at backtype.storm.daemon.supervisor$fn_8449$exec_fn3614auto___8450.invoke(supervisor.clj:504) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at clojure.lang.AFn.applyToHelper(AFn.java:160) [clojure-1.6.0.jar:?]
at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
at clojure.core$apply.invoke(core.clj:624) [clojure-1.6.0.jar:?]
at backtype.storm.daemon.supervisor$fn_8449$mk_supervisor_8476.doInvoke(supervisor.clj:500) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.6.0.jar:?]
at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:792) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:822) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
at clojure.lang.AFn.applyToHelper(AFn.java:152) [clojure-1.6.0.jar:?]
at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
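If the cause really is that ZooKeeper is not yet listening when the Supervisors start, one possible workaround is to have the deployment wait for the ZooKeeper port before starting them. The following is only a minimal sketch of that idea, not a confirmed fix. The function name `wait_for_port`, the default timeout and polling interval, and the host name in the usage note are all hypothetical; only port 2181 comes from the log above.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Hypothetical helper: returns True once a connection is accepted,
    False if the deadline passes first. The defaults are assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is itself bounded by `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, timed out, or host not resolvable yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A deployment step could call `wait_for_port("zookeeper-host", 2181)` and start the Supervisors only when it returns True. A successful TCP connect shows only that the port is open, not that ZooKeeper is ready to serve requests, so a stricter check (for example ZooKeeper's `ruok` four-letter command, where enabled) may be needed.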