Metron (Retired) / METRON-261

Storm Supervisors Fail to Start


Details

    • Type: Bug
    • Status: Done
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 0.2.1BETA

    Description

      After deployment completes, the Storm Supervisors often fail to start correctly. This prevents any data from being ingested until the Supervisors are manually started.

      The Supervisors appear unable to connect to ZooKeeper at startup: the connection is refused, and after the 15-second connection timeout they die. ZooKeeper may simply not be ready in time. It is unclear whether this is something we can fix or whether it is an Ambari issue.

      2016-06-25 12:48:16.448 o.a.s.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
      java.net.ConnectException: Connection refused
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_40]
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_40]
      at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      2016-06-25 12:48:17.154 o.a.s.c.ConnectionState [ERROR] Connection timed out for connection string (ec2-52-41-178-50.us-west-2.compute.amazonaws.com:2181) and timeout (15000) / elapsed (15053)
      org.apache.storm.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
      at org.apache.storm.curator.ConnectionState.checkTimeouts(ConnectionState.java:195) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at org.apache.storm.curator.ConnectionState.getZooKeeper(ConnectionState.java:87) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at org.apache.storm.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at org.apache.storm.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:487) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:226) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:215) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForegroundStandard(ExistsBuilderImpl.java:212) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:205) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:168) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:39) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at backtype.storm.zookeeper$exists_node_QMARK_$fn__3211.invoke(zookeeper.clj:107) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:104) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:120) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at backtype.storm.cluster$mk_distributed_cluster_state.doInvoke(cluster.clj:60) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at clojure.lang.RestFn.invoke(RestFn.java:486) [clojure-1.6.0.jar:?]
      at backtype.storm.cluster$mk_storm_cluster_state.doInvoke(cluster.clj:314) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at clojure.lang.RestFn.invoke(RestFn.java:439) [clojure-1.6.0.jar:?]
      at backtype.storm.daemon.supervisor$supervisor_data.invoke(supervisor.clj:296) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at backtype.storm.daemon.supervisor$fn_8449$exec_fn3614auto___8450.invoke(supervisor.clj:504) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at clojure.lang.AFn.applyToHelper(AFn.java:160) [clojure-1.6.0.jar:?]
      at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
      at clojure.core$apply.invoke(core.clj:624) [clojure-1.6.0.jar:?]
      at backtype.storm.daemon.supervisor$fn_8449$mk_supervisor_8476.doInvoke(supervisor.clj:500) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.6.0.jar:?]
      at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:792) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:822) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
      at clojure.lang.AFn.applyToHelper(AFn.java:152) [clojure-1.6.0.jar:?]
      at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
      at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
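
      If the cause is indeed that ZooKeeper is not yet listening when the Supervisors start, one possible workaround is to block until the ZooKeeper client port accepts TCP connections before starting them. The following is only a sketch under that assumption, not a fix taken from this issue. The function name wait_for_port and its parameters are hypothetical; port 2181 comes from the connection string in the log above.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=120.0, interval_s=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: a successful connect only shows that something is
    listening on the port. It does not prove ZooKeeper is ready to serve
    requests.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # "Connection refused" (as in the log) surfaces here as an OSError.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

      A deployment step could call wait_for_port(zk_host, 2181) and start the Supervisor only when it returns True, where zk_host stands in for the ZooKeeper host. Whether such a wait belongs in the deployment scripts or in Ambari is the open question raised above.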

          People

            Assignee: Unassigned
            Reporter: Nick Allen (nickwallen)
            Votes: 0
            Watchers: 2
