Apache Storm / STORM-2706

Nimbus stuck in exception and does not fail fast

    Details

      Description

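      We are experiencing a problem in which Nimbus gets stuck in a loop of retrying and failing. When I restart Nimbus manually, it works again as expected. It would be better if Nimbus shut itself down when it reaches this state, so that our monitoring could restart it automatically. In the log below, lookups of the current leader fail with a ZooKeeper NoNodeException for /storm/leader-lock; after that, Nimbus only reports every 10 seconds that it is not a leader and skips assignments and cleanup.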
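A minimal sketch of the requested fail-fast behaviour, assuming the repeatedly failing operation (here, a leader lookup) can be wrapped at a single call site: count consecutive failures and, once a threshold is reached, run a shutdown action so an external supervisor can restart the process. FailFastGuard, the threshold of 3 and the shutdown callback are hypothetical names chosen for illustration; they are not part of Storm's API. In a real daemon the callback might call Runtime.getRuntime().halt(1); it is injected here so the logic can be exercised without stopping the JVM.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical fail-fast wrapper (not a Storm API): runs an operation, counts
 * consecutive failures, and invokes a shutdown action once a threshold is hit.
 */
public class FailFastGuard {
    private final int maxConsecutiveFailures;
    private final Runnable shutdownAction;
    private int consecutiveFailures = 0;

    public FailFastGuard(int maxConsecutiveFailures, Runnable shutdownAction) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
        this.shutdownAction = shutdownAction;
    }

    /** Runs the operation; records and rethrows any failure. */
    public synchronized <T> T call(Callable<T> operation) throws Exception {
        try {
            T result = operation.call();
            consecutiveFailures = 0; // any success resets the failure streak
            return result;
        } catch (Exception e) {
            consecutiveFailures++;
            if (consecutiveFailures >= maxConsecutiveFailures) {
                // Give up instead of looping forever; a supervisor can restart the process.
                shutdownAction.run();
            }
            throw e;
        }
    }

    public synchronized int consecutiveFailures() {
        return consecutiveFailures;
    }

    public static void main(String[] args) {
        // In a real daemon this could be: () -> Runtime.getRuntime().halt(1)
        FailFastGuard guard = new FailFastGuard(3,
                () -> System.out.println("threshold reached, shutting down"));
        for (int i = 0; i < 3; i++) {
            try {
                guard.call(() -> {
                    throw new IllegalStateException("NoNode for /storm/leader-lock");
                });
            } catch (Exception e) {
                System.out.println("attempt " + (i + 1) + " failed: " + e.getMessage());
            }
        }
    }
}
```

Running main prints one failure line per attempt, and the third failure also triggers the shutdown action before the exception is rethrown. Resetting the counter on every success means a brief ZooKeeper hiccup does not cause a restart; only a sustained failure loop does.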

      The Nimbus log:

      24.8.2017 15:39:19 13:39:19.804 [pool-13-thread-51] ERROR org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer - Unexpected throwable while invoking!
      24.8.2017 15:39:19 org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /storm/leader-lock
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) ~[?:?]
      24.8.2017 15:39:19	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
      24.8.2017 15:39:19	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
      24.8.2017 15:39:19	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:19	at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:19	at org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[?:?]
      24.8.2017 15:39:19	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
      24.8.2017 15:39:19	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
      24.8.2017 15:39:19	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:19	at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:19	at org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getLeader(nimbus.clj:2412) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3944) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3928) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
      24.8.2017 15:39:19	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
      24.8.2017 15:39:19	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
      24.8.2017 15:39:27 13:39:27.205 [pool-13-thread-52] ERROR org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer - Unexpected throwable while invoking!
      24.8.2017 15:39:27 org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /storm/leader-lock
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) ~[?:?]
      24.8.2017 15:39:27	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
      24.8.2017 15:39:27	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
      24.8.2017 15:39:27	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:27	at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:27	at org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[?:?]
      24.8.2017 15:39:27	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
      24.8.2017 15:39:27	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
      24.8.2017 15:39:27	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:27	at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:27	at org.apache.storm.daemon.nimbus$get_cluster_info.invoke(nimbus.clj:1544) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getClusterInfo(nimbus.clj:2006) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3920) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3904) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
      24.8.2017 15:39:27	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
      24.8.2017 15:39:27	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
      24.8.2017 15:39:29 13:39:29.270 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
      24.8.2017 15:39:29 13:39:29.270 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
      24.8.2017 15:39:39 13:39:39.270 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
      24.8.2017 15:39:39 13:39:39.270 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
      24.8.2017 15:39:49 13:39:49.271 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
      24.8.2017 15:39:49 13:39:49.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
      24.8.2017 15:39:59 13:39:59.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
      24.8.2017 15:39:59 13:39:59.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
      24.8.2017 15:40:09 13:40:09.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
      24.8.2017 15:40:09 13:40:09.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
      24.8.2017 15:40:13 13:40:13.806 [timer] INFO  org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
      24.8.2017 15:40:13 13:40:13.807 [timer] INFO  org.apache.storm.shade.org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=zookeeper:2181/storm sessionTimeout=20000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@f90354
      24.8.2017 15:40:13 13:40:13.808 [timer-SendThread(10.42.174.214:2181)] INFO  org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.42.174.214/10.42.174.214:2181. Will not attempt to authenticate using SASL (unknown error)
      24.8.2017 15:40:13 13:40:13.862 [timer-SendThread(10.42.174.214:2181)] INFO  org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Socket connection established to 10.42.174.214/10.42.174.214:2181, initiating session
      24.8.2017 15:40:13 13:40:13.865 [timer-SendThread(10.42.174.214:2181)] INFO  org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Session establishment complete on server 10.42.174.214/10.42.174.214:2181, sessionid = 0x15e14456dc70045, negotiated timeout = 20000
      24.8.2017 15:40:13 13:40:13.910 [timer] INFO  org.apache.storm.shade.org.apache.zookeeper.ZooKeeper - Session: 0x15e14456dc70045 closed
      24.8.2017 15:40:13 13:40:13.910 [timer-EventThread] INFO  org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - EventThread shut down
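The trace suggests why the process keeps running: the NoNodeException is thrown on a Thrift worker thread (pool-13-thread-51) and caught by the server's AbstractNonblockingServer$FrameBuffer, which logs "Unexpected throwable while invoking!", and the server goes on handling later requests (pool-13-thread-52 fails the same way a few seconds later). One possible alternative policy, sketched here under stated assumptions, is to classify failures at the request boundary and escalate only those treated as fatal. FatalErrorEscalator and LeaderLockMissingException are hypothetical; in real code the predicate might test for ZooKeeper's KeeperException.NoNodeException instead of the stand-in, and whether a missing /storm/leader-lock should count as fatal is an assumption, not something this report establishes.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/**
 * Hypothetical request wrapper (not a Storm or Thrift API). Failures are always
 * rethrown so the caller can log them as before, but failures the predicate
 * classifies as fatal also trigger a shutdown action, so the daemon exits
 * instead of continuing in a broken state.
 */
public class FatalErrorEscalator {

    /** Stand-in for an error such as a missing leader-lock node (assumption). */
    public static class LeaderLockMissingException extends Exception {
        public LeaderLockMissingException(String message) {
            super(message);
        }
    }

    private final Predicate<Throwable> isFatal;
    private final Runnable shutdownAction;

    public FatalErrorEscalator(Predicate<Throwable> isFatal, Runnable shutdownAction) {
        this.isFatal = isFatal;
        this.shutdownAction = shutdownAction;
    }

    /** Runs one request; escalates fatal failures, then rethrows every failure. */
    public <T> T handle(Callable<T> request) throws Exception {
        try {
            return request.call();
        } catch (Exception e) {
            if (isFatal.test(e)) {
                shutdownAction.run(); // e.g. Runtime.getRuntime().halt(1) in a real daemon
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        FatalErrorEscalator escalator = new FatalErrorEscalator(
                t -> t instanceof LeaderLockMissingException,
                () -> System.out.println("fatal error, shutting down"));
        try {
            escalator.handle(() -> {
                throw new LeaderLockMissingException("NoNode for /storm/leader-lock");
            });
        } catch (Exception e) {
            System.out.println("request failed: " + e.getMessage());
        }
    }
}
```

Non-fatal failures are rethrown without side effects, so the existing error logging is unchanged; only failures matching the predicate stop the daemon.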
      

            People

            • Assignee: Stig Rohde Døssing (Srdo)
              Reporter: Bijan Fahimi Shemrani (bijanfahimi)
            • Votes: 0
              Watchers: 6

                Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 20m