Uploaded image for project: 'Apache Storm'
  1. Apache Storm
  2. STORM-2706

Nimbus stuck in exception and does not fail fast

    XMLWordPrintableJSON

Details

    Description

      We experience a problem in nimbus which leads it to get stuck in a retry and fail loop. When I manually restart the nimbus it works again as expected. However, it would be great if nimbus would shut down so our monitoring can automatically restart the nimbus.

      The nimbus log.

      24.8.2017 15:39:1913:39:19.804 [pool-13-thread-51] ERROR org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer - Unexpected throwable while invoking!
      24.8.2017 15:39:19org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /storm/leader-lock
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) ~[?:?]
      24.8.2017 15:39:19	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
      24.8.2017 15:39:19	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
      24.8.2017 15:39:19	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:19	at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:19	at org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[?:?]
      24.8.2017 15:39:19	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
      24.8.2017 15:39:19	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
      24.8.2017 15:39:19	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:19	at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:19	at org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getLeader(nimbus.clj:2412) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3944) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3928) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:19	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
      24.8.2017 15:39:19	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
      24.8.2017 15:39:19	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
      24.8.2017 15:39:2713:39:27.205 [pool-13-thread-52] ERROR org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer - Unexpected throwable while invoking!
      24.8.2017 15:39:27org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /storm/leader-lock
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) ~[?:?]
      24.8.2017 15:39:27	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
      24.8.2017 15:39:27	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
      24.8.2017 15:39:27	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:27	at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:27	at org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[?:?]
      24.8.2017 15:39:27	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
      24.8.2017 15:39:27	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
      24.8.2017 15:39:27	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:27	at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
      24.8.2017 15:39:27	at org.apache.storm.daemon.nimbus$get_cluster_info.invoke(nimbus.clj:1544) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getClusterInfo(nimbus.clj:2006) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3920) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3904) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) ~[storm-core-1.1.1.jar:1.1.1]
      24.8.2017 15:39:27	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
      24.8.2017 15:39:27	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
      24.8.2017 15:39:27	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
      24.8.2017 15:39:2913:39:29.270 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
      24.8.2017 15:39:2913:39:29.270 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
      24.8.2017 15:39:3913:39:39.270 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
      24.8.2017 15:39:3913:39:39.270 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
      24.8.2017 15:39:4913:39:49.271 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
      24.8.2017 15:39:4913:39:49.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
      24.8.2017 15:39:5913:39:59.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
      24.8.2017 15:39:5913:39:59.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
      24.8.2017 15:40:0913:40:09.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping assignments
      24.8.2017 15:40:0913:40:09.272 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
      24.8.2017 15:40:1313:40:13.806 [timer] INFO  org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
      24.8.2017 15:40:1313:40:13.807 [timer] INFO  org.apache.storm.shade.org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=zookeeper:2181/storm sessionTimeout=20000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@f90354
      24.8.2017 15:40:1313:40:13.808 [timer-SendThread(10.42.174.214:2181)] INFO  org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.42.174.214/10.42.174.214:2181. Will not attempt to authenticate using SASL (unknown error)
      24.8.2017 15:40:1313:40:13.862 [timer-SendThread(10.42.174.214:2181)] INFO  org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Socket connection established to 10.42.174.214/10.42.174.214:2181, initiating session
      24.8.2017 15:40:1313:40:13.865 [timer-SendThread(10.42.174.214:2181)] INFO  org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Session establishment complete on server 10.42.174.214/10.42.174.214:2181, sessionid = 0x15e14456dc70045, negotiated timeout = 20000
      24.8.2017 15:40:1313:40:13.910 [timer] INFO  org.apache.storm.shade.org.apache.zookeeper.ZooKeeper - Session: 0x15e14456dc70045 closed
      24.8.2017 15:40:1313:40:13.910 [timer-EventThread] INFO  org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - EventThread shut down
      

      Attachments

        Activity

          People

            srdo Stig Rohde Døssing
            bijanfahimi Bijan Fahimi Shemrani
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 1h 20m
                1h 20m