  1. Apache Storm
  2. STORM-3039

Ports of killed topologies remain in TIME_WAIT state, preventing new topologies from starting


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.2, 1.2.1
    • Fix Version/s: 2.0.0, 1.1.3, 1.2.2
    • None

    Description

      When a topology is killed, the slot ports (supervisor.slots.ports) remain in TIME_WAIT state. A new topology then cannot be started, because its workers fail to bind to those ports and throw the following error:

      2018-04-20 08:37:08.742 o.a.s.d.worker main [ERROR] Error on initialization of server mk-worker
      org.apache.storm.shade.org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:6700
       at org.apache.storm.shade.org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) ~[storm-core-1.2.1.jar:1.2.1]
       at org.apache.storm.messaging.netty.Server.<init>(Server.java:101) ~[storm-core-1.2.1.jar:1.2.1]
       at org.apache.storm.messaging.netty.Context.bind(Context.java:67) ~[storm-core-1.2.1.jar:1.2.1]
       at org.apache.storm.daemon.worker$worker_data$fn__10395.invoke(worker.clj:285) ~[storm-core-1.2.1.jar:1.2.1]
       at org.apache.storm.util$assoc_apply_self.invoke(util.clj:931) ~[storm-core-1.2.1.jar:1.2.1]
       at org.apache.storm.daemon.worker$worker_data.invoke(worker.clj:282) ~[storm-core-1.2.1.jar:1.2.1]
       at org.apache.storm.daemon.worker$fn__10693$exec_fn__3301__auto__$reify__10695.run(worker.clj:626) ~[storm-core-1.2.1.jar:1.2.1]
       at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_161]
       at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_161]
       at org.apache.storm.daemon.worker$fn__10693$exec_fn__3301__auto____10694.invoke(worker.clj:624) ~[storm-core-1.2.1.jar:1.2.1]
       at clojure.lang.AFn.applyToHelper(AFn.java:178) ~[clojure-1.7.0.jar:?]
       at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.7.0.jar:?]
       at clojure.core$apply.invoke(core.clj:630) ~[clojure-1.7.0.jar:?]
       at org.apache.storm.daemon.worker$fn__10693$mk_worker__10784.doInvoke(worker.clj:598) [storm-core-1.2.1.jar:1.2.1]
       at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.7.0.jar:?]
       at org.apache.storm.daemon.worker$_main.invoke(worker.clj:787) [storm-core-1.2.1.jar:1.2.1]
       at clojure.lang.AFn.applyToHelper(AFn.java:165) [clojure-1.7.0.jar:?]
       at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.7.0.jar:?]
       at org.apache.storm.daemon.worker.main(Unknown Source) [storm-core-1.2.1.jar:1.2.1]
      Caused by: java.net.BindException: Address already in use
       at sun.nio.ch.Net.bind0(Native Method) ~[?:1.8.0_161]
       at sun.nio.ch.Net.bind(Net.java:433) ~[?:1.8.0_161]
       at sun.nio.ch.Net.bind(Net.java:425) ~[?:1.8.0_161]
       at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) ~[?:1.8.0_161]
       at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) ~[?:1.8.0_161]
       at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193) ~[storm-core-1.2.1.jar:1.2.1]
       at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391) ~[storm-core-1.2.1.jar:1.2.1]
       at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315) ~[storm-core-1.2.1.jar:1.2.1]
       at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) ~[storm-core-1.2.1.jar:1.2.1]
       at org.apache.storm.shade.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[storm-core-1.2.1.jar:1.2.1]
       at org.apache.storm.shade.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[storm-core-1.2.1.jar:1.2.1]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
       at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_161]
      
      

       

      This exception occurs often when topologies are stopped and started automatically.
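
      For illustration only: the report does not say how the issue was resolved. A common mitigation for bind failures caused by lingering TIME_WAIT connections is to request SO_REUSEADDR on the listening socket before binding. The sketch below uses plain java.net rather than Storm's shaded Netty server, and the class and method names (ReuseAddressSketch, bindWithReuse, canBindWithReuse) are hypothetical. Port 6700 is taken from the log above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch, not Storm code: bind a listening socket with SO_REUSEADDR.
public class ReuseAddressSketch {

    // Opens a listening socket and requests SO_REUSEADDR before bind().
    // The option has to be set while the socket is still unbound.
    static ServerSocket bindWithReuse(int port) throws IOException {
        ServerSocket server = new ServerSocket(); // created unbound
        try {
            server.setReuseAddress(true);
            server.bind(new InetSocketAddress(port));
            return server;
        } catch (IOException e) {
            server.close(); // do not leak the socket on a failed bind
            throw e;
        }
    }

    // Returns true if the port could be bound, false if binding failed
    // (for example with "Address already in use").
    static boolean canBindWithReuse(int port) {
        try (ServerSocket server = bindWithReuse(port)) {
            return server.isBound();
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default: the slot port that appears in the stack trace.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 6700;
        System.out.println("port " + port + " bindable: " + canBindWithReuse(port));
    }
}
```

      SO_REUSEADDR generally lets a new listener bind while old connections on the same port are still in TIME_WAIT. It does not help when another process is actively listening on the port, so a bind can still fail in that case.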

            People

              Assignee: Gergely Hajós (ghajos)
              Reporter: Gergely Hajós (ghajos)
              Votes: 1
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h