  1. Apache Storm
  2. STORM-840

My supervisor crashes when I kill a topology

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.9.4
    • Fix Version/s: None
    • Component/s: storm-core
    • Environment: I have a test cluster of 3 servers based on Debian.
      Each server runs Storm inside a Docker container.

      2 servers are supervisors only.
      1 server runs nimbus + UI + supervisor.

      I use Oracle JVM 8u45.
    • Flags: Important

    Description

      Hello,
      I run 3 topologies inside my cluster.
      Sometimes, when I kill one of them (not a specific one), one supervisor goes down and restarts. After a few restarts, it becomes stable.
      The topology process shows up in a "zombie" state in the process list.
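The zombie state mentioned above can be confirmed from `/proc` rather than by eye. The following is a minimal sketch, assuming a Linux host or container; the helper names `proc_state` and `zombie_pids` are illustrative and not part of Storm.

```python
import os


def proc_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take the first field after the last ')'.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def zombie_pids(proc_root="/proc"):
    """List PIDs whose state is 'Z' (zombie) under proc_root."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as f:
                if proc_state(f.read()) == "Z":
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)
```

Calling `zombie_pids()` inside the supervisor's container right after killing a topology would show whether a worker is left unreaped. A zombie has already exited and is only waiting for its parent to collect its exit status.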

      In version 0.9.3, all the supervisors crashed and couldn't restart. To resolve this, I had to run "rm -fr <storm-local-dir>/workers/".
      So I migrated to 0.9.4 (I thought the cause was STORM-682).

      The problem still occurs in 0.9.4, not every time but occasionally.

      I see these logs in supervisor.log:
      2015-05-29 15:01:42 b.s.d.supervisor [INFO] Removing code for storm id nlp-11-1432906756
      2015-05-29 15:01:42 b.s.d.supervisor [INFO] Removing code for storm id nlp-11-1432906756
      2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down and clearing state for id 355af307-fafc-43a8-865d-0dfbf9baee33. Current supervisor time: 1432911702. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1432911702, :storm-id "nlp-11-1432906756", :executors #{[2 2] [3 3] [-1 -1] [1 1]}, :port 6700}
      2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down and clearing state for id 355af307-fafc-43a8-865d-0dfbf9baee33. Current supervisor time: 1432911702. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1432911702, :storm-id "nlp-11-1432906756", :executors #{[2 2] [3 3] [-1 -1] [1 1]}, :port 6700}
      2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down 90f0964b-c48c-4cbc-9d1c-57119c56e99c:355af307-fafc-43a8-865d-0dfbf9baee33
      2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down 90f0964b-c48c-4cbc-9d1c-57119c56e99c:355af307-fafc-43a8-865d-0dfbf9baee33
      2015-05-29 15:01:42 b.s.event [ERROR] Error when processing event
      java.io.IOException: Cannot run program "kill" (in directory "."): error=2, No such file or directory
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) ~[na:1.8.0_45]
      at java.lang.Runtime.exec(Runtime.java:620) ~[na:1.8.0_45]
      at org.apache.commons.exec.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58) ~[commons-exec-1.1.jar:1.1]
      at org.apache.commons.exec.DefaultExecutor.launch(DefaultExecutor.java:254) ~[commons-exec-1.1.jar:1.1]
      at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:319) ~[commons-exec-1.1.jar:1.1]
      at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160) ~[commons-exec-1.1.jar:1.1]
      at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147) ~[commons-exec-1.1.jar:1.1]
      at backtype.storm.util$exec_command_BANG_.invoke(util.clj:386) ~[storm-core-0.9.4.jar:0.9.4]
      at backtype.storm.util$send_signal_to_process.invoke(util.clj:415) ~[storm-core-0.9.4.jar:0.9.4]
      at backtype.storm.util$kill_process_with_sig_term.invoke(util.clj:426) ~[storm-core-0.9.4.jar:0.9.4]
      at backtype.storm.daemon.supervisor$shutdown_worker.invoke(supervisor.clj:197) ~[storm-core-0.9.4.jar:0.9.4]
      at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:267) ~[storm-core-0.9.4.jar:0.9.4]
      at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
      at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
      at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
      at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
      at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
      at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:40) ~[storm-core-0.9.4.jar:0.9.4]
      at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
      at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
      Caused by: java.io.IOException: error=2, No such file or directory
      at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.8.0_45]
      at java.lang.UNIXProcess.<init>(UNIXProcess.java:248) ~[na:1.8.0_45]
      at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[na:1.8.0_45]
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ~[na:1.8.0_45]
      ... 19 common frames omitted
      2015-05-29 15:01:42 b.s.event [ERROR] Error when processing event
      java.io.IOException: Cannot run program "kill" (in directory "."): error=2, No such file or directory
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) ~[na:1.8.0_45]
      at java.lang.Runtime.exec(Runtime.java:620) ~[na:1.8.0_45]
      at org.apache.commons.exec.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58) ~[commons-exec-1.1.jar:1.1]
      at org.apache.commons.exec.DefaultExecutor.launch(DefaultExecutor.java:254) ~[commons-exec-1.1.jar:1.1]
      at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:319) ~[commons-exec-1.1.jar:1.1]
      at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160) ~[commons-exec-1.1.jar:1.1]
      at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147) ~[commons-exec-1.1.jar:1.1]
      at backtype.storm.util$exec_command_BANG_.invoke(util.clj:386) ~[storm-core-0.9.4.jar:0.9.4]
      at backtype.storm.util$send_signal_to_process.invoke(util.clj:415) ~[storm-core-0.9.4.jar:0.9.4]
      at backtype.storm.util$kill_process_with_sig_term.invoke(util.clj:426) ~[storm-core-0.9.4.jar:0.9.4]
      at backtype.storm.daemon.supervisor$shutdown_worker.invoke(supervisor.clj:197) ~[storm-core-0.9.4.jar:0.9.4]
      at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:267) ~[storm-core-0.9.4.jar:0.9.4]
      at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
      at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
      at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
      at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
      at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
      at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:40) ~[storm-core-0.9.4.jar:0.9.4]
      at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
      at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
      Caused by: java.io.IOException: error=2, No such file or directory
      at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.8.0_45]
      at java.lang.UNIXProcess.<init>(UNIXProcess.java:248) ~[na:1.8.0_45]
      at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[na:1.8.0_45]
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ~[na:1.8.0_45]
      ... 19 common frames omitted
      2015-05-29 15:01:42 b.s.util [ERROR] Halting process: ("Error when processing an event")
      java.lang.RuntimeException: ("Error when processing an event")
      at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
      at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
      at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:48) [storm-core-0.9.4.jar:0.9.4]
      at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
      at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
      2015-05-29 15:01:42 b.s.util [ERROR] Halting process: ("Error when processing an event")
      java.lang.RuntimeException: ("Error when processing an event")
      at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
      at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
      at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:48) [storm-core-0.9.4.jar:0.9.4]
      at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
      at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
      2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down supervisor 90f0964b-c48c-4cbc-9d1c-57119c56e99c
      2015-05-29 15:01:42 b.s.d.supervisor [INFO] Shutting down supervisor 90f0964b-c48c-4cbc-9d1c-57119c56e99c
      2015-05-29 15:01:42 b.s.event [INFO] Event manager interrupted
      2015-05-29 15:01:42 b.s.event [INFO] Event manager interrupted
      2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
      2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:host.name=storm-supervisor-01
      2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.version=1.8.0_45
      2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.vendor=Oracle Corporation
      2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.home=/usr/lib/jvm/jre-8-oracle-x64/jre
      2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.class.path=/usr/share/apache-storm-0.9.4/lib/zookeeper-3.4.6.jar:/usr/share/apache-storm-0.9.4/lib/hiccup-0.3.6.jar:/usr/share/apache-storm-0.9.4/lib/chill-java-0.3.5.jar:/usr/share/apache-storm-0.9.4/lib/commons-exec-1.1.jar:/usr/share/apache-storm-0.9.4/lib/tools.macro-0.1.0.jar:/usr/share/apache-storm-0.9.4/lib/jgrapht-core-0.9.0.jar:/usr/share/apache-storm-0.9.4/lib/ring-servlet-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/clout-1.0.1.jar:/usr/share/apache-storm-0.9.4/lib/storm-core-0.9.4.jar:/usr/share/apache-storm-0.9.4/lib/asm-4.0.jar:/usr/share/apache-storm-0.9.4/lib/tools.cli-0.2.4.jar:/usr/share/apache-storm-0.9.4/lib/disruptor-2.10.1.jar:/usr/share/apache-storm-0.9.4/lib/log4j-over-slf4j-1.6.6.jar:/usr/share/apache-storm-0.9.4/lib/clj-time-0.4.1.jar:/usr/share/apache-storm-0.9.4/lib/slf4j-api-1.7.5.jar:/usr/share/apache-storm-0.9.4/lib/clojure-1.5.1.jar:/usr/share/apache-storm-0.9.4/lib/core.incubator-0.1.0.jar:/usr/share/apache-storm-0.9.4/lib/json-simple-1.1.jar:/usr/share/apache-storm-0.9.4/lib/logback-classic-1.0.13.jar:/usr/share/apache-storm-0.9.4/lib/servlet-api-2.5.jar:/usr/share/apache-storm-0.9.4/lib/logback-core-1.0.13.jar:/usr/share/apache-storm-0.9.4/lib/jetty-6.1.26.jar:/usr/share/apache-storm-0.9.4/lib/clj-stacktrace-0.2.2.jar:/usr/share/apache-storm-0.9.4/lib/ring-devel-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/minlog-1.2.jar:/usr/share/apache-storm-0.9.4/lib/kryo-2.21.jar:/usr/share/apache-storm-0.9.4/lib/compojure-1.1.3.jar:/usr/share/apache-storm-0.9.4/lib/commons-codec-1.6.jar:/usr/share/apache-storm-0.9.4/lib/tools.logging-0.2.3.jar:/usr/share/apache-storm-0.9.4/lib/ring-jetty-adapter-0.3.11.jar:/usr/share/apache-storm-0.9.4/lib/jetty-util-6.1.26.jar:/usr/share/apache-storm-0.9.4/lib/joda-time-2.0.jar:/usr/share/apache-storm-0.9.4/lib/jline-2.11.jar:/usr/share/apache-storm-0.9.4/lib/commons-logging-1.1.3.jar:/usr/share/apache-storm-0.9.4/lib/reflectasm-1.07-shaded.jar:/usr/share/apache-storm-0.9.4/lib/carbonite-1.4.0.jar:/usr/share/apache-storm-0.9.4/lib/snakeyaml-1.11.jar:/usr/share/apache-storm-0.9.4/lib/objenesis-1.2.jar:/usr/share/apache-storm-0.9.4/lib/ring-core-1.1.5.jar:/usr/share/apache-storm-0.9.4/lib/commons-io-2.4.jar:/usr/share/apache-storm-0.9.4/lib/commons-fileupload-1.2.1.jar:/usr/share/apache-storm-0.9.4/lib/math.numeric-tower-0.0.1.jar:/usr/share/apache-storm-0.9.4/lib/commons-lang-2.5.jar:/usr/share/apache-storm-0.9.4/conf
      2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
      2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.io.tmpdir=/tmp
      2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:java.compiler=<NA>
      2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:os.name=Linux
      2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:os.arch=amd64
      2015-05-29 15:01:53 o.a.s.z.ZooKeeper [INFO] Client environment:os.version=3.16.0-0.bpo.4-amd64
      ...
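The `Cannot run program "kill" (in directory "."): error=2` lines in the log above report errno 2 (ENOENT) from the attempt to launch `kill`. Two common causes are that no standalone `kill` executable is reachable through the supervisor's PATH, or that the supervisor's working directory no longer exists. Because `kill` is also a shell builtin, it can work at an interactive prompt even when no executable is installed. The following is a minimal sketch for checking both conditions from inside the container. The function name `diagnose_exec` is hypothetical, and the check only approximates the lookup the JVM performs.

```python
import os
import shutil


def diagnose_exec(command="kill", cwd=".", path=None):
    """Return likely reasons that launching `command` from `cwd` fails with ENOENT.

    `path` is a PATH-style string; None means use this process's own PATH.
    """
    problems = []
    if not os.path.isdir(cwd):
        problems.append(f"working directory {cwd!r} does not exist")
    # shutil.which searches the PATH for an executable file, much like
    # exec-style lookups do; shell builtins are not found this way.
    if shutil.which(command, path=path) is None:
        problems.append(f"{command!r} not found on PATH")
    return problems
```

An empty list from `diagnose_exec()` means neither condition was detected for the current process. For a meaningful result, run it with the same user, PATH, and working directory as the supervisor daemon, since those may differ from an interactive shell in the same container.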

          People

            Assignee: Unassigned
            Reporter: Damien DESMARETS (zaide)