  1. Apache Storm
  2. STORM-2150

ShellBolt raises subprocess heartbeat timeout exception

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.0.1, 1.0.2
    • Fix Version/s: None
    • Component/s: storm-core, storm-multilang
    • Labels:
      None

      Description

      I have a simple topology running on Storm 1.0.1. It consists of a KafkaSpout and several Python multilang ShellBolts. I frequently get the following exception:

      java.lang.RuntimeException: subprocess heartbeat timeout
          at org.apache.storm.task.ShellBolt$BoltHeartbeatTimerTask.run(ShellBolt.java:322)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:745)
      

      More information:
      1. The topology runs in ACK mode.
      2. The topology has 40 workers.
      3. The topology emits about 10 million tuples every 10 minutes.

      Every time the subprocess heartbeat times out, the workers restart and the Python processes exit with exitCode:-1, which hurts the processing capacity and stability of the topology.
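
      To illustrate the failure mode, here is a minimal sketch of a parent-side heartbeat watchdog. It is a toy model only, not Storm's actual ShellBolt code: the names HeartbeatWatchdog and FakeClock and the 30-second timeout are assumptions made for this example. The idea is that the parent records the last time the subprocess responded, and a periodic check raises once the silence exceeds the timeout, which is what would happen if a busy Python bolt could not answer in time.

```python
import time


class FakeClock:
    """Hypothetical controllable clock so the example runs without sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class HeartbeatWatchdog:
    """Toy model of a parent-side heartbeat check (not Storm's real code)."""

    def __init__(self, timeout_secs, clock=time.monotonic):
        self.timeout_secs = timeout_secs
        self.clock = clock
        self.last_heartbeat = clock()

    def record_heartbeat(self):
        # Called whenever the subprocess responds to the parent.
        self.last_heartbeat = self.clock()

    def check(self):
        # Called periodically by a timer; raises if the subprocess
        # has been silent for longer than the timeout.
        elapsed = self.clock() - self.last_heartbeat
        if elapsed > self.timeout_secs:
            raise RuntimeError("subprocess heartbeat timeout")


if __name__ == "__main__":
    clock = FakeClock()
    watchdog = HeartbeatWatchdog(timeout_secs=30, clock=clock)  # assumed timeout

    clock.advance(10)
    watchdog.check()  # within the timeout, nothing happens

    clock.advance(25)  # subprocess stayed silent for 35 seconds in total
    try:
        watchdog.check()
    except RuntimeError as err:
        print("watchdog fired:", err)
```

      In this model, a subprocess that is alive but too busy to respond is indistinguishable from a dead one, which matches the symptom described above.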

      I checked some related issues in the Storm Jira. STORM-1946 reported a bug related to this problem and stated that it had been fixed in Storm 1.0.2. However, I got the same exception even after upgrading to Storm 1.0.2.

      I also checked other related issues. Here is the history of this problem:
      DashengJu first reported it in non-ACK mode in STORM-738. STORM-742 discussed how to handle it in ACK mode, and the bug appeared to have been fixed in 0.10.0. I don't know whether that patch is included in the storm-1.x branch. In short, the problem still exists in the latest stable version.

            People

            • Assignee: Unassigned
            • Reporter: shawshank Ma Zhechao