Apache Storm / STORM-1021

HeartbeatExecutorService issue in Shell Spout\Bolt

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0, 0.9.5
    • Fix Version/s: None
    • Component/s: storm-multilang
    • Environment: Alpine Linux 3.2, openjdk-7
    • Flags: Important

    Description

      The ShellSpout class (and apparently ShellBolt as well) doesn't restart when a heartbeat timeout occurs. To reproduce this bug:
      1. Set the supervisor.worker.timeout.secs property to a small value, e.g. 1.
      2. Create a shell spout (as a standalone application, not a Java class) that hangs for more than 1 second and doesn't respond to heartbeat messages, e.g. by calling Thread.Sleep(5000).
      3. After the timeout, Storm tries to kill the shell spout process by calling the die function:
      https://github.com/apache/storm/blob/v0.10.0-beta/storm-core/src/jvm/backtype/storm/spout/ShellSpout.java#L237
      4. The die function calls heartBeatExecutorService.shutdownNow(), which raises an InterruptedException that is not caught by the calling thread. As a result, the topology stops working properly, although it is still listed by ./storm list.
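For illustration, here is a minimal standalone sketch of the mechanism described in step 4. It is not Storm code: the class name ShutdownNowDemo and its method are hypothetical, and it assumes that die runs on the heartbeat executor's own thread. Under that assumption, shutdownNow() sets the interrupt flag of the thread that calls it, so any blocking call made afterwards on that thread may fail with an InterruptedException.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; it only mimics the shape of the heartbeat task.
class ShutdownNowDemo {

    // Runs a task that shuts down its own executor and reports whether the
    // task's thread ended up with its interrupt flag set.
    static boolean interruptedAfterSelfShutdown() {
        final ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor();
        final AtomicBoolean interrupted = new AtomicBoolean(false);
        final CountDownLatch done = new CountDownLatch(1);

        executor.schedule(() -> {
            // Same call that die() makes; it interrupts every started
            // worker thread, including the one running this task.
            executor.shutdownNow();
            // Check the flag without clearing it.
            interrupted.set(Thread.currentThread().isInterrupted());
            done.countDown();
        }, 10, TimeUnit.MILLISECONDS);

        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return interrupted.get();
    }

    public static void main(String[] args) {
        System.out.println("interrupted after shutdownNow(): "
                + interruptedAfterSelfShutdown());
    }
}
```

If the assumption holds for ShellSpout, the calls that follow shutdownNow() in die would run on an interrupted thread, which could explain why the worker never reaches System.exit(11).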

      I'm not a Java developer, so I'm not sure whether the code below is valid, but it seems to fix the problem:

      private void die(Throwable exception) {
          heartBeatExecutorService.shutdownNow();
          try {
              heartBeatExecutorService.awaitTermination(5, TimeUnit.SECONDS);
          } catch (InterruptedException e) {
              LOG.error("await catch ", e);
          }
          _collector.reportError(exception);
          _process.destroy();
          System.exit(11);
      }
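One possible reading of why this change helps, sketched outside Storm with a hypothetical class AwaitTerminationDemo: when awaitTermination is called on a thread whose interrupt flag is already set, it throws InterruptedException right away and clears the flag. Catching that exception would leave the thread in a non-interrupted state for the remaining calls. The sketch again assumes that die runs on the executor's own thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Storm.
class AwaitTerminationDemo {

    // Returns { exceptionWasCaught, flagStillSetAfterCatch } as observed
    // inside a task that shuts down its own executor.
    static boolean[] observe() {
        final ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor();
        final boolean[] result = new boolean[2];
        final CountDownLatch done = new CountDownLatch(1);

        executor.schedule(() -> {
            executor.shutdownNow(); // interrupts the current worker thread
            try {
                // The pool cannot terminate while this task is running,
                // so the call has to wait and notices the pending interrupt.
                executor.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                result[0] = true;
            }
            result[1] = Thread.currentThread().isInterrupted();
            done.countDown();
        }, 10, TimeUnit.MILLISECONDS);

        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] r = observe();
        System.out.println("InterruptedException caught: " + r[0]);
        System.out.println("interrupt flag still set:    " + r[1]);
    }
}
```

A call to Thread.interrupted() after shutdownNow() would clear the flag as well, without the wait; whether that is preferable for ShellSpout is a design question for the maintainers.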

          People

            Assignee: Unassigned
            Reporter: Oleh Hordiichuk (ohord)