Apache Storm / STORM-513

ShellBolt keeps sending heartbeats even when child process is hung

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.9.2-incubating
    • Fix Version/s: 0.9.3-rc2
    • Component/s: storm-multilang
    • Labels:
      None
    • Environment:
      Linux: 2.6.32-431.11.2.el6.x86_64 (RHEL 6.5)

    Description

    If I understand correctly how ShellBolts work, the Java ShellBolt executor is the part of the topology that sends heartbeats back to Nimbus to signal that a particular multilang bolt is still alive. The problem is that if the multilang subprocess/bolt hangs badly (i.e., it does not even respond to SIGALRM or similar signals), the Java ShellBolt does not seem to notice. Simply letting the tuple be replayed when it times out does not help either, because the subprocess is still stuck.

    The most obvious fix seems to be adding heartbeating to the multilang protocol itself, so that the ShellBolt expects some kind of message from the subprocess at least once per timeout interval.
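A minimal sketch of the Java-side check the proposal implies, assuming the subprocess sends some message (a heartbeat or any other protocol message) at least once per timeout interval. The class `HeartbeatWatchdog`, its methods, and the timeout values are hypothetical illustrations, not part of Storm's ShellBolt API. The idea is that the Java side tracks when it last heard from the child and treats prolonged silence as a hang, instead of reporting liveness unconditionally.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical watchdog: records the time of the last message received from
 * the multilang subprocess and reports a hang if nothing arrives within the timeout.
 */
class HeartbeatWatchdog {
    private final long timeoutMillis;
    private final LongSupplier clock;        // injectable clock (milliseconds), useful for tests
    private final AtomicLong lastHeartbeat;  // written by the reader thread, read by the checker

    HeartbeatWatchdog(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.lastHeartbeat = new AtomicLong(clock.getAsLong());
    }

    /** Call whenever any message (including a heartbeat) arrives from the subprocess. */
    public void recordHeartbeat() {
        lastHeartbeat.set(clock.getAsLong());
    }

    /** True only if the subprocess has been silent for strictly longer than the timeout. */
    public boolean isHung() {
        return clock.getAsLong() - lastHeartbeat.get() > timeoutMillis;
    }

    /**
     * Checks for a hang every periodMillis and runs onHang once when one is detected,
     * then stops checking. The caller should shut the returned scheduler down if the
     * bolt is closed normally.
     */
    public ScheduledExecutorService startChecking(long periodMillis, Runnable onHang) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            if (isHung()) {
                onHang.run();
                scheduler.shutdown(); // no further checks after reporting the hang
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

In this sketch, the existing heartbeat to Nimbus would only be sent while `isHung()` returns false, and `onHang` could call `Process.destroy()` on the child and fail the task so that the worker is restarted rather than appearing healthy indefinitely.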

    People

    • Assignee: Jungtaek Lim (kabhwan)
    • Reporter: Dan Blanchard (dan.blanchard)