  1. Apache Storm
  2. STORM-738

Multilang needs Overflow-Control mechanism and HeartBeat timeout problem

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0, 0.9.3-rc2, 0.9.4, 1.0.0
    • Fix Version/s: None
    • Component/s: storm-multilang
    • Labels:
      None

      Description

      Hi all,

      We have a topology with 3 components (spout -> parser -> saver). The parser is a Multilang bolt written in Python. We do not use the ACK mechanism.

      We found 2 problems with the Multilang Python script:
      1) the parser's Python script may hold too many tuples and consume too much memory;
      2) with the Multilang heartbeat mechanism described in https://issues.apache.org/jira/browse/STORM-513, the Python script always times out on heartbeat, even when the parser bolt is working normally, which causes the supervisor to restart itself.

      ShellBolt process === Father-Process
      PythonScript process === Child-Process

      The reason is:
      1) when the topology does not use the ACK mechanism, the spout has no overflow-control ability; if too many tuples arrive on the stream, the spout sends all of them to the parser's ShellBolt process (Father-Process);
      2) the parser's ShellBolt process just puts the tuples into the _pendingWrites queue, which keeps growing when the _pendingWrites queue has no size limit;
      3) the parser's PythonScript process (Child-Process) calls readMsg() to read a tuple from STDIN, handles the tuple, emits a new tuple to its Father-Process through STDOUT, and then calls readTaskIds() to read from STDIN. Because the Father-Process's queue already holds many other tuples, the Child-Process reads all of those tuples into pending_commands until it receives the task ids;
      4) so the Child-Process's pending_commands may contain too many tuples and consume too much memory.
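
The steps above can be modeled with a small, self-contained simulation. This is only a sketch of the behavior described in this report, not the actual storm.py code: the function name simulate_child and the in-memory parent_queue are hypothetical, and it assumes that tuples arrive as dicts, that a task-id reply is a list, and that the reply is queued behind everything the Father-Process has already buffered.

```python
from collections import deque


def simulate_child(parent_queue):
    """Model the child handling ONE tuple while the parent queue is backed up.

    parent_queue: deque of messages the parent will deliver on the child's
    STDIN, in order. Tuples are dicts; a task-id reply is a list.
    (Hypothetical model of the behavior described in the report.)
    """
    pending_commands = deque()

    def read_msg():
        # Stand-in for reading one message from STDIN.
        return parent_queue.popleft()

    def read_task_ids():
        # Everything that is not a task-id list gets buffered in memory.
        msg = read_msg()
        while not isinstance(msg, list):
            pending_commands.append(msg)
            msg = read_msg()
        return msg

    first = read_msg()  # step 3: read one tuple and handle it
    # The emit goes out on STDOUT; the parent answers with task ids, but the
    # answer is queued BEHIND all tuples that are already waiting.
    parent_queue.append([1])
    task_ids = read_task_ids()
    return first, task_ids, pending_commands


backlog = deque({"id": i, "tuple": ["word"]} for i in range(1000))
first, task_ids, pending = simulate_child(backlog)
print(len(pending))  # 999 tuples buffered in the child after a single emit
```

In this model a single emit drains the whole backlog into the child's memory, which matches point 4 above.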

      As for the heartbeat: there are too many pending_commands for the Child-Process to handle, and every emit operation by the Child-Process requires additional read operations from STDIN. It may take 10 seconds to handle one tuple, so the heartbeat tuple is not handled quickly enough and a timeout occurs.

      Even if the Father-Process's _pendingWrites has a limit, for example 1000, the Child-Process may need 1000 x 1000 read operations before it can handle the heartbeat tuple.
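
The estimate above can be written out as a short calculation. This is only a restatement of the report's worst case under two assumptions that are not measured here: each handled tuple emits exactly once, and each emit forces the Child-Process to read a full queue's worth of messages before its task-id reply arrives. The function name worst_case_reads is hypothetical.

```python
def worst_case_reads(queue_limit, tuples_ahead_of_heartbeat):
    """Upper-bound estimate of STDIN reads before the heartbeat is handled.

    Assumes one emit per tuple, and that every emit makes the child read
    up to `queue_limit` queued messages before it sees its task-id reply.
    """
    return queue_limit * tuples_ahead_of_heartbeat


print(worst_case_reads(1000, 1000))  # 1000000
```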

      Robert Joseph Evans, Jungtaek Lim: this is related to Multilang and heartbeat; please help confirm the two problems.

      I think the Father-Process and Child-Process need an Overflow-Control Protocol to control the Python script's memory usage.
      Also, heartbeat tuples need a separate queue (pending_heartbeats), and the Child-Process should handle heartbeat tuples at high priority. Jungtaek Lim, I would like to hear your opinion.
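
A minimal sketch of the pending_heartbeats idea, for discussion only. It is not existing storm.py code: the class name PendingBuffers and its methods are hypothetical, and it assumes a heartbeat arrives as a tuple message with stream "__heartbeat" and task -1.

```python
from collections import deque

HEARTBEAT_STREAM = "__heartbeat"  # assumed stream id of heartbeat tuples


class PendingBuffers:
    """Child-side buffers with a separate, higher-priority heartbeat queue."""

    def __init__(self):
        self.pending_heartbeats = deque()
        self.pending_commands = deque()

    @staticmethod
    def is_heartbeat(msg):
        # Assumption: heartbeat tuples carry task == -1 and the heartbeat stream.
        return (
            isinstance(msg, dict)
            and msg.get("stream") == HEARTBEAT_STREAM
            and msg.get("task") == -1
        )

    def buffer(self, msg):
        # Called for every message read while waiting for a task-id reply.
        if self.is_heartbeat(msg):
            self.pending_heartbeats.append(msg)
        else:
            self.pending_commands.append(msg)

    def next_command(self):
        # Heartbeats are served before any ordinary buffered tuple.
        if self.pending_heartbeats:
            return self.pending_heartbeats.popleft()
        if self.pending_commands:
            return self.pending_commands.popleft()
        return None  # nothing buffered; the caller would read from STDIN


buffers = PendingBuffers()
for i in range(3):
    buffers.buffer({"id": i, "stream": "default", "task": 7, "tuple": ["w"]})
buffers.buffer({"id": "hb", "stream": HEARTBEAT_STREAM, "task": -1, "tuple": []})
print(buffers.next_command()["id"])  # hb
```

Note that this only reorders messages the Child-Process has already read. A heartbeat still waiting in the Father-Process's queue is not helped, which is why the overflow-control part of the proposal is needed as well.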

        Attachments

        1. storm_multilang.png
          62 kB
          DashengJu

              People

              • Assignee:
                Unassigned
                Reporter:
                dashengju DashengJu