Apache Storm / STORM-1949

Backpressure can cause spout to stop emitting and stall topology


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      The problem can be reproduced within an IDE using the attached word count topology (wordcounttopo.zip).
      I ran it with 1 spout instance, 2 splitter bolt instances, and 2 counter bolt instances.

      The problem is easier to reproduce with the word count (WC) topology because it causes an explosion of tuples: each sentence tuple is split into several word tuples. Because the bolts have to process more tuples than the spout produces, the spout must be slowed down.
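      To make the fan-out concrete, here is a minimal sketch (not the code from the attached topology) that counts how many word tuples a splitter would emit for a handful of sentence tuples. The sample sentences are made up for illustration; the real spout's input may differ.

```python
def split_sentence(sentence):
    """Mimic a splitter bolt: one input tuple becomes one output tuple per word."""
    return sentence.split()


# Hypothetical sample input; not taken from the attached topology.
sentences = [
    "the cow jumped over the moon",
    "an apple a day keeps the doctor away",
    "four score and seven years ago",
]

spout_tuples = len(sentences)
word_tuples = sum(len(split_sentence(s)) for s in sentences)
fanout = word_tuples / spout_tuples

print(f"spout emitted {spout_tuples} tuples")
print(f"splitter emitted {word_tuples} tuples")
print(f"fan-out: {fanout:.1f}x")
```

      With these inputs, every tuple the spout emits turns into roughly 6.7 tuples for the counter bolts, so the downstream queues fill faster than the spout's own output rate suggests.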

      The time it takes for the topology to stall varies, but it is typically under 10 minutes.

      My theory: I suspect there is a race condition in the way ZooKeeper (ZK) is used to enable and disable backpressure. When a bolt is congested (i.e., pressure exceeds the high water mark), the bolt's worker records this in ZK by creating a node. Once congestion falls below the low water mark, it deletes that node.
      The spout's worker has set up a watch on the parent node, expecting a callback whenever the child nodes change. On receiving the callback, the spout's worker lists the parent node's children to check whether there are zero or more child nodes; it is essentially trying to infer the nature of the state change in ZK to decide whether to throttle. It then sets up another watch in ZK to track future changes.
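      The high/low water mark behaviour described above is a form of hysteresis. The sketch below models only that decision on the bolt side. The class name, the illustrative thresholds (0.9 and 0.4), and the plain Python set standing in for the ZK parent node's children are all hypothetical; this is not Storm's actual implementation.

```python
class BackpressureFlag:
    """Hypothetical model of a bolt worker's congestion flag with hysteresis.

    A plain set of node names stands in for the children of the ZK parent node.
    """

    def __init__(self, worker_id, high_water_mark, low_water_mark, congestion_nodes):
        if not low_water_mark < high_water_mark:
            raise ValueError("low water mark must be below high water mark")
        self.worker_id = worker_id
        self.high = high_water_mark
        self.low = low_water_mark
        self.nodes = congestion_nodes
        self.congested = False

    def update(self, queue_fill):
        """Create or delete this worker's node based on a queue fill ratio (0.0 to 1.0)."""
        if not self.congested and queue_fill > self.high:
            self.congested = True
            self.nodes.add(self.worker_id)      # stands in for creating the ZK child node
        elif self.congested and queue_fill < self.low:
            self.congested = False
            self.nodes.discard(self.worker_id)  # stands in for deleting the ZK child node
        # Between the two marks the previous state is kept (hysteresis).


nodes = set()
flag = BackpressureFlag("bolt-worker-1", high_water_mark=0.9,
                        low_water_mark=0.4, congestion_nodes=nodes)
history = []
for fill in [0.5, 0.95, 0.7, 0.5, 0.3, 0.6]:
    flag.update(fill)
    history.append("bolt-worker-1" in nodes)
print(history)  # [False, True, True, True, False, False]
```

      Each crossing of a water mark produces one create or delete in ZK; with several bolt workers oscillating around their marks, the parent node's children can change many times in quick succession.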

      When there are multiple bolts, these ZK nodes can be created and deleted in rapid succession. Between the time the worker receives a callback and the time it sets up the next watch, many changes may occur in ZK that the spout never notices.

      As a result, the spout may miss the fact that the bolts are no longer congested, and stay throttled.
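      The suspected race can be reproduced deterministically in a toy model. This is a sketch under stated assumptions: `ToyZk` is a hypothetical stand-in for ZooKeeper with one-shot child watches, and the spout's handler is assumed to read the children and re-register the watch as two separate steps, as the theory above describes. It does not establish how Storm's code actually orders these calls.

```python
class ToyZk:
    """Hypothetical ZooKeeper stand-in: one parent node with children and one-shot watches."""

    def __init__(self):
        self.children = set()
        self.watchers = []

    def watch_children(self, callback):
        # One-shot: the callback fires on the next change only, then is discarded.
        self.watchers.append(callback)

    def get_children(self):
        return set(self.children)

    def _fire(self):
        pending, self.watchers = self.watchers, []
        for cb in pending:
            cb()

    def create(self, name):
        self.children.add(name)
        self._fire()

    def delete(self, name):
        self.children.discard(name)
        self._fire()


class Spout:
    def __init__(self, zk):
        self.zk = zk
        self.throttled = False
        self.zk.watch_children(self.on_change)

    def on_change(self):
        # Step 1: read the current children to infer what changed.
        self.throttled = len(self.zk.get_children()) > 0
        # No watch is registered until rewatch() runs (step 2).

    def rewatch(self):
        self.zk.watch_children(self.on_change)


zk = ToyZk()
spout = Spout(zk)

zk.create("bolt-worker-1")  # watch fires; spout sees 1 child and throttles
zk.delete("bolt-worker-1")  # congestion clears, but no watch is registered
spout.rewatch()             # watch set too late; nothing else changes

print("children:", zk.get_children())  # children: set()
print("spout throttled:", spout.throttled)  # spout throttled: True
```

      In this model the spout stays throttled with no congestion nodes left, which matches the stall described above. Within the same model, registering the new watch before reading the children closes the window, because any change after the read fires the fresh watch.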

      Attachments

        1. 1.x-branch-works-perfect.png
          428 kB
          Zhuo Liu
        2. wordcounttopo.zip
          5 kB
          Roshan Naik


            People

              abellina Alessandro Bellina
              roshan_naik Roshan Naik
              Votes: 2
              Watchers: 19
