Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-7398

Add back-pressure to Spark Streaming (umbrella JIRA)

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.3.1
    • Fix Version/s: None
    • Component/s: DStreams
    • Labels:

      Description

      Spark Streaming has trouble dealing with situations where
      batch processing time > batch interval
      Meaning a high throughput of input data w.r.t. Spark's ability to remove data from the queue.

      If this throughput is sustained for long enough, it leads to an unstable situation where the memory of the Receiver's Executor is overflowed.

      This aims at transmitting a back-pressure signal back to data ingestion to help with dealing with that high throughput, in a backwards-compatible way.

      The original design doc can be found here:
      https://docs.google.com/document/d/1ZhiP_yBHcbjifz8nJEyPJpHqxB1FT6s8-Zk7sAfayQw/edit?usp=sharing

      The second design doc, focusing on the first sub-task (without all the background info, and more centered on the implementation) can be found here:
      https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing

        Issue Links

          Activity

          Hide
          huitseeker François Garillot added a comment -

          Note this subsumes SPARK-6691, since dynamic throttling is only one of the options we implement (besides dropping, sampling, interfacing with a Reactive Streams-compliant source, and plug-your-own congestion strategy).

          Show
          huitseeker François Garillot added a comment - Note this subsumes SPARK-6691 , since dynamic throttling is only one of the options we implement (besides dropping, sampling, interfacing with a Reactive Streams -compliant source, and plug-your-own congestion strategy).
          Hide
          tdas Tathagata Das added a comment -

          I took a look at the whole design doc. Its very well composed, but the actual details on how the actual code changes is a little unclear. I Now that you have a working branch, I strongly recommend doing another additional design doc which skips all the intro and background, and just focuses on the code changes.

          Here is a design doc for inspiration. This is original design doc for the Write Ahead Log.
          https://docs.google.com/document/d/1vTCB5qVfyxQPlHuv8rit9-zjdttlgaSrMgfCDQlCJIM/edit#heading=h.9xoxtbgz551y
          See the architecture and proposed implementation section. Accordingly you should have the following two sections

          1. Use diagrams to explain the high-level control flow in the architecture with new classes in the picture and how they interoperate/interface with existing classes (BTW, high-level = not as detailed at the control flow that you have in the earlier design doc).
          2. The details of every class and interface that needs to be introduced or modified. Especially focus on the interfaces for - (1) the heuristic algorithm, (2) the congestion control.

          This will allow me and others to evaluate the architecture more critically.

          Then if needed we can break up the task into smaller smaller sub-tasks (as done in the case of the WAL JIRA - https://issues.apache.org/jira/browse/SPARK-3129).

          Show
          tdas Tathagata Das added a comment - I took a look at the whole design doc. Its very well composed, but the actual details on how the actual code changes is a little unclear. I Now that you have a working branch, I strongly recommend doing another additional design doc which skips all the intro and background, and just focuses on the code changes. Here is a design doc for inspiration. This is original design doc for the Write Ahead Log. https://docs.google.com/document/d/1vTCB5qVfyxQPlHuv8rit9-zjdttlgaSrMgfCDQlCJIM/edit#heading=h.9xoxtbgz551y See the architecture and proposed implementation section. Accordingly you should have the following two sections 1. Use diagrams to explain the high-level control flow in the architecture with new classes in the picture and how they interoperate/interface with existing classes (BTW, high-level = not as detailed at the control flow that you have in the earlier design doc). 2. The details of every class and interface that needs to be introduced or modified. Especially focus on the interfaces for - (1) the heuristic algorithm, (2) the congestion control. This will allow me and others to evaluate the architecture more critically. Then if needed we can break up the task into smaller smaller sub-tasks (as done in the case of the WAL JIRA - https://issues.apache.org/jira/browse/SPARK-3129 ).
          Hide
          dragos Iulian Dragos added a comment -

          Tathagata Das thanks for looking into this. I agree that now that we have working code we can be more specific about what changes are needed. We'll prepare another doc with that focus in mind.

          Show
          dragos Iulian Dragos added a comment - Tathagata Das thanks for looking into this. I agree that now that we have working code we can be more specific about what changes are needed. We'll prepare another doc with that focus in mind.
          Hide
          tdas Tathagata Das added a comment -

          Great looking forward to it.

          On Mon, Jun 22, 2015 at 2:27 AM, Iulian Dragos (JIRA) <jira@apache.org>

          Show
          tdas Tathagata Das added a comment - Great looking forward to it. On Mon, Jun 22, 2015 at 2:27 AM, Iulian Dragos (JIRA) <jira@apache.org>
          Show
          dragos Iulian Dragos added a comment - Tathagata Das here it is: https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing
          Hide
          tdas Tathagata Das added a comment -

          Thank you very much! This is much cleaner and easy to understand. Here are
          the preliminary thoughts.

          1. I strongly recommend introducing an interface for making the heuristic
          algorithm pluggable.
          2. Should there be separate instances of the heuristic algorithm running
          for separate input DStreams. That part is not clear in the doc. Different
          input stream may want to calculate their rates differently.
          3. The calculated rate from the heuristic algorithm should be communicated
          through the input Dstream interface. This is to allow both receiver and
          non-receiver input DStreams to handle calculate rates differently.
          4. The CongestionListener interface is a good-to-have in the future but not
          the most critical feature. So I suggest this be done later, in future
          iterations.
          5. There should not be any new dependencies in the streaming module.

          The rest of the details are commented in the doc.

          On Mon, Jun 29, 2015 at 2:22 AM, Iulian Dragos (JIRA) <jira@apache.org>

          Show
          tdas Tathagata Das added a comment - Thank you very much! This is much cleaner and easy to understand. Here are the preliminary thoughts. 1. I strongly recommend introducing an interface for making the heuristic algorithm pluggable. 2. Should there be separate instances of the heuristic algorithm running for separate input DStreams. That part is not clear in the doc. Different input stream may want to calculate their rates differently. 3. The calculated rate from the heuristic algorithm should be communicated through the input Dstream interface. This is to allow both receiver and non-receiver input DStreams to handle calculate rates differently. 4. The CongestionListener interface is a good-to-have in the future but not the most critical feature. So I suggest this be done later, in future iterations. 5. There should not be any new dependencies in the streaming module. The rest of the details are commented in the doc. On Mon, Jun 29, 2015 at 2:22 AM, Iulian Dragos (JIRA) <jira@apache.org>
          Hide
          dragos Iulian Dragos added a comment -

          Document updated.

          Show
          dragos Iulian Dragos added a comment - Document updated.
          Hide
          nmarasoi nicu marasoiu added a comment -

          Hi!

          Can I help in any way? Back pressure is fundamental to implementing a reactive pipeline with spark streaming.
          I see that there is a single task which is not resolved, is it a clear thing that I can take maybe?

          Thank you,
          Nicu

          Show
          nmarasoi nicu marasoiu added a comment - Hi! Can I help in any way? Back pressure is fundamental to implementing a reactive pipeline with spark streaming. I see that there is a single task which is not resolved, is it a clear thing that I can take maybe? Thank you, Nicu
          Hide
          dragos Iulian Dragos added a comment -

          Hey, except the last point, everything is available in 1.5.

          You can go ahead and tackle the remaining ticket, of course.

          Show
          dragos Iulian Dragos added a comment - Hey, except the last point, everything is available in 1.5. You can go ahead and tackle the remaining ticket, of course.

            People

            • Assignee:
              tdas Tathagata Das
              Reporter:
              huitseeker François Garillot
              Shepherd:
              Tathagata Das
            • Votes:
              14 Vote for this issue
              Watchers:
              37 Start watching this issue

              Dates

              • Created:
                Updated:

                Development