  Spark / SPARK-7398

Add back-pressure to Spark Streaming (umbrella JIRA)

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.3.1
    • Fix Version/s: None
    • Component/s: DStreams
    • Labels:

      Description

      Spark Streaming has trouble dealing with situations where
      batch processing time > batch interval,
      that is, where input data arrives faster than Spark can remove it from the queue.

      If this throughput is sustained for long enough, it leads to an unstable situation in which the memory of the Executor running the Receiver overflows.
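      The instability can be made concrete with a little arithmetic. Assume, purely for illustration, a constant arrival rate and a constant processing rate (both in records/s). A batch holding arrival_rate * batch_interval records then takes (arrival_rate / processing_rate) * batch_interval seconds, which exceeds the batch interval exactly when data arrives faster than it is processed; the surplus of every batch is carried over, so the queue grows without bound. This is a toy model with hypothetical function names, not Spark code:

```python
def batch_processing_time(arrival_rate, processing_rate, batch_interval):
    """Seconds needed to process one batch (toy model: constant rates)."""
    records_in_batch = arrival_rate * batch_interval
    return records_in_batch / processing_rate


def simulate_backlog(arrival_rate, processing_rate, batch_interval, num_batches):
    """Records still queued after each batch interval (hypothetical model).

    Each interval, arrival_rate * batch_interval records are received, and at
    most processing_rate * batch_interval of the queued records are processed.
    Whatever is left over stays queued, i.e. in the Receiver's memory.
    """
    backlog = 0
    history = []
    for _ in range(num_batches):
        backlog += arrival_rate * batch_interval      # records received this interval
        capacity = processing_rate * batch_interval   # records that can be processed
        backlog -= min(backlog, capacity)             # drain what we can
        history.append(backlog)
    return history
```

      With 1200 records/s arriving, 1000 records/s processed and a 2 s interval, each batch takes 2.4 s and the queue grows by 400 records per batch ([400, 800, 1200, 1600, 2000] after five batches); at 800 records/s each batch takes 1.6 s and the queue stays empty.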

      This issue aims to propagate a back-pressure signal to data ingestion, in a backwards-compatible way, so that ingestion can adapt to that high throughput.
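      One hypothetical way to picture such a signal: after each batch, measure the rate at which that batch was processed and cap the next batch's ingestion rate at that value. This deliberately simple rule is chosen for illustration only and is not the mechanism specified in the design docs linked below. The sketch also assumes that records the source offers beyond the cap can wait upstream (e.g. in a replayable source) rather than in the Receiver's memory. All names are hypothetical:

```python
import math


def estimate_rate(records_processed, processing_time_s):
    """Back-pressure signal: the rate (records/s) the last batch was processed at."""
    if processing_time_s <= 0:
        return math.inf  # no measurement available: do not throttle
    return records_processed / processing_time_s


def simulate_with_backpressure(offered_rate, processing_rate, batch_interval, num_batches):
    """Records queued inside Spark after each batch when ingestion is capped.

    Assumption (hypothetical model): records offered beyond the cap stay
    upstream in the source instead of accumulating in the Receiver's memory.
    """
    rate_limit = math.inf  # first batch: no feedback yet
    backlog = 0.0
    history = []
    for _ in range(num_batches):
        # Ingest at most rate_limit records/s during this interval.
        backlog += min(offered_rate, rate_limit) * batch_interval
        processed = min(backlog, processing_rate * batch_interval)
        processing_time = processed / processing_rate
        backlog -= processed
        history.append(backlog)
        # Feed the observed processing rate back to ingestion for the next batch.
        rate_limit = estimate_rate(processed, processing_time)
    return history
```

      With 1200 records/s offered, 1000 records/s processed and a 2 s interval, the first (unthrottled) batch leaves 400 records queued; ingestion is then capped at 1000 records/s and the queue stays at 400 records instead of growing.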

      The original design doc can be found here:
      https://docs.google.com/document/d/1ZhiP_yBHcbjifz8nJEyPJpHqxB1FT6s8-Zk7sAfayQw/edit?usp=sharing

      The second design doc, which focuses on the first sub-task and centers on the implementation (leaving out the background information), can be found here:
      https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing

              People

              • Assignee: Tathagata Das (tdas)
              • Reporter: François Garillot (huitseeker)
              • Shepherd: Tathagata Das
              • Votes: 14
              • Watchers: 37

                Dates

                • Created:
                  Updated: