Spark / SPARK-7398

Add back-pressure to Spark Streaming (umbrella JIRA)

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 1.3.1
    • Fix Version/s: None
    • Component/s: DStreams

    Description

      Spark Streaming has trouble dealing with situations where
      batch processing time > batch interval
      that is, where input data arrives faster than Spark can remove it from the queue.

      If this throughput is sustained for long enough, the system becomes unstable and the memory of the Receiver's Executor overflows.

      This issue aims to transmit a back-pressure signal to data ingestion so that it can cope with that high throughput, in a backwards-compatible way.
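
      The instability described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not Spark code: the function `simulate_backlog` and its parameters (`arrival_rate`, `processing_rate`, `batch_interval`, `max_rate`) are hypothetical names introduced for illustration, and the rate cap stands in for whatever back-pressure signal the design docs specify.

```python
def simulate_backlog(arrival_rate, processing_rate, batch_interval,
                     num_batches, max_rate=None):
    """Return the number of queued, unprocessed records after each batch.

    arrival_rate    -- records per second offered by the source
    processing_rate -- records per second the engine can process
    batch_interval  -- seconds between batches
    max_rate        -- optional cap on ingested records per second
                       (hypothetical stand-in for a back-pressure signal)
    """
    backlog = 0
    history = []
    for _ in range(num_batches):
        # Without a cap, everything the source offers is ingested.
        rate = arrival_rate if max_rate is None else min(arrival_rate, max_rate)
        backlog += rate * batch_interval
        # The engine can clear at most this many records per interval.
        capacity = processing_rate * batch_interval
        backlog -= min(backlog, capacity)
        history.append(backlog)
    return history


# 1500 records/s arrive but only 1000 records/s can be processed, so each
# 1 s batch needs 1.5 s of work: batch processing time > batch interval.
print(simulate_backlog(1500, 1000, 1, 4))                 # [500, 1000, 1500, 2000]

# Capping ingestion at the processing rate keeps the queue bounded.
print(simulate_backlog(1500, 1000, 1, 4, max_rate=1000))  # [0, 0, 0, 0]
```

      In the uncapped run the backlog grows by a fixed amount every interval, which is the unbounded memory growth the description refers to. In the capped run the excess records are simply not ingested; where they wait (for example at the upstream source) is outside the scope of this sketch.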

      The original design doc can be found here:
      https://docs.google.com/document/d/1ZhiP_yBHcbjifz8nJEyPJpHqxB1FT6s8-Zk7sAfayQw/edit?usp=sharing

      The second design doc, which focuses on the first sub-task (with less background information and more emphasis on the implementation), can be found here:
      https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing

          There are no Sub-Tasks for this issue.

            People

              Assignee: Tathagata Das (tdas)
              Reporter: François Garillot (huitseeker)
              Shepherd: Tathagata Das
              Votes: 14
              Watchers: 32
