Spark / SPARK-7398

Add back-pressure to Spark Streaming (umbrella JIRA)


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 1.3.1
    • Fix Version/s: None
    • Component/s: DStreams

    Description

      Spark Streaming has trouble dealing with situations where
      batch processing time > batch interval,
      that is, where input data arrives faster than Spark can remove it from the queue.

      If this throughput is sustained for long enough, the situation becomes unstable: the backlog keeps growing until the Receiver's Executor runs out of memory.
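
      The instability can be illustrated with a toy model. The sketch below is not Spark code: the function name `simulate_backlog` and all rates are hypothetical, and it assumes constant arrival and processing rates. With 1500 records/s arriving and 1000 records/s processed, each 1 s batch would need 1.5 s of processing, so the backlog grows by the same amount every interval.

```python
def simulate_backlog(arrival_rate, processing_rate, batch_interval, num_batches):
    """Return the number of unprocessed records left after each batch interval.

    Rates are in records per second and batch_interval is in seconds.
    All names and values are hypothetical; this is a toy model, not Spark.
    """
    backlog = 0.0
    history = []
    for _ in range(num_batches):
        # Records received during this interval join the queue.
        backlog += arrival_rate * batch_interval
        # At most processing_rate * batch_interval records can be handled.
        processed = min(backlog, processing_rate * batch_interval)
        backlog -= processed
        history.append(backlog)
    return history


# Processing keeps up: the queue is drained every interval.
print(simulate_backlog(900, 1000, 1.0, 5))   # [0.0, 0.0, 0.0, 0.0, 0.0]
# Processing falls behind: the queue grows without bound.
print(simulate_backlog(1500, 1000, 1.0, 5))  # [500.0, 1000.0, 1500.0, 2000.0, 2500.0]
```

      In the second run nothing ever bounds the queue, which corresponds to the memory growth described above.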

      This issue aims to transmit a back-pressure signal back to data ingestion to help it cope with that high throughput, in a backwards-compatible way.
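
      As a rough illustration of what such a signal could do, the sketch below caps ingestion at the throughput measured for the previous batch. It is a toy feedback loop under stated assumptions, not the design from the linked documents: the function name `simulate_with_backpressure`, the constant rates, and the "use the last measured throughput as the limit" rule are all hypothetical.

```python
def simulate_with_backpressure(offered_rate, processing_rate, batch_interval, num_batches):
    """Toy feedback loop: limit ingestion to the last measured throughput.

    Returns (ingested_per_batch, backlog_per_batch). Rates are in records
    per second. All names and values are hypothetical; this is not Spark.
    """
    rate_limit = float("inf")  # no limit until a first measurement exists
    backlog = 0.0
    ingested_log, backlog_log = [], []
    for _ in range(num_batches):
        # Ingestion honours the most recent back-pressure signal.
        ingested = min(offered_rate, rate_limit) * batch_interval
        backlog += ingested
        processed = min(backlog, processing_rate * batch_interval)
        backlog -= processed
        if processed > 0:
            # Signal sent upstream: records handled per second of busy time.
            busy_time = processed / processing_rate
            rate_limit = processed / busy_time
        ingested_log.append(ingested)
        backlog_log.append(backlog)
    return ingested_log, backlog_log


ingested, backlog = simulate_with_backpressure(1500, 1000, 1.0, 5)
print(ingested)  # [1500.0, 1000.0, 1000.0, 1000.0, 1000.0]
print(backlog)   # [500.0, 500.0, 500.0, 500.0, 500.0]
```

      After the first measurement the ingestion rate drops to what the job can sustain, so the backlog stays bounded instead of growing. Records above the limit are assumed to wait at the source; what happens to them in practice depends on the source.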

      The original design doc can be found here:
      https://docs.google.com/document/d/1ZhiP_yBHcbjifz8nJEyPJpHqxB1FT6s8-Zk7sAfayQw/edit?usp=sharing

      The second design doc, which focuses on the first sub-task (without all the background info, and more centered on the implementation), can be found here:
      https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing


          People

            tdas Tathagata Das
            huitseeker François Garillot
            Tathagata Das Tathagata Das
            Votes: 14
            Watchers: 37

