[SPARK-7398] Add back-pressure to Spark Streaming (umbrella JIRA) - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Incomplete
Affects Version/s: 1.3.1
Fix Version/s: None
Component/s: DStreams
Labels:
- bulk-closed
- streams

Description

Spark Streaming has trouble dealing with situations where
batch processing time > batch interval
that is, where input data arrives faster than Spark can remove it from the queue.

If this throughput is sustained for long enough, the system becomes unstable: the memory of the Receiver's Executor overflows.
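To make the instability concrete, here is a minimal sketch of how the backlog grows when records arrive faster than they can be processed. It is a toy model, not Spark code: the function name `simulate_backlog` and the rates used are assumptions chosen for illustration only.

```python
def simulate_backlog(arrival_rate, processing_rate, batch_interval, num_batches):
    """Return the number of unprocessed records left after each batch interval.

    Toy model (an assumption, not Spark's scheduler): rates are in records
    per second, batch_interval is in seconds.
    """
    backlog = 0.0
    history = []
    for _ in range(num_batches):
        # Records received by the receiver during this interval.
        backlog += arrival_rate * batch_interval
        # At most this many records can be processed within one interval.
        processed = min(backlog, processing_rate * batch_interval)
        backlog -= processed
        history.append(backlog)
    return history


# 1200 records/s arriving, 1000 records/s processed, 2 s batches:
# each batch needs 2.4 s of work, which exceeds the 2 s interval.
overloaded = simulate_backlog(1200, 1000, 2, 5)

# 800 records/s arriving: each batch finishes within its interval.
stable = simulate_backlog(800, 1000, 2, 5)
```

In the overloaded case the backlog grows by the same amount every interval and never drains, which is the unbounded memory growth described above. In the stable case the backlog stays at zero.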

The aim is to send a back-pressure signal back to data ingestion to help it deal with that high throughput, in a backwards-compatible way.
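One way to picture such a signal is a feedback loop: after each batch, the observed processing rate becomes the ingestion limit for the next one. The sketch below assumes that simple rule; it is not the design from the linked documents, and the names `estimate_rate_limit` and `run_with_back_pressure` are hypothetical.

```python
def estimate_rate_limit(records_processed, processing_time, current_limit):
    """Hypothetical feedback rule: limit ingestion to the observed processing rate.

    Returns records per second. Keeps the current limit when there is
    nothing to measure (empty batch or zero processing time).
    """
    if records_processed <= 0 or processing_time <= 0:
        return current_limit
    return records_processed / processing_time


def run_with_back_pressure(offered_rate, capacity, batch_interval, num_batches):
    """Simulate batches whose ingestion is throttled by the feedback rule.

    offered_rate: records/s the source wants to send (assumed constant).
    capacity: records/s the job can actually process (assumed constant).
    Returns the processing time of each batch in seconds.
    """
    limit = offered_rate  # start unthrottled, as with no back-pressure
    processing_times = []
    for _ in range(num_batches):
        ingested = min(offered_rate, limit) * batch_interval
        processing_time = ingested / capacity
        processing_times.append(processing_time)
        # Feed the measurement back to the ingestion side.
        limit = estimate_rate_limit(ingested, processing_time, limit)
    return processing_times


batch_interval = 2
times = run_with_back_pressure(1200, 1000, batch_interval, 5)
```

In this model the first batch takes longer than the batch interval, and once the limit is applied the processing time settles at about one interval. The sketch ignores the backlog left by the first batch and any measurement noise, both of which a real controller would have to handle.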

The original design doc can be found here:
https://docs.google.com/document/d/1ZhiP_yBHcbjifz8nJEyPJpHqxB1FT6s8-Zk7sAfayQw/edit?usp=sharing

The second design doc, which focuses on the first sub-task (omitting the background info and concentrating on the implementation), can be found here:
https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing