Flow control (or rate control) of input data is important for any streaming system, and especially for Spark Streaming, to keep it stable and up to date. An unexpected flood of incoming data, or an ingestion rate beyond the computation power of the cluster, will make the system unstable and increase processing delay. With Spark Streaming's job generation and processing pattern, this delay accumulates across batches and eventually leads to unacceptable delays and failures.
Currently, Spark Streaming's receiver-based input stream has a RateLimiter in BlockGenerator that controls the ingestion rate of input data, but the current implementation has several limitations:
- The maximum ingestion rate is set by the user through configuration before the application starts, but users may lack the experience to pick an appropriate value in advance.
- This configuration is fixed for the lifetime of the application, which means you have to size it for the worst-case scenario.
- Input streams like DirectKafkaInputStream need to maintain a separate solution to achieve the same functionality.
- The lack of slow-start control means the whole system can easily fall into large processing and scheduling delays right at the beginning.
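Today the fixed limit comes from a static configuration such as `spark.streaming.receiver.maxRate` (records per second per receiver). As a minimal sketch of what such a fixed-rate limiter does, roughly in the spirit of the Guava RateLimiter that BlockGenerator wraps (the class and method names below are illustrative, not Spark's actual code): each permit is spaced a fixed interval apart, and a caller that arrives too early must wait.

```java
// Illustrative fixed-rate limiter: one permit every 1/rate seconds.
// Names are hypothetical; this is not Spark's or Guava's implementation.
class FixedRateLimiter {
    private final long intervalNanos; // spacing between permits
    private long nextFreeNanos = 0L;  // earliest time the next permit is free

    FixedRateLimiter(double permitsPerSecond) {
        this.intervalNanos = (long) (1e9 / permitsPerSecond);
    }

    /** Returns how long (ns) the caller arriving at nowNanos must wait for a permit. */
    synchronized long acquire(long nowNanos) {
        long wait = Math.max(0L, nextFreeNanos - nowNanos);
        nextFreeNanos = Math.max(nowNanos, nextFreeNanos) + intervalNanos;
        return wait;
    }
}
```

Because the rate is baked in at construction time, nothing here can react to how fast downstream processing actually is, which is exactly the limitation this proposal addresses.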
We therefore propose a new dynamic RateLimiter, together with a new RateLimiter interface, to improve the stability of the whole system. The goals are:
- Dynamically adjust the ingestion rate according to the processing rate of previously finished jobs.
- Offer a uniform solution not only for receiver-based input streams, but also for direct streams like DirectKafkaInputStream and future ones.
- Apply a slow-start rate to avoid congestion when a job starts.
- Provide a pluggable framework to make extensions easier to maintain.
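To make the proposed direction concrete, here is a hypothetical sketch of a pluggable rate estimator with slow start; the actual interface lives in the design doc and working branch linked below, and every name here is invented for illustration. The idea: after each batch, feed the batch's record count and processing time back into an estimator, which ramps up aggressively at first (like TCP slow start) and then tracks the observed processing rate, capped by a configured maximum.

```java
// Hypothetical pluggable interface: implementations turn per-batch statistics
// into a new ingestion bound (records/sec). Not the interface from the branch.
interface RateEstimator {
    double computeRate(long processedRecords, long processingDelayMs);
}

// Slow start, then track the rate the cluster actually sustains.
class SlowStartRateEstimator implements RateEstimator {
    private double currentRate;
    private final double maxRate;
    private boolean rampingUp = true;

    SlowStartRateEstimator(double initialRate, double maxRate) {
        this.currentRate = initialRate;
        this.maxRate = maxRate;
    }

    @Override
    public double computeRate(long processedRecords, long processingDelayMs) {
        // Observed throughput of the last finished batch, records/sec.
        double processingRate = processedRecords * 1000.0 / Math.max(1L, processingDelayMs);
        if (rampingUp && currentRate < processingRate) {
            // Slow-start phase: double the bound until we reach what the cluster can do.
            currentRate = Math.min(currentRate * 2, Math.min(processingRate, maxRate));
        } else {
            // Steady state: follow the observed processing rate, capped by maxRate.
            rampingUp = false;
            currentRate = Math.min(processingRate, maxRate);
        }
        return currentRate;
    }
}
```

Because the estimator sees only batch statistics, the same implementation could back both the receiver-side BlockGenerator and a direct stream that sizes its per-batch offset ranges, which is how a single pluggable interface could cover both cases.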
Here are the design doc (https://docs.google.com/document/d/1lqJDkOYDh_9hRLQRwqvBXcbLScWPmMa7MlG8J_TE93w/edit?usp=sharing) and the working branch (https://github.com/jerryshao/apache-spark/tree/dynamic-rate-limiter).
Any comments would be greatly appreciated.