Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.3.1
Fix Version/s: None
Description
For batch jobs, dynamic allocation is very useful for adding and removing executors to match the actual workload. On multi-tenant clusters, it ensures that a Spark job takes no more resources than necessary. In cloud environments, it enables autoscaling.
However, if you set spark.dynamicAllocation.enabled=true and run a structured streaming job, the batch dynamic allocation algorithm kicks in. It requests more executors when tasks have been backlogged longer than spark.dynamicAllocation.schedulerBacklogTimeout, and it removes executors that have been idle longer than spark.dynamicAllocation.executorIdleTimeout. Those heuristics were tuned for finite batch jobs and are a poor fit for a query that runs indefinitely.
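To make the failure mode concrete, here is a minimal sketch of the problematic combination: a trivial streaming query started with Core's dynamic allocation turned on. The spark.dynamicAllocation.* and spark.shuffle.service.enabled settings are real configs; the rate source and console sink are stand-ins chosen only to keep the example self-contained.
{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch only: in a real deployment, dynamic allocation also requires the
// external shuffle service (enabled below) and a cluster manager such as
// YARN; local mode won't actually scale anything.
val spark = SparkSession.builder()
  .appName("StreamingWithBatchDynamicAllocation")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")
  .getOrCreate()

// The batch ExecutorAllocationManager still drives executor counts here,
// reacting to task backlog and idle timeouts that were designed for
// finite batch jobs, not a query that runs indefinitely.
val query = spark.readStream
  .format("rate")   // built-in test source that emits rows at a fixed rate
  .load()
  .writeStream
  .format("console")
  .start()

query.awaitTermination()
{code}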
Quick thoughts:
1) Dynamic allocation should be pluggable rather than hardcoded to a particular implementation in SparkContext.scala (this should be a separate JIRA); a rough sketch of what such an interface could look like follows this list.
2) We should make a structured streaming algorithm that's separate from the batch algorithm. Eventually, continuous processing may need its own algorithm as well.
3) Spark should print a warning if you run a structured streaming job while Core's dynamic allocation is enabled.
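As a rough illustration of point 1, a pluggable design could expose a small strategy interface that SparkContext instantiates by class name instead of constructing ExecutorAllocationManager directly. Everything below is hypothetical: the ExecutorAllocationStrategy trait and the spark.dynamicAllocation.strategyClass config key do not exist in Spark and are made up for this sketch.
{code:scala}
// Hypothetical interface; nothing like this exists in Spark today.
trait ExecutorAllocationStrategy {
  /** Periodically called with a snapshot of scheduler state; returns the
    * desired executor count for the cluster manager to converge toward. */
  def targetExecutors(pendingTasks: Int, runningExecutors: Int, idleExecutors: Int): Int
}

// A batch strategy mirroring today's heuristics: ramp up while tasks are
// backlogged, release executors that have gone idle.
class BatchAllocationStrategy(maxExecutors: Int) extends ExecutorAllocationStrategy {
  override def targetExecutors(pendingTasks: Int, runningExecutors: Int, idleExecutors: Int): Int = {
    if (pendingTasks > 0) math.min(math.max(runningExecutors * 2, 1), maxExecutors)
    else math.max(runningExecutors - idleExecutors, 0)
  }
}

// A streaming strategy could instead size the cluster so that each
// micro-batch finishes within the trigger interval (details omitted here).
{code}
Under this hypothetical design, SparkContext would load the configured class reflectively, and a structured streaming job would select a streaming-aware strategy by default, which also gives a natural place to emit the warning from point 3.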