Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.1.0
- Labels: None
Description
If the user changes the number of shuffle partitions between batches, streaming aggregation will fail.
Here are some possible cases:
- Change "spark.sql.shuffle.partitions" (see the sketch after this list)
- Use "repartition" and change the partition number in the code
- RangePartitioner doesn't generate deterministic partitions. Right now this is safe because sorting before aggregation is disallowed, but that may change if operators that use RangePartitioner are added in the future.
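As an illustration of the first case, here is a minimal sketch (the rate source, the grouping key, and the checkpoint path are assumptions for illustration, not taken from this issue): a streaming aggregation is started with one value of "spark.sql.shuffle.partitions", and a later restart from the same checkpoint uses a different value.
{code:scala}
import org.apache.spark.sql.SparkSession

// First run: start a streaming aggregation with 200 shuffle partitions.
// The aggregation state in the checkpoint is laid out per shuffle partition.
val spark = SparkSession.builder()
  .appName("shuffle-partitions-change")
  .master("local[*]")
  .config("spark.sql.shuffle.partitions", "200")
  .getOrCreate()

import spark.implicits._

val counts = spark.readStream
  .format("rate")              // built-in test source
  .load()
  .groupBy($"value" % 10)      // stateful streaming aggregation
  .count()

val query = counts.writeStream
  .outputMode("complete")
  .format("console")
  .option("checkpointLocation", "/tmp/agg-checkpoint")  // hypothetical path
  .start()

query.awaitTermination()
// If this application is later restarted against the same checkpoint with
// "spark.sql.shuffle.partitions" set to, say, 100, the restored state no
// longer lines up with the new shuffle partitioning and the query fails.
{code}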
Fix:
Record the number of shuffle partitions in the offset log and enforce it in the next batch.
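A conceptual sketch of that idea follows; it is not the actual Spark implementation, and the class and method names (BatchMetadata, OffsetLog) are hypothetical. The shuffle partition count is persisted together with each batch's offsets and re-applied on recovery before the next batch is planned.
{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical types for illustration only, not Spark's internal classes.
case class BatchMetadata(batchId: Long, shufflePartitions: Int)

class OffsetLog {
  private var latest: Option[BatchMetadata] = None

  // When a batch's offsets are written to the log, also record the shuffle
  // partition count that was used to plan that batch.
  def record(batchId: Long, spark: SparkSession): Unit = {
    val parts = spark.conf.get("spark.sql.shuffle.partitions").toInt
    latest = Some(BatchMetadata(batchId, parts))
  }

  // Before planning the next batch, force the session back to the recorded
  // value so a changed config cannot alter the state store partitioning.
  def enforce(spark: SparkSession): Unit = latest.foreach { meta =>
    spark.conf.set("spark.sql.shuffle.partitions", meta.shufflePartitions.toString)
  }
}
{code}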
Attachments
Issue Links
- requires: SPARK-19540 Add ability to clone SparkSession with an identical copy of the SessionState (Resolved)
- links to