Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.1.0
- Labels: None
Description
If the user changes the number of shuffle partitions between batches, streaming aggregation will fail.
Here are some possible cases:
- Change "spark.sql.shuffle.partitions" (see the sketch after this list)
- Use "repartition" and change the partition number in the code
- RangePartitioner doesn't generate deterministic partitions. Right now this is safe because sorting before aggregation is disallowed, but that may change if operators that use RangePartitioner are added in the future.
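As an illustration of the first case, here is a minimal sketch (the rate source, the grouping key, and the checkpoint path are assumptions for illustration, not taken from this issue): a streaming aggregation is started with one value of "spark.sql.shuffle.partitions", and a later restart from the same checkpoint uses a different value.
{code:scala}
import org.apache.spark.sql.SparkSession

// First run: start a streaming aggregation with 200 shuffle partitions.
// The aggregation state in the checkpoint is laid out per shuffle partition.
val spark = SparkSession.builder()
  .appName("shuffle-partitions-change")
  .master("local[*]")
  .config("spark.sql.shuffle.partitions", "200")
  .getOrCreate()

import spark.implicits._

val counts = spark.readStream
  .format("rate")              // built-in test source
  .load()
  .groupBy($"value" % 10)      // stateful streaming aggregation
  .count()

val query = counts.writeStream
  .outputMode("complete")
  .format("console")
  .option("checkpointLocation", "/tmp/agg-checkpoint")  // hypothetical path
  .start()

query.awaitTermination()
// If this application is later restarted against the same checkpoint with
// "spark.sql.shuffle.partitions" set to, say, 100, the restored state no
// longer lines up with the new shuffle partitioning and the query fails.
{code}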
Fix:
Record the number of shuffle partitions in the offset log and enforce it in the next batch.
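A conceptual sketch of that idea follows; it is not the actual Spark implementation, and the class and method names (BatchMetadata, OffsetLog) are hypothetical. The shuffle partition count is persisted together with each batch's offsets and re-applied on recovery before the next batch is planned.
{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical types for illustration only, not Spark's internal classes.
case class BatchMetadata(batchId: Long, shufflePartitions: Int)

class OffsetLog {
  private var latest: Option[BatchMetadata] = None

  // When a batch's offsets are written to the log, also record the shuffle
  // partition count that was used to plan that batch.
  def record(batchId: Long, spark: SparkSession): Unit = {
    val parts = spark.conf.get("spark.sql.shuffle.partitions").toInt
    latest = Some(BatchMetadata(batchId, parts))
  }

  // Before planning the next batch, force the session back to the recorded
  // value so a changed config cannot alter the state store partitioning.
  def enforce(spark: SparkSession): Unit = latest.foreach { meta =>
    spark.conf.set("spark.sql.shuffle.partitions", meta.shufflePartitions.toString)
  }
}
{code}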
Attachments
Issue Links
- requires: SPARK-19540 Add ability to clone SparkSession with an identical copy of the SessionState (Resolved)
- links to