Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.1.0
- Fix Version/s: None
- Labels: None
Description
https://github.com/apache/spark/pull/21758#discussion_r204917245
The number of partitions from the input data can be unexpectedly large, e.g. if you do

sc.textFile(...).barrier().mapPartitions()

the number of input partitions is based on the HDFS input splits. We should provide a way in RDDBarrier for users to specify the number of tasks in a barrier stage, maybe something like RDDBarrier.coalesce(numPartitions: Int).
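A minimal sketch of what the proposed API could look like, assuming it simply delegates to RDD.coalesce before the barrier stage is planned. RDDBarrier.coalesce does not exist yet, so both the method body and the usage below are illustrative assumptions, not the actual implementation; the input path is a placeholder.

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical extension of org.apache.spark.rdd.RDDBarrier; the real class
// today only exposes mapPartitions over the wrapped RDD.
class RDDBarrier[T: ClassTag](rdd: RDD[T]) {
  // Coalesce the underlying RDD so the barrier stage launches exactly
  // numPartitions tasks instead of one task per HDFS input split.
  def coalesce(numPartitions: Int): RDDBarrier[T] =
    new RDDBarrier(rdd.coalesce(numPartitions))
}

// Intended usage (spark-shell style, where sc is the SparkContext):
// cap the barrier stage at 4 tasks regardless of how many splits HDFS reports.
sc.textFile("hdfs:///path/to/input")   // placeholder path
  .barrier()
  .coalesce(4)                         // the method proposed in this ticket
  .mapPartitions(iter => iter)         // barrier task body goes here

Since RDD.coalesce(numPartitions) without shuffle only merges partitions, this sketch would let users shrink a barrier stage but not grow it, which matches the motivating problem of too many input splits.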