Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.1.0
- Fix Version/s: None
- Component/s: None
Description
Currently we only allow barrier tasks in a barrier stage; however, consider the following query:
```scala
val sc = new SparkContext(conf)
val rdd1 = sc.parallelize(1 to 100, 10)
val rdd2 = sc.parallelize(1 to 1000, 20).barrier().mapPartitions(it => it)
val rdd = rdd1.union(rdd2).mapPartitions(it => it)
```
Running `rdd.collect()` currently requires 30 free slots, because every task in the union stage is treated as a barrier task and all 30 must launch together. However, the 10 tasks that collect data from rdd1's partitions could be launched as regular tasks; they are not required to start together. If we could do that, `rdd.collect()` would only need 20 free slots.
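The slot arithmetic can be sketched with a toy model (the object name, method, and scheduling logic below are illustrative assumptions for this ticket, not Spark internals):

```scala
// Toy model (hypothetical, not Spark code): how many free executor slots a
// stage needs before it can start. Barrier tasks must all launch at once
// (gang scheduling); regular tasks can launch one by one as slots free up.
object SlotEstimate {
  def requiredSlots(barrierPartitions: Int,
                    regularPartitions: Int,
                    allowMixedScheduling: Boolean): Int = {
    if (allowMixedScheduling) {
      // Proposed: barrier tasks still gang-schedule, but regular tasks can
      // reuse slots freed later, so they add at most one slot to the need.
      math.max(barrierPartitions, math.min(regularPartitions, 1))
    } else {
      // Today: every task in a barrier stage is a barrier task, so all
      // partitions must be launched together.
      barrierPartitions + regularPartitions
    }
  }
}
```

For the query above (20 barrier partitions from rdd2, 10 regular partitions from rdd1), the model gives 30 slots under the current behavior and 20 under the proposed one.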