SPARK-24874

Allow hybrid of both barrier tasks and regular tasks in a stage

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      Currently, a barrier stage may contain only barrier tasks. However, consider the following query:

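      import org.apache.spark.SparkContext

      val sc = new SparkContext(conf)  // assumes an existing SparkConf named conf
      val rdd1 = sc.parallelize(1 to 100, 10)  // 10 regular partitions
      // 20 barrier partitions; in released Spark, RDDBarrier.mapPartitions takes Iterator => Iterator
      val rdd2 = sc.parallelize(1 to 1000, 20).barrier().mapPartitions(it => it)
      val rdd = rdd1.union(rdd2).mapPartitions(t => t)  // 30 partitions in one stage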
      

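      Running `rdd.collect()` currently requires 30 free slots, because all 30 tasks in the stage (10 for rdd1's partitions and 20 for rdd2's) are treated as barrier tasks and must be launched together. However, the tasks that read rdd1's partitions could run as regular tasks, which do not need to be launched at the same time as the others. If that were allowed, `rdd.collect()` would need only 20 free slots.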
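
The slot arithmetic above can be sketched in plain Scala, without a Spark dependency. This is only an illustrative model: `PartitionGroup`, `SlotEstimate`, `slotsAllBarrier`, and `slotsHybrid` are hypothetical names invented here, not part of Spark's API or scheduler, and the model assumes one task per partition and one slot per task.

```scala
// Hypothetical model for illustration only; none of these names exist in Spark.
final case class PartitionGroup(name: String, numPartitions: Int, isBarrier: Boolean)

object SlotEstimate {
  // Behavior described in the issue: every task in the stage is treated as a
  // barrier task, so all partitions need a free slot at the same time.
  def slotsAllBarrier(groups: Seq[PartitionGroup]): Int =
    groups.map(_.numPartitions).sum

  // Proposed hybrid behavior: only barrier partitions must be launched together.
  // Regular tasks are assumed to run whenever a slot becomes free, so they do
  // not add to the number of slots that must be free simultaneously.
  def slotsHybrid(groups: Seq[PartitionGroup]): Int =
    groups.filter(_.isBarrier).map(_.numPartitions).sum

  def main(args: Array[String]): Unit = {
    val groups = Seq(
      PartitionGroup("rdd1", numPartitions = 10, isBarrier = false),
      PartitionGroup("rdd2", numPartitions = 20, isBarrier = true)
    )
    println(s"all tasks as barrier tasks: ${slotsAllBarrier(groups)}") // 30
    println(s"hybrid (proposed):          ${slotsHybrid(groups)}")     // 20
  }
}
```

Under these assumptions the model counts only the slots that must be free at the same moment. The regular tasks still need a slot eventually, but they could reuse slots released by other tasks.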

          People

            Assignee: Unassigned
            Reporter: Xingbo Jiang (jiangxb1987)
            Votes: 1
            Watchers: 4
