Spark / SPARK-24941

Add RDDBarrier.coalesce() function


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

      Description

      https://github.com/apache/spark/pull/21758#discussion_r204917245

      The number of partitions in the input data can be unexpectedly large, e.g., if you do

      sc.textFile(...).barrier().mapPartitions()
      

      The number of input partitions is determined by the HDFS input splits. We should provide a way in RDDBarrier for users to specify the number of tasks in a barrier stage, perhaps something like RDDBarrier.coalesce(numPartitions: Int).
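
      Until something like the proposed method exists, one possible workaround is to fix the partition count with a shuffle before entering the barrier stage. The sketch below is illustrative only: the object name, the local[4] master, the target of 4 tasks, and the parallelize(...) stand-in for sc.textFile(...) are all assumptions, and RDDBarrier.coalesce itself is not used because this issue is still open. repartition() is chosen here on the assumption that a shuffle boundary keeps the large-partition input RDD out of the barrier stage.

```scala
import org.apache.spark.{BarrierTaskContext, SparkConf, SparkContext}

// Hypothetical example object; not part of Spark.
object BarrierPartitionSketch {
  def main(args: Array[String]): Unit = {
    // local[4] provides four slots, so all four barrier tasks can launch together.
    val conf = new SparkConf()
      .setAppName("barrier-partition-sketch")
      .setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      val numTasks = 4 // assumed target number of barrier tasks

      // Stand-in for sc.textFile(...): an input with many more partitions than wanted.
      val input = sc.parallelize(1 to 1000, numSlices = 100)

      val counts = input
        .repartition(numTasks) // shuffle boundary; the barrier stage starts after it
        .barrier()
        .mapPartitions { iter =>
          val ctx = BarrierTaskContext.get()
          ctx.barrier() // wait until every task in the stage reaches this point
          Iterator.single((ctx.partitionId(), iter.size))
        }
        .collect()

      counts.foreach { case (id, n) => println(s"partition $id: $n records") }
    } finally {
      sc.stop()
    }
  }
}
```

      The cost of this approach is a full shuffle of the input, which is what a shuffle-free RDDBarrier.coalesce(numPartitions: Int) would be meant to avoid.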

            People

            • Assignee: Unassigned
            • Reporter: Xingbo Jiang (jiangxb1987)
            • Votes: 1
            • Watchers: 6
