Spark / SPARK-11914

[SQL] Support coalesce and repartition in Dataset APIs

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.0
    • Component/s: SQL
    • Labels: None

    Description

      repartition: Returns a new [[Dataset]] that has exactly `numPartitions` partitions.

      coalesce: Returns a new [[Dataset]] that has exactly `numPartitions` partitions. Similar to coalesce defined on an [[RDD]], this operation results in a narrow dependency: if you go from 1000 partitions to 100 partitions, there will not be a shuffle; instead, each of the 100 new partitions will claim 10 of the current partitions.
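
      The narrow dependency described for coalesce can be sketched in plain Scala. This is a simplified model only, not Spark's actual partition coalescer: `CoalesceSketch` and `groupParents` are hypothetical names introduced for illustration, and the contiguous grouping is an assumption (Spark may also take data locality into account when choosing which parents to combine).

```scala
// Simplified model of a narrow dependency: each new partition claims a
// fixed set of parent partitions, so no data is shuffled between them.
// CoalesceSketch and groupParents are hypothetical, not part of Spark.
object CoalesceSketch {

  // Returns, for every new partition, the indices of the parent
  // partitions it claims. Neighbouring parents are grouped together
  // (an assumption of this sketch).
  def groupParents(parentCount: Int, numPartitions: Int): Vector[Vector[Int]] = {
    require(parentCount > 0 && numPartitions > 0, "counts must be positive")
    // Assumption of this model: without a shuffle the partition count
    // cannot grow, so the target is capped at the parent count.
    val target = math.min(parentCount, numPartitions)
    (0 until target).toVector.map { i =>
      val start = (i.toLong * parentCount / target).toInt
      val end = ((i + 1).toLong * parentCount / target).toInt
      (start until end).toVector
    }
  }

  def main(args: Array[String]): Unit = {
    // The example from the description: 1000 partitions down to 100.
    val groups = groupParents(parentCount = 1000, numPartitions = 100)
    println(s"new partitions: ${groups.size}")
    println(s"parents claimed by new partition 0: ${groups.head.mkString(", ")}")
  }
}
```

      In this model every parent partition is read by exactly one new partition, which is what makes the dependency narrow. A repartition, by contrast, redistributes individual rows across all new partitions and therefore needs a shuffle.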

          People

            Assignee: Xiao Li (smilegator)
            Reporter: Xiao Li (smilegator)
            Votes: 0
            Watchers: 6
