[SPARK-11914] [SQL] Support coalesce and repartition in Dataset APIs - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.0
Fix Version/s: 1.6.0
Component/s: SQL
Labels: None

Description

repartition: Returns a new [[Dataset]] that has exactly `numPartitions` partitions.
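
To make the contrast with coalesce concrete, here is a minimal Python model of what a shuffling repartition does to rows. It is a sketch under stated assumptions, not Spark code. Partitions are plain lists, and the hypothetical `repartition` helper deals rows out round-robin, which simplifies how Spark actually distributes data during the shuffle. The point it shows is that any row can move to any output partition, so the result has exactly `num_partitions` partitions whether that is more or fewer than before.

```python
def repartition(partitions, num_partitions):
    """Hypothetical model of a shuffling repartition (not Spark's implementation)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be positive")
    out = [[] for _ in range(num_partitions)]
    i = 0
    for part in partitions:
        for row in part:
            # Round-robin placement: a row may land in any output partition,
            # which is what forces data to cross partition boundaries (a shuffle).
            out[i % num_partitions].append(row)
            i += 1
    return out


# Four input partitions of ten rows each (rows 0..39).
parts = [list(range(p * 10, p * 10 + 10)) for p in range(4)]
grown = repartition(parts, 8)   # more partitions than before
shrunk = repartition(parts, 3)  # fewer partitions than before
print(len(grown), len(shrunk))  # 8 3
```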

coalesce: Returns a new [[Dataset]] that has exactly `numPartitions` partitions when fewer partitions than the current number are requested; because no shuffle is performed, it cannot increase the partition count. Similar to coalesce defined on an [[RDD]], this operation results in a narrow dependency. For example, if you go from 1000 partitions to 100 partitions, there will not be a shuffle; instead, each of the 100 new partitions will claim 10 of the current partitions.
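
The narrow dependency can be modeled the same way. In this sketch, the hypothetical `coalesce_groups` helper assigns each new partition a contiguous range of whole parent partitions, and the hypothetical `coalesce` helper concatenates them, so no row is redistributed. Spark's actual coalescer also takes data locality into account, which this sketch ignores. Like RDD coalesce without a shuffle, the sketch never increases the partition count.

```python
def coalesce_groups(num_parents, num_partitions):
    """Hypothetical helper: which parent partition indices each new partition claims."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be positive")
    # Without a shuffle the partition count cannot grow, so cap it at the parent count.
    n = min(num_partitions, num_parents)
    # Contiguous, near-equal ranges of whole parent partitions.
    return [range(k * num_parents // n, (k + 1) * num_parents // n) for k in range(n)]


def coalesce(partitions, num_partitions):
    """Concatenate whole parent partitions; no row moves to a different group."""
    groups = coalesce_groups(len(partitions), num_partitions)
    return [[row for i in g for row in partitions[i]] for g in groups]


parts = [[p] for p in range(1000)]  # 1000 partitions, one row each
merged = coalesce(parts, 100)
print(len(merged), merged[0])       # 100 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Going from 1000 to 100 partitions, group `k` is `range(10 * k, 10 * k + 10)`, matching the "claim 10 of the current partitions" description; asking for more partitions than exist leaves the count unchanged.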

Issue Links

links to

[Github] Pull Request #9899 (gatorsmile)

People

Assignee: Xiao Li

Reporter: Xiao Li

Votes: 0

Watchers: 6

Dates

Created: 23/Nov/15 03:44

Updated: 03/Nov/16 19:36

Resolved: 24/Nov/15 23:54