[SPARK-31841] Dataset.repartition leverage adaptive execution - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Duplicate
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: SQL
Labels: None
Environment: Spark branch-3.0 as of May 1, 2020

Description

hello,

we are very happy users of adaptive query execution. it's a great feature to not have to think about and tune the number of partitions in a shuffle anymore.

i noticed that Dataset.groupBy consistently uses adaptive execution when it's enabled (e.g. i don't see the default 200 partitions), but when i do Dataset.repartition it seems i am back to a hardcoded number of partitions.

Should adaptive execution also be used for repartition? It would be nice to be able to repartition without having to think about the optimal number of partitions.

An example:

$ spark-shell --conf spark.sql.adaptive.enabled=true --conf spark.sql.adaptive.advisoryPartitionSizeInBytes=100000
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-SNAPSHOT
      /_/
         
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_252)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val x = (1 to 1000000).toDF
x: org.apache.spark.sql.DataFrame = [value: int]
scala> x.rdd.getNumPartitions
res0: Int = 2
scala> x.repartition($"value").rdd.getNumPartitions
res1: Int = 200
scala> x.groupBy("value").count.rdd.getNumPartitions
res2: Int = 67
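
While repartition falls back to a fixed partition count, the number has to be chosen by hand. The following is a minimal sketch of that arithmetic, assuming an approximate total data size is known in advance. `PartitionEstimate.estimatePartitions` is a hypothetical helper, not a Spark API, and the byte counts in the usage lines are illustrative values, not measurements from the session above.

```scala
// Hypothetical helper (not part of Spark): derive a partition count
// from an approximate data size and a target size per partition.
object PartitionEstimate {
  def estimatePartitions(totalBytes: Long, targetBytes: Long, minPartitions: Int = 1): Int = {
    require(totalBytes >= 0, "totalBytes must be non-negative")
    require(targetBytes > 0, "targetBytes must be positive")
    require(minPartitions >= 1, "minPartitions must be at least 1")
    // Ceiling division on Longs, written to avoid overflow for large sizes.
    val needed = totalBytes / targetBytes + (if (totalBytes % targetBytes == 0) 0L else 1L)
    // Clamp into the Int range and enforce the lower bound.
    math.max(minPartitions.toLong, math.min(needed, Int.MaxValue.toLong)).toInt
  }
}

// Illustrative values only: 4,000,000 bytes of data, 100,000 bytes per partition.
val n = PartitionEstimate.estimatePartitions(4000000L, 100000L)
println(n)
```

The result could then be passed explicitly, for example `x.repartition(n, $"value")`, using the overload of Dataset.repartition that takes a partition count followed by partitioning expressions.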

Issue Links

duplicates

SPARK-31220 repartition obeys spark.sql.adaptive.coalescePartitions.initialPartitionNum when spark.sql.adaptive.enabled

Resolved

People

Assignee: Unassigned

Reporter: koert kuipers

Votes: 0

Watchers: 3

Dates

Created: 27/May/20 21:15

Updated: 29/May/20 12:33

Resolved: 29/May/20 12:33