Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
- Affects Version/s: 2.2.0
- Fix Version/s: None
Description
Note the different number of partitions that range reports when invoked in SQL versus as the Dataset operator spark.range.
scala> spark.range(4).explain
== Physical Plan ==
*Range (0, 4, step=1, splits=Some(8)) // <-- note Some(8)

scala> sql("select * from range(4)").explain
== Physical Plan ==
*Range (0, 4, step=1, splits=None) // <-- note None
If I'm not mistaken, the fix is to change builtinFunctions in ResolveTableValuedFunctions (see https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala#L82-L93) to default the partition count to sparkContext.defaultParallelism, as SparkSession.range already does (see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L517).
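A minimal sketch of the intended behavior, in plain Scala with no Spark dependency. RangeNode, resolveRangeTvf, and the parameter names are hypothetical stand-ins for illustration, not Spark's actual API:

```scala
// Hypothetical stand-in for the logical Range node's splits field.
case class RangeNode(start: Long, end: Long, step: Long, splits: Option[Int])

// Proposed resolution rule for the SQL range(...) table-valued function:
// prefer an explicit partition count, otherwise fall back to the session's
// default parallelism instead of leaving splits as None, mirroring
// SparkSession.range.
def resolveRangeTvf(start: Long, end: Long, step: Long,
                    explicitSplits: Option[Int],
                    defaultParallelism: Int): RangeNode = {
  val splits = explicitSplits.orElse(Some(defaultParallelism))
  RangeNode(start, end, step, splits)
}

// With defaultParallelism = 8, range(4) resolves to splits = Some(8),
// matching what spark.range(4) plans today.
val node = resolveRangeTvf(0, 4, step = 1, explicitSplits = None, defaultParallelism = 8)
println(node.splits) // Some(8)
```

An explicitly supplied count should still win, so range(0, 4, 1, 2) would keep splits = Some(2) under this rule.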
Please confirm, and I will work on a fix if and as needed.