Spark / SPARK-20281

Table-valued function range in SQL should use the same number of partitions as spark.range


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      Note the different number of partitions reported for range when it is used as the SQL table-valued function versus the spark.range operator.

      scala> spark.range(4).explain
      == Physical Plan ==
      *Range (0, 4, step=1, splits=Some(8)) // <-- note Some(8)
      
      scala> sql("select * from range(4)").explain
      == Physical Plan ==
      *Range (0, 4, step=1, splits=None) // <-- note None
      

      If I'm not mistaken, the fix is to change builtinFunctions in ResolveTableValuedFunctions (see https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala#L82-L93) to use sparkContext.defaultParallelism, the same default that SparkSession.range uses (see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L517).
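
      For reference, here is a small, self-contained way to compare what the two forms report at runtime. This is a hedged sketch, assuming a local SparkSession with Spark on the classpath; the object name and app name are made up for illustration and are not part of the proposed change.

      import org.apache.spark.sql.SparkSession

      object RangePartitionsCheck extends App {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("range-partitions-check")
          .getOrCreate()

        // spark.range falls back to sparkContext.defaultParallelism when
        // numPartitions is not given, hence splits=Some(8) in the plan above.
        val operatorPartitions = spark.range(4).rdd.getNumPartitions

        // The SQL table-valued function leaves the number of splits unspecified
        // in the logical plan, hence splits=None in the plan above.
        val sqlPartitions = spark.sql("select * from range(4)").rdd.getNumPartitions

        println(s"defaultParallelism     = ${spark.sparkContext.defaultParallelism}")
        println(s"spark.range(4)         = $operatorPartitions partitions")
        println(s"select * from range(4) = $sqlPartitions partitions")

        spark.stop()
      }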

      Please confirm, and I'll work on a fix if and as needed.


          People

              Assignee: Unassigned
              Reporter: Jacek Laskowski
              Votes: 0
              Watchers: 3
