Spark / SPARK-20281

Table-valued function range in SQL should use the same number of partitions as spark.range


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      Note the different number of partitions reported for range when it is used as the SQL table-valued function versus the spark.range operator.

      scala> spark.range(4).explain
      == Physical Plan ==
      *Range (0, 4, step=1, splits=Some(8)) // <-- note Some(8)
      
      scala> sql("select * from range(4)").explain
      == Physical Plan ==
      *Range (0, 4, step=1, splits=None) // <-- note None
      

      If I'm not mistaken, the fix is to change builtinFunctions in ResolveTableValuedFunctions (see https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala#L82-L93) to use sparkContext.defaultParallelism, the same default that SparkSession.range uses (see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L517).
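
      For reference, here is a small, self-contained way to compare what the two forms report at runtime. This is a hedged sketch, assuming a local SparkSession with Spark on the classpath; the object name and app name are made up for illustration and are not part of the proposed change.

      import org.apache.spark.sql.SparkSession

      object RangePartitionsCheck extends App {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("range-partitions-check")
          .getOrCreate()

        // spark.range falls back to sparkContext.defaultParallelism when
        // numPartitions is not given, hence splits=Some(8) in the plan above.
        val operatorPartitions = spark.range(4).rdd.getNumPartitions

        // The SQL table-valued function leaves the number of splits unspecified
        // in the logical plan, hence splits=None in the plan above.
        val sqlPartitions = spark.sql("select * from range(4)").rdd.getNumPartitions

        println(s"defaultParallelism     = ${spark.sparkContext.defaultParallelism}")
        println(s"spark.range(4)         = $operatorPartitions partitions")
        println(s"select * from range(4) = $sqlPartitions partitions")

        spark.stop()
      }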

      Please confirm, and I'll work on a fix if and as needed.


          People

              Assignee: Unassigned
              Reporter: Jacek Laskowski
              Votes: 0
              Watchers: 3
