Spark / SPARK-32059

Nested Schema Pruning not Working in Window Functions


    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

      Description

      Using the tables and data structures defined in `SchemaPruningSuite.scala`:

      case class FullName(first: String, middle: String, last: String)
      case class Company(name: String, address: String)
      case class Employer(id: Int, company: Company)
      case class Contact(
        id: Int,
        name: FullName,
        address: String,
        pets: Int,
        friends: Array[FullName] = Array.empty,
        relatives: Map[String, FullName] = Map.empty,
        employer: Employer = null,
        relations: Map[FullName, String] = Map.empty)
      case class Department(
        depId: Int,
        depName: String,
        contactId: Int,
        employer: Employer)
      


      The query to run:

      select a.name.first
      from (select row_number() over (partition by address order by id desc) as __rank,
                   contacts.*
            from contacts) a
      where a.name.first = 'A' and a.__rank = 1
      
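      The issue can be reproduced with a sketch like the following, assuming a local SparkSession, the case classes above in scope, and nested schema pruning enabled; the parquet path and sample rows are illustrative, not from the suite:

      ```scala
      // Minimal reproduction sketch (illustrative data and path).
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().master("local[*]").getOrCreate()
      import spark.implicits._

      // Nested schema pruning is the feature under test; set it explicitly.
      spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")

      val contacts = Seq(
        Contact(0, FullName("A", "B", "C"), "123 Main St", 1),
        Contact(1, FullName("D", "E", "F"), "456 Oak Ave", 2))
      contacts.toDS().write.mode("overwrite").parquet("/tmp/contacts")
      spark.read.parquet("/tmp/contacts").createOrReplaceTempView("contacts")

      val df = spark.sql(
        """select a.name.first
          |from (select row_number() over (partition by address order by id desc) as __rank,
          |             contacts.*
          |      from contacts) a
          |where a.name.first = 'A' and a.__rank = 1""".stripMargin)

      // With pruning working through the Window, ReadSchema would contain only
      // name:struct<first:string>; currently the full name struct is read.
      df.explain()
      ```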


      The current physical plan:

      == Physical Plan ==
      *(3) Project [name#46.first AS first#74]
      +- *(3) Filter (((isnotnull(name#46) AND isnotnull(__rank#71)) AND (name#46.first = A)) AND (__rank#71 = 1))
         +- Window [row_number() windowspecdefinition(address#47, id#45 DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS __rank#71], [address#47], [id#45 DESC NULLS LAST]
            +- *(2) Sort [address#47 ASC NULLS FIRST, id#45 DESC NULLS LAST], false, 0
               +- Exchange hashpartitioning(address#47, 5), true, [id=#52]
                  +- *(1) Project [id#45, name#46, address#47]
                     +- FileScan parquet [id#45,name#46,address#47,p#53] Batched: false, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/private/var/folders/_c/4r2j33dd14n9ldfc2xqyzs400000gn/T/spark-85d173af-42..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int,name:struct<first:string,middle:string,last:string>,address:string>
      


      The desired physical plan:

      == Physical Plan ==
      *(3) Project [_gen_alias_77#77 AS first#74]
      +- *(3) Filter (((isnotnull(_gen_alias_77#77) AND isnotnull(__rank#71)) AND (_gen_alias_77#77 = A)) AND (__rank#71 = 1))
         +- Window [row_number() windowspecdefinition(address#47, id#45 DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS __rank#71], [address#47], [id#45 DESC NULLS LAST]
            +- *(2) Sort [address#47 ASC NULLS FIRST, id#45 DESC NULLS LAST], false, 0
               +- Exchange hashpartitioning(address#47, 5), true, [id=#52]
                  +- *(1) Project [id#45, name#46.first AS _gen_alias_77#77, address#47]
                     +- FileScan parquet [id#45,name#46,address#47,p#53] Batched: false, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/private/var/folders/_c/4r2j33dd14n9ldfc2xqyzs400000gn/T/spark-c64e0b29-d9..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int,name:struct<first:string>,address:string>
      
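      Until the optimizer pushes the field extraction through Window operators, a workaround sketch (assuming the `contacts` table above and a SparkSession in scope as `spark`) is to extract the nested field in the subquery, so no struct crosses the Window:

      ```scala
      // Workaround sketch: extract name.first before the window, so only the
      // needed leaf field is referenced above the scan.
      val pruned = spark.sql(
        """select first
          |from (select name.first as first, address, id,
          |             row_number() over (partition by address order by id desc) as __rank
          |      from contacts) a
          |where first = 'A' and __rank = 1""".stripMargin)

      // The scan should now only need name.first, matching the desired ReadSchema.
      pruned.explain()
      ```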

            People

            • Assignee: Unassigned
            • Reporter: frankyin-factual Frank Yin
            • Votes: 0
            • Watchers: 3