Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-18245 Improving support for bucketed table
  3. SPARK-15453

FileSourceScanExec to extract `outputOrdering` information

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • None
    • 2.1.0
    • SQL
    • None

    Description

      Datasource allows creation of bucketed and sorted tables but performing joins on such tables still does not utilize this metadata to produce optimal query plan.

      As below, the `Exchange` and `Sort` can be avoided if the tables are known to be hashed + sorted on relevant columns.

      == Physical Plan ==
      WholeStageCodegen
      :  +- SortMergeJoin [j#20,k#21,i#22], [j#23,k#24,i#25], Inner, None
      :     :- INPUT
      :     +- INPUT
      :- WholeStageCodegen
      :  :  +- Sort [j#20 ASC,k#21 ASC,i#22 ASC], false, 0
      :  :     +- INPUT
      :  +- Exchange hashpartitioning(j#20, k#21, i#22, 200), None
      :     +- WholeStageCodegen
      :        :  +- Project [j#20,k#21,i#22]
      :        :     +- Filter (isnotnull(k#21) && isnotnull(j#20))
      :        :        +- Scan orc default.table7[j#20,k#21,i#22] Format: ORC, InputPaths: file:/XXXXXXX/table7, PushedFilters: [IsNotNull(k), IsNotNull(j)], ReadSchema: struct<j:int,k:string>
      +- WholeStageCodegen
         :  +- Sort [j#23 ASC,k#24 ASC,i#25 ASC], false, 0
         :     +- INPUT
         +- Exchange hashpartitioning(j#23, k#24, i#25, 200), None
            +- WholeStageCodegen
               :  +- Project [j#23,k#24,i#25]
               :     +- Filter (isnotnull(k#24) && isnotnull(j#23))
               :        +- Scan orc default.table8[j#23,k#24,i#25] Format: ORC, InputPaths: file:/XXXXXXX/table8, PushedFilters: [IsNotNull(k), IsNotNull(j)], ReadSchema: struct<j:int,k:string>
      

      Attachments

        Activity

          People

            tejasp Tejas Patil
            tejasp Tejas Patil
            Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: