Spark / SPARK-43339

LEFT JOIN is treated as INNER JOIN when it is in the middle of a double join

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: Optimizer
    • Labels: None

    Description

      Consider a query like:

       

      SELECT ss_item_sk
      FROM   store_sales
             LEFT OUTER JOIN store_returns
                          ON ( sr_item_sk = ss_item_sk ),
             reason
      WHERE  sr_reason_sk = r_reason_sk
             AND r_reason_desc = 'reason 38'

       

      Spark generates the following plan, in which the join between store_sales and store_returns is Inner rather than LeftOuter:

       

      AdaptiveSparkPlan isFinalPlan=false
      +- Project [ss_item_sk#2]
         +- BroadcastHashJoin [sr_reason_sk#458], [r_reason_sk#734], Inner, BuildRight, false
            :- Project [ss_item_sk#2, sr_reason_sk#458]
            :  +- BroadcastHashJoin [ss_item_sk#2], [sr_item_sk#452], Inner, BuildRight, false
            :     :- FileScan parquet [ss_item_sk#2] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/leonid/tpcds-spark-data-no-padding/store_sales], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<ss_item_sk:int>
            :     +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#7227]
            :        +- Filter (isnotnull(sr_item_sk#452) AND isnotnull(sr_reason_sk#458))
            :           +- FileScan parquet [sr_item_sk#452,sr_reason_sk#458] Batched: true, DataFilters: [isnotnull(sr_item_sk#452), isnotnull(sr_reason_sk#458)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/leonid/tpcds-spark-data-no-padding/store_returns], PartitionFilters: [], PushedFilters: [IsNotNull(sr_item_sk), IsNotNull(sr_reason_sk)], ReadSchema: struct<sr_item_sk:int,sr_reason_sk:int>
            +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#7231]
               +- Project [r_reason_sk#734]
                  +- Filter ((isnotnull(r_reason_desc#736) AND (r_reason_desc#736 = reason 38)) AND isnotnull(r_reason_sk#734))
                     +- FileScan parquet [r_reason_sk#734,r_reason_desc#736] Batched: true, DataFilters: [isnotnull(r_reason_desc#736), (r_reason_desc#736 = reason 38), isnotnull(r_reason_sk#734)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/leonid/tpcds-spark-data-no-padding/reason], PartitionFilters: [], PushedFilters: [IsNotNull(r_reason_desc), EqualTo(r_reason_desc,reason 38), IsNotNull(r_reason_sk)], ReadSchema: struct<r_reason_sk:int,r_reason_desc:string>
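
The Inner join in the plan above is consistent with the "Not A Problem" resolution. The WHERE clause compares sr_reason_sk with r_reason_sk, and that comparison is never true for the NULL-extended rows that a LEFT OUTER JOIN adds, so those rows are dropped either way and the outer join returns the same rows as an inner join. The following is a minimal sketch of that equivalence, not Spark code: it uses Python's sqlite3 module and hypothetical three-row tables that only mirror the column names in the report. The third query is one assumed way to express the intent "keep every store_sales row", in case that is what the reporter expected.

```python
import sqlite3

# Hypothetical miniature tables; only the column names come from the report.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_sales   (ss_item_sk INTEGER);
    CREATE TABLE store_returns (sr_item_sk INTEGER, sr_reason_sk INTEGER);
    CREATE TABLE reason        (r_reason_sk INTEGER, r_reason_desc TEXT);
    INSERT INTO store_sales   VALUES (1), (2), (3);        -- item 3 was never returned
    INSERT INTO store_returns VALUES (1, 38), (2, 7);
    INSERT INTO reason        VALUES (38, 'reason 38'), (7, 'reason 7');
""")

# The query from the report: LEFT OUTER JOIN, then a NULL-rejecting WHERE.
LEFT_THEN_FILTER = """
    SELECT ss_item_sk
    FROM   store_sales
           LEFT OUTER JOIN store_returns ON (sr_item_sk = ss_item_sk),
           reason
    WHERE  sr_reason_sk = r_reason_sk
           AND r_reason_desc = 'reason 38'
"""

# The same query with the outer join written as an inner join.
INNER = LEFT_THEN_FILTER.replace("LEFT OUTER JOIN", "INNER JOIN")

# Assumed alternative: filter the returns first, then left-join them,
# so that store_sales rows without a matching return survive.
PRESERVING = """
    SELECT ss_item_sk
    FROM   store_sales
           LEFT OUTER JOIN (
               SELECT sr_item_sk
               FROM   store_returns
                      JOIN reason ON sr_reason_sk = r_reason_sk
               WHERE  r_reason_desc = 'reason 38'
           ) AS r ON r.sr_item_sk = ss_item_sk
"""

def run(sql):
    """Return the selected ss_item_sk values as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

print(run(LEFT_THEN_FILTER))  # [1]
print(run(INNER))             # [1]
print(run(PRESERVING))        # [1, 2, 3]
```

With this data the first two queries return the same single row, because item 3 has sr_reason_sk = NULL after the outer join and fails the WHERE comparison. Only the third form keeps items 2 and 3.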
      

       

       

          People

            Assignee: Unassigned
            Reporter: Leonid Chistov (lchistov1987)
            Votes: 0
            Watchers: 2
