Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-18455 General support for correlated subquery processing
  3. SPARK-19712

EXISTS and Left Semi join do not produce the same plan

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.1.0
    • 3.0.0
    • SQL
    • None

    Description

      This problem was found during the development of SPARK-18874.

      The EXISTS form in the following query:

      sql("select * from t1 inner join t2 on t1.t1a=t2.t2a where exists (select 1 from t3 where t1.t1b=t3.t3b)")

      gives the optimized plan below:

      == Optimized Logical Plan ==
      Join Inner, (t1a#7 = t2a#25)
      :- Join LeftSemi, (t1b#8 = t3b#58)
      :  :- Filter isnotnull(t1a#7)
      :  :  +- Relation[t1a#7,t1b#8,t1c#9] parquet
      :  +- Project [1 AS 1#271, t3b#58]
      :     +- Relation[t3a#57,t3b#58,t3c#59] parquet
      +- Filter isnotnull(t2a#25)
         +- Relation[t2a#25,t2b#26,t2c#27] parquet
      

      whereas a semantically equivalent Left Semi join query below:

      sql("select * from t1 inner join t2 on t1.t1a=t2.t2a left semi join t3 on t1.t1b=t3.t3b")

      gives the following optimized plan:

      == Optimized Logical Plan ==
      Join LeftSemi, (t1b#8 = t3b#58)
      :- Join Inner, (t1a#7 = t2a#25)
      :  :- Filter (isnotnull(t1b#8) && isnotnull(t1a#7))
      :  :  +- Relation[t1a#7,t1b#8,t1c#9] parquet
      :  +- Filter isnotnull(t2a#25)
      :     +- Relation[t2a#25,t2b#26,t2c#27] parquet
      +- Project [t3b#58]
         +- Relation[t3a#57,t3b#58,t3c#59] parquet
      

      Attachments

        Activity

          People

            dkbiswal Dilip Biswal
            nsyca Nattavut Sutyanyong
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: