Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-21274

Implement EXCEPT ALL and INTERSECT ALL

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.0.0, 2.1.0, 2.2.0
    • 2.4.0
    • SQL
    • None

    Description

      1) EXCEPT ALL / MINUS ALL :

      SELECT a,b,c FROM tab1
       EXCEPT ALL 
      SELECT a,b,c FROM tab2
      

      can be rewritten as following outer join:

      SELECT a,b,c
      FROM    tab1 t1
           LEFT OUTER JOIN 
              tab2 t2
           ON (
              (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c)
           )
      WHERE
          COALESCE(t2.a, t2.b, t2.c) IS NULL
      

      (register as a temp.view this second query under "t1_except_t2_df" name that can be also used to find INTERSECT ALL below):

      2) INTERSECT ALL:

      SELECT a,b,c FROM tab1
       INTERSECT ALL 
      SELECT a,b,c FROM tab2
      

      can be rewritten as following anti-join using t1_except_t2_df we defined above:

      SELECT a,b,c
      FROM    tab1 t1
      WHERE 
         NOT EXISTS
         (    SELECT 1
              FROM    t1_except_t2_df e
              WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c)
         )
      

      So the suggestion is just to use above query rewrites to implement both EXCEPT ALL and INTERSECT ALL sql set operations.

      Attachments

        Activity

          People

            dkbiswal Dilip Biswal
            Tagar Ruslan Dautkhanov
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: