Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-21274

Implement EXCEPT ALL and INTERSECT ALL

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.1.0, 2.2.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels:
      None

      Description

      1) EXCEPT ALL / MINUS ALL :

      SELECT a,b,c FROM tab1
       EXCEPT ALL 
      SELECT a,b,c FROM tab2
      

      can be rewritten as following outer join:

      SELECT a,b,c
      FROM    tab1 t1
           LEFT OUTER JOIN 
              tab2 t2
           ON (
              (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c)
           )
      WHERE
          COALESCE(t2.a, t2.b, t2.c) IS NULL
      

      (register as a temp.view this second query under "t1_except_t2_df" name that can be also used to find INTERSECT ALL below):

      2) INTERSECT ALL:

      SELECT a,b,c FROM tab1
       INTERSECT ALL 
      SELECT a,b,c FROM tab2
      

      can be rewritten as following anti-join using t1_except_t2_df we defined above:

      SELECT a,b,c
      FROM    tab1 t1
      WHERE 
         NOT EXISTS
         (    SELECT 1
              FROM    t1_except_t2_df e
              WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c)
         )
      

      So the suggestion is just to use above query rewrites to implement both EXCEPT ALL and INTERSECT ALL sql set operations.

        Attachments

          Activity

            People

            • Assignee:
              dkbiswal Dilip Biswal
              Reporter:
              Tagar Ruslan Dautkhanov
            • Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: