Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-21274

Implement EXCEPT ALL and INTERSECT ALL

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.0.0, 2.1.0, 2.2.0
    • 2.4.0
    • SQL
    • None

    Description

      1) EXCEPT ALL / MINUS ALL :

      SELECT a,b,c FROM tab1
       EXCEPT ALL 
      SELECT a,b,c FROM tab2
      

      can be rewritten as following outer join:

      SELECT a,b,c
      FROM    tab1 t1
           LEFT OUTER JOIN 
              tab2 t2
           ON (
              (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c)
           )
      WHERE
          COALESCE(t2.a, t2.b, t2.c) IS NULL
      

      (register as a temp.view this second query under "t1_except_t2_df" name that can be also used to find INTERSECT ALL below):

      2) INTERSECT ALL:

      SELECT a,b,c FROM tab1
       INTERSECT ALL 
      SELECT a,b,c FROM tab2
      

      can be rewritten as following anti-join using t1_except_t2_df we defined above:

      SELECT a,b,c
      FROM    tab1 t1
      WHERE 
         NOT EXISTS
         (    SELECT 1
              FROM    t1_except_t2_df e
              WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c)
         )
      

      So the suggestion is just to use above query rewrites to implement both EXCEPT ALL and INTERSECT ALL sql set operations.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            dkbiswal Dilip Biswal
            Tagar Ruslan Dautkhanov
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment