Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-34514

Push down limit for LEFT SEMI and LEFT ANTI join

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Trivial
    • Resolution: Fixed
    • 3.1.0
    • 3.2.0
    • SQL
    • None

    Description

      I found out during code review of https://github.com/apache/spark/pull/31567https://github.com/apache/spark/pull/31567#discussion_r577379572 ), where we can push down limit to the left side of LEFT SEMI and LEFT ANTI join, if the join condition is empty.

      Why it's safe to push down limit:

      The semantics of LEFT SEMI join without condition:

      (1). if right side is non-empty, output all rows from left side.

      (2). if right side is empty, output nothing.

       

      The semantics of LEFT ANTI join without condition:

      (1). if right side is non-empty, output nothing.

      (2). if right side is empty, output all rows from left side.

       

      With the semantics of output all rows from left side or nothing (all or nothing), it's safe to push down limit to left side.

      NOTE: LEFT SEMI / LEFT ANTI join with non-empty condition is not safe for limit push down, because output can be a portion of left side rows.

       

      Physical operator for LEFT SEMI / LEFT ANTI join without condition - https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala#L200-L204 .

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            chengsu Cheng Su
            chengsu Cheng Su
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment