  1. Spark
  2. SPARK-32461 Shuffled hash join improvement
  3. SPARK-32399

Support full outer join in shuffled hash join


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, for a SQL full outer join, Spark always does a sort merge join, no matter how large the join children are. Inspired by recent discussion in https://github.com/apache/spark/pull/29130#discussion_r456502678 and https://github.com/apache/spark/pull/29181, I think we can support full outer join in shuffled hash join as follows: when looking up stream side keys in the build side HashedRelation, mark which build side rows were matched, and keep this info inside the build side HashedRelation. After reading all rows from the stream side, output all non-matching rows from the build side based on the modified HashedRelation.
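
      The idea above can be illustrated with a minimal, self-contained sketch in plain Python. This is not Spark's implementation and does not use HashedRelation; the function name `full_outer_hash_join`, the `key` parameter, and the per-row matched flag stored next to each build row are all assumptions made for illustration. Following SQL semantics, rows with a null (`None`) join key are assumed never to match.

```python
from collections import defaultdict


def full_outer_hash_join(stream_rows, build_rows, key):
    """Toy full outer join: hash the build side, probe with the stream side.

    Hypothetical helper for illustration only; not Spark's API.
    Returns a list of (stream_row, build_row) pairs, with None on the
    side that has no match.
    """
    # Build phase: join key -> list of [build_row, matched_flag] entries.
    table = defaultdict(list)
    for row in build_rows:
        table[key(row)].append([row, False])

    out = []

    # Probe phase: look up each stream row and mark the build rows it matches.
    for s in stream_rows:
        k = key(s)
        # A None key never matches anything (SQL null semantics).
        entries = table.get(k) if k is not None else None
        if entries:
            for entry in entries:
                entry[1] = True
                out.append((s, entry[0]))
        else:
            # Stream row without a match: pad the build side with None.
            out.append((s, None))

    # Final phase: after the stream side is exhausted, emit every build
    # row that was never marked as matched.
    for entries in table.values():
        for row, matched in entries:
            if not matched:
                out.append((None, row))

    return out
```

      The key design point is that the matched flags live with the build side hash table, so the unmatched build rows can be produced in a single pass once the stream side has been fully consumed.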


          People

            Assignee: Cheng Su (chengsu)
            Reporter: Cheng Su (chengsu)
            Votes: 0
            Watchers: 7
