Spark / SPARK-4570

Add broadcast join to left semi join

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, Spark uses a broadcast join instead of a hash join to optimize an inner join when the size of one side of the data is below AUTO_BROADCASTJOIN_THRESHOLD.
      However, when executing a left semi join, Spark SQL still performs shuffle operations on each child relation, even though a left semi join is well suited to optimization with a broadcast join.
      We are planning to create a BroadcastLeftSemiJoinHash to implement the broadcast join for left semi join.
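
      The idea behind a broadcast left semi join can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions, not the actual BroadcastLeftSemiJoinHash operator: the function names, the tuple row format, and the list-of-lists partition model are all hypothetical. The small (right) side is reduced to a set of join keys, that set is handed to every left partition, and each left row is emitted at most once if its key is present, so neither side needs to be shuffled.

```python
from typing import Callable, Hashable, Iterable, Iterator, List, Set, Tuple

Row = Tuple  # hypothetical row representation, for illustration only


def build_key_set(small_side: Iterable[Row],
                  key_of: Callable[[Row], Hashable]) -> Set[Hashable]:
    """Collect the distinct join keys of the small side.

    None keys are skipped because a NULL key never matches in SQL.
    """
    return {key_of(row) for row in small_side if key_of(row) is not None}


def semi_join_partition(left_partition: Iterable[Row],
                        broadcast_keys: Set[Hashable],
                        key_of: Callable[[Row], Hashable]) -> Iterator[Row]:
    """Stream one left partition, keeping rows whose key is in the set."""
    for row in left_partition:
        if key_of(row) in broadcast_keys:
            yield row  # emitted once, however many right rows match


def broadcast_left_semi_join(left_partitions: List[List[Row]],
                             right_rows: Iterable[Row],
                             left_key: Callable[[Row], Hashable],
                             right_key: Callable[[Row], Hashable]) -> List[List[Row]]:
    """Join each left partition in place; the left side is never repartitioned."""
    keys = build_key_set(right_rows, right_key)  # stands in for the broadcast
    return [list(semi_join_partition(p, keys, left_key))
            for p in left_partitions]


# Hypothetical data: two left partitions and a small right side.
orders = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d"), (None, "e")]]
customers = [(1, "x"), (1, "y"), (3, "z"), (None, "w")]

result = broadcast_left_semi_join(
    orders, customers,
    left_key=lambda r: r[0],
    right_key=lambda r: r[0],
)
```

      Because only the key set of the small side is needed, duplicate keys on the right (key 1 here) do not duplicate left rows, which is what distinguishes a semi join from an inner join.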

          People

            Assignee: XiaoJing wang (wangxj8)
            Reporter: XiaoJing wang (wangxj8)
            Votes: 0
            Watchers: 4
