SPARK-4570

Add broadcast join to left semi join

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, Spark uses a broadcast join instead of a shuffled hash join to optimize an inner join when the size of one side is below AUTO_BROADCASTJOIN_THRESHOLD.
      However, Spark SQL still shuffles both child relations when executing a left semi join, even though a left semi join is well suited to the same broadcast optimization.
      We plan to add a BroadcastLeftSemiJoinHash operator that implements the broadcast join for left semi join.
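
      The idea can be illustrated with a minimal sketch in plain Python. This is not Spark's BroadcastLeftSemiJoinHash implementation; the function names, the list-of-partitions representation, and the sample data are all hypothetical and only show why no shuffle is needed: the join keys of the small side are collected once into a hash set, that set is made available to every partition of the large side, and each partition filters its own rows locally.

```python
from typing import Callable, Hashable, Iterable, List, Optional, TypeVar

L = TypeVar("L")
R = TypeVar("R")


def build_broadcast_keys(
    small_side: Iterable[R],
    key_of: Callable[[R], Optional[Hashable]],
) -> frozenset:
    """Collect the distinct join keys of the small side into an immutable set.

    Rows whose key is None are skipped, mirroring SQL semantics where
    NULL never equals anything.
    """
    return frozenset(k for k in map(key_of, small_side) if k is not None)


def broadcast_left_semi_join(
    left_partitions: List[List[L]],
    right_rows: Iterable[R],
    left_key: Callable[[L], Optional[Hashable]],
    right_key: Callable[[R], Optional[Hashable]],
) -> List[List[L]]:
    """Hypothetical sketch of a broadcast left semi join.

    The right (small) side is reduced to a key set once; every left
    partition is then filtered independently, so left rows never move
    between partitions.
    """
    broadcast_keys = build_broadcast_keys(right_rows, right_key)
    return [
        # A left row is emitted at most once, however many right rows match.
        [row for row in partition if left_key(row) in broadcast_keys]
        for partition in left_partitions
    ]


# Illustrative data: two partitions of the large side, one small side.
orders = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")]]
customers = [(1, "x"), (1, "y"), (3, "z")]

result = broadcast_left_semi_join(
    orders, customers, left_key=lambda r: r[0], right_key=lambda r: r[0]
)
# result == [[(1, "a")], [(3, "c"), (1, "d")]]
```

      Because a left semi join only tests whether a match exists, the small side can be reduced to a set of keys rather than a full hash table of rows, which keeps the broadcast payload small. Duplicate keys on the small side (customer 1 appears twice above) do not duplicate left rows.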

          People

            wangxj8 XiaoJing wang