Description
Currently, Spark uses a broadcast join instead of a shuffled hash join to optimize an inner join when the size of one side of the join does not exceed AUTO_BROADCASTJOIN_THRESHOLD.
However, when executing a left semi join, Spark SQL shuffles each child relation, even though a left semi join is equally well suited to optimization with a broadcast join.
We are planning to create a BroadcastLeftSemiJoinHash operator to implement broadcast joins for left semi joins.
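A minimal sketch of how a broadcast left semi join could work is below. It is written in plain Scala rather than against Spark's internal SparkPlan/RDD API, and every name in it (LeftSemiJoinSketch, chooseStrategy, buildKeySet, probePartition, broadcastLeftSemiJoin) is hypothetical. It is not the proposed BroadcastLeftSemiJoinHash implementation. Partitions are modeled as nested Seqs, the broadcast variable as an ordinary Set, and the planning rule assumes the same size-versus-threshold check used for inner joins.

```scala
// Hypothetical sketch: plain Scala collections stand in for Spark's SparkPlan/RDD classes.
sealed trait LeftSemiStrategy
case object BroadcastLeftSemi extends LeftSemiStrategy
case object ShuffledLeftSemi extends LeftSemiStrategy

object LeftSemiJoinSketch {
  // Planning: broadcast the right side only if its estimated size does not exceed the
  // threshold. A non-positive threshold disables broadcasting (assumed to mirror
  // spark.sql.autoBroadcastJoinThreshold = -1).
  def chooseStrategy(rightSizeInBytes: Long, autoBroadcastJoinThreshold: Long): LeftSemiStrategy =
    if (autoBroadcastJoinThreshold > 0 && rightSizeInBytes <= autoBroadcastJoinThreshold) BroadcastLeftSemi
    else ShuffledLeftSemi

  // Build phase: collect the distinct join keys of the small right side into a hash set.
  // In Spark this set would be built once and shipped to every executor as a broadcast variable.
  def buildKeySet[R, K](right: Iterable[R], rightKey: R => K): Set[K] =
    right.iterator.map(rightKey).toSet

  // Probe phase: stream one partition of the large left side and keep each left row whose
  // key is in the set. The left side is never shuffled, each left row is emitted at most
  // once, and no right-side columns are produced, as a left semi join requires.
  def probePartition[L, K](partition: Iterator[L], leftKey: L => K, keys: Set[K]): Iterator[L] =
    partition.filter(row => keys.contains(leftKey(row)))

  // Full join: build the key set once, then probe every left partition independently.
  def broadcastLeftSemiJoin[L, R, K](leftPartitions: Seq[Seq[L]], right: Iterable[R])(
      leftKey: L => K, rightKey: R => K): Seq[Seq[L]] = {
    val broadcastKeys = buildKeySet(right, rightKey) // built once, reused by all partitions
    leftPartitions.map(p => probePartition(p.iterator, leftKey, broadcastKeys).toVector)
  }
}
```

For example, joining left partitions `Seq(Seq((1, "a"), (2, "b")), Seq((3, "c"), (1, "d")))` against a right side with keys 1, 1 and 3 keeps `(1, "a")`, `(3, "c")` and `(1, "d")`. Because the right side contributes only a set of keys, the duplicate right key 1 does not duplicate any left row. Note that in SQL a NULL join key never matches in a semi join; this sketch uses Scala equality and does not model NULL semantics.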