Description
This issue is very similar to SPARK-22489. When there are no broadcast hints, the current Spark join strategies prefer to build the right side, without considering the sizes of the two sides. To reproduce:
import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec
spark.createDataFrame(Seq((1, "4"), (2, "2"))).toDF("key", "value").createTempView("table1")
spark.createDataFrame(Seq((1, "1"), (2, "2"), (3, "3"))).toDF("key", "value").createTempView("table2")
val bl = sql(s"SELECT * FROM table1 t1 JOIN table2 t2 ON t1.key = t2.key").queryExecution.executedPlan
The plan broadcasts the right side (`t2`), even though it is the larger table.
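
For illustration only, the size-aware behavior one would expect here could be sketched as below. This is a hypothetical, standalone model and not Spark's planner code: the `BuildSide` types are redefined locally so the snippet runs without Spark, `chooseBuildSide` is an invented helper, and the byte sizes are made-up stand-ins for the plan statistics of `table1` and `table2`.

```scala
// Hypothetical model of build-side selection; NOT Spark's actual implementation.
// The types below are local stand-ins so the sketch runs as a script or in the REPL.
sealed trait BuildSide
case object BuildLeft extends BuildSide
case object BuildRight extends BuildSide

// Pick the smaller side as the build (broadcast) side.
// Assumption: ties keep the existing preference for the right side.
def chooseBuildSide(leftSizeInBytes: BigInt, rightSizeInBytes: BigInt): BuildSide =
  if (rightSizeInBytes <= leftSizeInBytes) BuildRight else BuildLeft

// table1 (left) has 2 rows and table2 (right) has 3 rows in the repro;
// the sizes here are illustrative values, not measured statistics.
val side = chooseBuildSide(leftSizeInBytes = 32, rightSizeInBytes = 48)
println(side)
```

Under this rule the smaller `t1` would be chosen as the build side for the query above, while equal sizes would leave the current right-side preference unchanged.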