Description
The main idea is to isolate the join config of the normal planner from that of the AQE planner, which currently share the same code path.
We do not fully trust static statistics when deciding whether a join can be planned as a broadcast hash join. In our experience, it is very common for Spark to throw a broadcast timeout or a driver-side OOM exception when executing a moderately large plan. Because a broadcast join is not reversible, once a join has been converted to a broadcast hash join, AQE cannot optimize it again. It therefore makes sense to decide whether to broadcast on the AQE side, using a separate SQL config.
To achieve this, the AQE framework inserts a specific join hint in advance, and JoinSelection then picks up and follows the inserted hint.
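The hint hand-off described above can be illustrated with a small model. This is only a sketch in plain Python, not Spark code: the `Join` class, the functions `aqe_mark` and `join_selection`, the strategy names, and the convention that a threshold of -1 disables broadcasting are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Join:
    """Hypothetical stand-in for a join plan node."""
    static_size: int             # build-side size from static statistics, in bytes
    hint: Optional[str] = None   # join strategy hint carried by the node, if any


def aqe_mark(join: Join, runtime_size: int, aqe_threshold: int) -> Join:
    """AQE side: attach a hint using runtime size and the AQE-only threshold."""
    if join.hint is None and runtime_size <= aqe_threshold:
        return replace(join, hint="BROADCAST_HASH")
    return join


def join_selection(join: Join, static_threshold: int) -> str:
    """Planner side: follow an existing hint, else fall back to static statistics."""
    if join.hint is not None:
        return join.hint
    if join.static_size <= static_threshold:
        return "BROADCAST_HASH"
    return "SORT_MERGE"


# Static statistics claim 5 GiB, and static broadcasting is disabled (-1).
join = Join(static_size=5 * 1024**3)
before = join_selection(join, static_threshold=-1)

# At runtime the build side turns out to be 8 MiB, below a 10 MiB AQE threshold.
marked = aqe_mark(join, runtime_size=8 * 1024**2, aqe_threshold=10 * 1024**2)
after = join_selection(marked, static_threshold=-1)
```

In this model the two thresholds never interact: the static planner can stay conservative while the AQE side decides from runtime sizes, and the planner only has to honor the hint.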
For now, we only support selecting the strategy for equi-joins, in the following order:
1. mark the join as a broadcast hash join if possible
2. mark the join as a shuffled hash join if possible
Note that we do not override the join strategy if the user specifies a join hint.
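The selection order and the user-hint rule can be sketched as a single function. This is an illustrative model rather than the actual rule: the function name `select_aqe_hint`, its parameters, and the simple size comparisons standing in for "if possible" are assumptions, and the real conditions in Spark may differ.

```python
from typing import Optional


def select_aqe_hint(
    is_equi_join: bool,
    user_hint: Optional[str],
    runtime_size: int,
    broadcast_threshold: int,
    shuffled_hash_threshold: int,
) -> Optional[str]:
    """Return the hint AQE would insert, or None to leave the join untouched."""
    # Only equi-joins are supported for now.
    if not is_equi_join:
        return None
    # A join hint specified by the user is never overridden.
    if user_hint is not None:
        return None
    # 1. Prefer a broadcast hash join when the runtime size allows it.
    if runtime_size <= broadcast_threshold:
        return "BROADCAST_HASH"
    # 2. Otherwise try a shuffled hash join.
    if runtime_size <= shuffled_hash_threshold:
        return "SHUFFLED_HASH"
    return None


MIB = 1024**2
small = select_aqe_hint(True, None, 4 * MIB, 10 * MIB, 64 * MIB)
medium = select_aqe_hint(True, None, 32 * MIB, 10 * MIB, 64 * MIB)
hinted = select_aqe_hint(True, "MERGE", 4 * MIB, 10 * MIB, 64 * MIB)
```

Returning `None` for the hinted case is what keeps the user's choice intact: the planner then sees only the original hint.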
Issue Links
- duplicates SPARK-36630: Add the option to use physical statistics to avoid large tables being broadcast (Closed)
- links to