Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.1.0, 3.2.0
-
None
Description
Shuffled hash join avoids sort compared to sort merge join. This advantage shows up obviously when joining large table in terms of saving CPU and IO (in case of external sort happens). In latest master trunk, shuffled hash join is disabled by default with config "spark.sql.join.preferSortMergeJoin"=true, with favor of reducing risk of OOM. However shuffled hash join could be improved to a better stateĀ (validated in our internal fork). Creating this Jira to track overall progress.
Attachments
1.
|
Add hash probes metrics for shuffled hash join | In Progress | Unassigned | |
2.
|
Introduce sort-based fallback mechanism for shuffled hash join | In Progress | Unassigned | |
3.
|
Only codegen build side separately for shuffled hash join | Open | Unassigned | |
4.
|
Introduce hybrid join for sort merge join and shuffled hash join in AQE | Open | Unassigned |