Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.1.0, 3.2.0
-
None
Description
Shuffled hash join avoids sort compared to sort merge join. This advantage shows up obviously when joining large table in terms of saving CPU and IO (in case of external sort happens). In latest master trunk, shuffled hash join is disabled by default with config "spark.sql.join.preferSortMergeJoin"=true, with favor of reducing risk of OOM. However shuffled hash join could be improved to a better stateĀ (validated in our internal fork). Creating this Jira to track overall progress.