Details
-
New Feature
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
This adds configuration parameter hive.optimize.dynamic.partition.hashjoin, which enables selection of the dynamically partitioned hash join with the Tez execution engine
Description
Some analysis of shuffle join queries by mmokhtar/gopalv found about 2/3 of the CPU was spent during sorting/merging.
While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs in order to eliminate the sorting, which may be faster than a shuffle join. To join on unsorted inputs, we can use the hash join algorithm to perform the join in the reducer. This will require the small tables in the join to fit in the reducer/hash table for this to work.
Attachments
Attachments
Issue Links
- relates to
-
HIVE-20839 "Cannot find field" error during dynamically partitioned hash join
- Closed
- links to