Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-10673

Dynamically partitioned hash join for Tez

Log workAgile BoardRank to TopRank to BottomVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments


    • Release Note:
      This adds configuration parameter hive.optimize.dynamic.partition.hashjoin, which enables selection of the dynamically partitioned hash join with the Tez execution engine


      Some analysis of shuffle join queries by Mostafa Mokhtar/Gopal Vijayaraghavan found about 2/3 of the CPU was spent during sorting/merging.
      While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs in order to eliminate the sorting, which may be faster than a shuffle join. To join on unsorted inputs, we can use the hash join algorithm to perform the join in the reducer. This will require the small tables in the join to fit in the reducer/hash table for this to work.


        1. HIVE-10673.1.patch
          189 kB
          Jason Dere
        2. HIVE-10673.10.patch
          246 kB
          Jason Dere
        3. HIVE-10673.11.patch
          233 kB
          Jason Dere
        4. HIVE-10673.12
          232 kB
          Jason Dere
        5. HIVE-10673.2.patch
          190 kB
          Jason Dere
        6. HIVE-10673.3.patch
          191 kB
          Jason Dere
        7. HIVE-10673.4.patch
          191 kB
          Jason Dere
        8. HIVE-10673.5.patch
          191 kB
          Jason Dere
        9. HIVE-10673.6.patch
          193 kB
          Jason Dere
        10. HIVE-10673.7.patch
          237 kB
          Jason Dere
        11. HIVE-10673.8.patch
          239 kB
          Jason Dere
        12. HIVE-10673.9.patch
          239 kB
          Jason Dere

        Issue Links


          $i18n.getText('security.level.explanation', $currentSelection) Viewable by All Users



              • Created:

                Issue deployment