Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-10673

Dynamically partitioned hash join for Tez

    XMLWordPrintableJSON

Details

    • This adds configuration parameter hive.optimize.dynamic.partition.hashjoin, which enables selection of the dynamically partitioned hash join with the Tez execution engine

    Description

      Some analysis of shuffle join queries by mmokhtar/gopalv found about 2/3 of the CPU was spent during sorting/merging.
      While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs in order to eliminate the sorting, which may be faster than a shuffle join. To join on unsorted inputs, we can use the hash join algorithm to perform the join in the reducer. This will require the small tables in the join to fit in the reducer/hash table for this to work.

      Attachments

        1. HIVE-10673.1.patch
          189 kB
          Jason Dere
        2. HIVE-10673.10.patch
          246 kB
          Jason Dere
        3. HIVE-10673.11.patch
          233 kB
          Jason Dere
        4. HIVE-10673.12
          232 kB
          Jason Dere
        5. HIVE-10673.2.patch
          190 kB
          Jason Dere
        6. HIVE-10673.3.patch
          191 kB
          Jason Dere
        7. HIVE-10673.4.patch
          191 kB
          Jason Dere
        8. HIVE-10673.5.patch
          191 kB
          Jason Dere
        9. HIVE-10673.6.patch
          193 kB
          Jason Dere
        10. HIVE-10673.7.patch
          237 kB
          Jason Dere
        11. HIVE-10673.8.patch
          239 kB
          Jason Dere
        12. HIVE-10673.9.patch
          239 kB
          Jason Dere

        Issue Links

          Activity

            People

              jdere Jason Dere
              jdere Jason Dere
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: