Hive / HIVE-11306

Add a bloom-1 filter for Hybrid MapJoin spills

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0, 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: Hive
    • Labels: None
    • Add a bloom-1 filter to reduce Hybrid MapJoin spills

    Description

      HIVE-9277 implemented spillable joins for Tez, which suffer from a corner-case performance issue when joining a wide small table against a narrow big table (for example, a user-info table joined against an event stream).

      Spilling the wide table causes extra IO, even though the number of distinct values (nDV) of the join key might only be in the thousands.

      A cheap bloom-1 filter would give such queries a large performance gain by cutting down the spill IO cost of the big-table spills.
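
      To illustrate the idea, here is a minimal sketch of a bloom-1 filter, meaning a Bloom filter variant in which all bits for a key live in a single 64-bit word, so each probe touches one memory location. It is not the code from the attached patches: the class name Bloom1FilterSketch, the hash mixer, and the sizing parameters are all assumptions chosen for the example. The filter would be populated with the small-table join keys, and big-table rows whose key is rejected could be dropped instead of being written to a spill file.

```java
// Hypothetical sketch of a bloom-1 (single-word blocked) Bloom filter.
// Names and parameters are illustrative, not taken from the Hive patch.
public class Bloom1FilterSketch {
    private final long[] words;   // each long is one 64-bit block
    private final int wordMask;   // words.length - 1 (length is a power of two)
    private final int bitsPerKey; // bits set per key, all within the same word

    public Bloom1FilterSketch(int log2Words, int bitsPerKey) {
        if (log2Words < 0 || log2Words > 24 || bitsPerKey < 1 || bitsPerKey > 5) {
            throw new IllegalArgumentException("unsupported filter size");
        }
        this.words = new long[1 << log2Words];
        this.wordMask = words.length - 1;
        this.bitsPerKey = bitsPerKey;
    }

    // 64-bit finalizer in the style of MurmurHash3, used to scramble the key.
    private static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Builds the in-word bit mask from the upper 32 bits of the hash,
    // consuming 6 bits (one position in 0..63) per bit set.
    private long mask(long hash) {
        long m = 0L;
        long h = hash >>> 32;
        for (int i = 0; i < bitsPerKey; i++) {
            m |= 1L << (int) (h & 63);
            h >>>= 6;
        }
        return m;
    }

    // Record a small-table join key.
    public void add(long key) {
        long h = mix(key);
        words[(int) h & wordMask] |= mask(h);
    }

    // false: the key is definitely absent. true: the key may be present.
    public boolean mightContain(long key) {
        long h = mix(key);
        long m = mask(h);
        return (words[(int) h & wordMask] & m) == m;
    }

    public static void main(String[] args) {
        Bloom1FilterSketch filter = new Bloom1FilterSketch(10, 4);
        // Small table: a thousand distinct join keys.
        for (long key = 0; key < 1000; key++) {
            filter.add(key * 7919L);
        }
        // Big table: only rows that pass the filter would need to be spilled.
        int kept = 0;
        int dropped = 0;
        for (long key = 0; key < 100000; key++) {
            if (filter.mightContain(key)) {
                kept++;
            } else {
                dropped++;
            }
        }
        System.out.println("kept=" + kept + " dropped=" + dropped);
    }
}
```

      The filter never reports a false negative, so dropping rejected big-table rows cannot lose join matches. False positives only mean that some non-matching rows are still spilled.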

      Attachments

        1. HIVE-11306.6.patch
          7 kB
          Wei Zheng
        2. HIVE-11306.5.patch
          7 kB
          Wei Zheng
        3. HIVE-11306.3.patch
          5 kB
          Wei Zheng
        4. HIVE-11306.2.patch
          5 kB
          Gopal Vijayaraghavan
        5. HIVE-11306.1.patch
          4 kB
          Gopal Vijayaraghavan

            People

              Assignee: Wei Zheng (wzheng)
              Reporter: Gopal Vijayaraghavan (gopalv)
              Votes: 0
              Watchers: 7
