Hive / HIVE-11306

Add a bloom-1 filter for Hybrid MapJoin spills

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0, 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: Hive
    • Labels:
      None
    • Release Note:
      Add a bloom-1 filter to reduce Hybrid MapJoin spills

      Description

      HIVE-9277 implemented spillable joins for Tez. These suffer from a corner-case performance issue when joining a wide small table against a narrow big table (for example, a user-info table joined against an events stream).

      Spilling the wide table causes extra IO, even though the number of distinct values (nDV) of the join key might only be in the thousands.

      A cheap bloom-1 filter would give a large performance gain for such queries by cutting down the spill IO for the big-table spills.
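
      The idea can be illustrated with a minimal sketch. It assumes "bloom-1" means a Bloom filter probed with a single hash per key, and it is not the code from the attached patches: the class name SingleProbeFilter, the sizing of about 16 bits per key, and the hash mixing constant are all hypothetical choices made for illustration.

```java
import java.util.BitSet;

/**
 * Hypothetical single-probe ("bloom-1" style) filter over 64-bit key hashes.
 * Illustrative only; names and sizing are assumptions, not Hive's implementation.
 */
final class SingleProbeFilter {
    private final BitSet bits;
    private final int mask;

    SingleProbeFilter(int expectedKeys) {
        // Round up to a power of two at roughly 16 bits per expected key
        // (an arbitrary choice), capped at 2^30 bits.
        long wanted = Math.max(64L, 16L * expectedKeys);
        int size = (int) Math.min(1L << 30, Long.highestOneBit(wanted - 1) << 1);
        this.bits = new BitSet(size);
        this.mask = size - 1;
    }

    // Mix the key hash and take one slot: a single probe per key.
    private int slot(long keyHash) {
        long h = keyHash * 0x9E3779B97F4A7C15L;
        return (int) (h >>> 32) & mask;
    }

    /** Record a join key hash seen while loading the small table. */
    void add(long keyHash) {
        bits.set(slot(keyHash));
    }

    /** False means the key was definitely never added; true means it may have been. */
    boolean mightContain(long keyHash) {
        return bits.get(slot(keyHash));
    }
}
```

      Under these assumptions, the filter would be filled with the join-key hashes while the small table is loaded. A big-table row whose key hash fails mightContain cannot match any small-table row, so for an inner join it could be dropped instead of being written to a spill file. False positives only cost an unnecessary spill; they never drop a matching row.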

        Attachments

        1. HIVE-11306.6.patch
          7 kB
          Wei Zheng
        2. HIVE-11306.5.patch
          7 kB
          Wei Zheng
        3. HIVE-11306.3.patch
          5 kB
          Wei Zheng
        4. HIVE-11306.2.patch
          5 kB
          Gopal V
        5. HIVE-11306.1.patch
          4 kB
          Gopal V

              People

              • Assignee:
                wzheng Wei Zheng
                Reporter:
                gopalv Gopal V
              • Votes:
                0
                Watchers:
                7