  1. Hive
  2. HIVE-20983

Vectorization: Scale up small hashtables, when collisions are detected


Details

    Description

      Hive's hashtable size estimates are getting better with HyperLogLog stats in place, but an accurate estimate does not always result in a low number of collisions.

      Hashtables that contain a very small number of items tend to lose their O(1) lookup performance when collisions occur. Since collisions are easy to detect within the fast hashtable implementation, rehashing into a larger table will help these small hashtables avoid collisions and return to O(1) lookups.
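
      The idea above can be illustrated with a minimal sketch. This is not the code from the attached patches or Hive's fast hashtable implementation; the class name `SmallLongHashSet`, the `SMALL_LIMIT` and `MAX_PROBES` thresholds, and the use of linear probing with `Long.hashCode` are all assumptions made for illustration. The table counts probe steps on insert and, while it is still small, doubles its capacity whenever the longest probe chain exceeds the tolerance.

```java
/** Hypothetical sketch of collision-triggered scale-up; not Hive's implementation. */
final class SmallLongHashSet {
    private static final int SMALL_LIMIT = 1 << 16; // assumed cutoff for a "small" table
    private static final int MAX_PROBES = 1;        // assumed collision tolerance

    private long[] slots;      // 0 marks an empty slot, so keys must be non-zero
    private int size;
    private int longestProbe;  // longest probe chain seen for any stored key

    SmallLongHashSet(int capacity) {
        // Round up to a power of two so that (hash & mask) selects a slot.
        slots = new long[Integer.highestOneBit(Math.max(2, capacity) - 1) << 1];
    }

    /** Inserts with linear probing; returns probe steps taken, or -1 if already present. */
    private static int insert(long[] table, long key) {
        int mask = table.length - 1;
        int idx = Long.hashCode(key) & mask;
        int probes = 0;
        while (table[idx] != 0) {
            if (table[idx] == key) {
                return -1;
            }
            idx = (idx + 1) & mask;
            probes++;
        }
        table[idx] = key;
        return probes;
    }

    /** Rehashes every key into a larger table and recomputes the longest probe chain. */
    private void resize(int newCapacity) {
        long[] bigger = new long[newCapacity];
        int longest = 0;
        for (long k : slots) {
            if (k != 0) {
                longest = Math.max(longest, insert(bigger, k));
            }
        }
        slots = bigger;
        longestProbe = longest;
    }

    boolean add(long key) {
        if (key == 0) {
            throw new IllegalArgumentException("0 is reserved for empty slots");
        }
        // Ordinary load-factor growth (at most half full) keeps probing finite.
        if ((size + 1) * 2 > slots.length) {
            resize(slots.length * 2);
        }
        int probes = insert(slots, key);
        if (probes < 0) {
            return false;
        }
        size++;
        longestProbe = Math.max(longestProbe, probes);
        // Collision-triggered growth: only applied while the table is still small.
        while (longestProbe > MAX_PROBES && slots.length < SMALL_LIMIT) {
            resize(slots.length * 2);
        }
        return true;
    }

    boolean contains(long key) {
        if (key == 0) {
            return false;
        }
        int mask = slots.length - 1;
        int idx = Long.hashCode(key) & mask;
        while (slots[idx] != 0) {
            if (slots[idx] == key) {
                return true;
            }
            idx = (idx + 1) & mask;
        }
        return false;
    }

    int capacity() { return slots.length; }
    int longestProbe() { return longestProbe; }
}
```

      In this sketch, adding the keys 8, 16 and 24 to a table of capacity 8 puts all three in the same home slot, so the third insert triggers a rehash to a larger capacity even though the table is far from full. The trade-off is extra memory for small tables in exchange for shorter probe chains.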

      Attachments

        1. HIVE-20983.1.patch
          6 kB
          Gopal Vijayaraghavan
        2. HIVE-20983.2.patch
          11 kB
          Mustafa İman
        3. HIVE-20983.3.patch
          11 kB
          Mustafa İman
        4. HIVE-20983.4.patch
          11 kB
          Mustafa İman
        5. HIVE-20983.5.patch
          11 kB
          Mustafa İman

            People

              mustafaiman Mustafa İman
              gopalv Gopal Vijayaraghavan
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m