IMPALA-2707

Add FindOrInsert method to hash table to avoid unnecessary probe in aggregation

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.3.0
    • Fix Version/s: Impala 2.5.0
    • None

    Description

      For each input row the aggregation node uses HashTable::Find() followed by HashTable::Insert() if the grouping key isn't already present in the table. Both of these methods probe the hash table to find the same bucket. If we added a FindOrInsert() method to the hash table that returned a modifiable iterator pointing to the bucket, we could save a significant number of hash table probes.
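
A minimal sketch of the idea, assuming a toy open-addressing table with linear probing. `ToyHashTable`, its `Bucket` layout, the probe counter, and the two `Aggregate...` helpers are hypothetical names made up for illustration; this is not Impala's actual `HashTable` interface, which may differ in signature and iterator type. It only shows why Find() followed by Insert() probes twice for every new grouping key while a combined FindOrInsert() probes once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical toy table, not Impala's HashTable. Fixed capacity, no resizing.
class ToyHashTable {
 public:
  struct Bucket {
    bool filled = false;
    int64_t key = 0;
    int64_t agg = 0;  // running aggregate for the group, e.g. COUNT(*)
  };

  // Capacity must be a power of two so the hash can be masked.
  explicit ToyHashTable(size_t capacity) : buckets_(capacity) {
    assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
  }

  // Returns the bucket holding 'key', or nullptr if the key is absent.
  Bucket* Find(int64_t key) {
    Bucket* b = Probe(key);
    return b->filled ? b : nullptr;
  }

  // Inserts 'key', which must be absent. Probes again to locate the bucket.
  Bucket* Insert(int64_t key) {
    Bucket* b = Probe(key);
    assert(!b->filled);
    Fill(b, key);
    return b;
  }

  // Single probe: returns the existing bucket, or fills the empty one found.
  Bucket* FindOrInsert(int64_t key, bool* found) {
    Bucket* b = Probe(key);
    *found = b->filled;
    if (!b->filled) Fill(b, key);
    return b;
  }

  size_t num_probes() const { return num_probes_; }
  size_t size() const { return num_filled_; }

 private:
  // Linear probing: stops at the bucket with 'key' or at the first empty one.
  Bucket* Probe(int64_t key) {
    ++num_probes_;
    assert(num_filled_ < buckets_.size());  // at least one empty bucket remains
    const size_t mask = buckets_.size() - 1;
    size_t i = std::hash<int64_t>{}(key) & mask;
    while (buckets_[i].filled && buckets_[i].key != key) i = (i + 1) & mask;
    return &buckets_[i];
  }

  void Fill(Bucket* b, int64_t key) {
    b->filled = true;
    b->key = key;
    b->agg = 0;
    ++num_filled_;
  }

  std::vector<Bucket> buckets_;
  size_t num_probes_ = 0;
  size_t num_filled_ = 0;
};

// Pattern described in the issue: Find(), then Insert() on a miss.
size_t AggregateFindThenInsert(const std::vector<int64_t>& keys, ToyHashTable* ht) {
  for (int64_t k : keys) {
    ToyHashTable::Bucket* b = ht->Find(k);
    if (b == nullptr) b = ht->Insert(k);  // second probe for the same bucket
    ++b->agg;
  }
  return ht->num_probes();
}

// Proposed pattern: one probe per input row.
size_t AggregateFindOrInsert(const std::vector<int64_t>& keys, ToyHashTable* ht) {
  for (int64_t k : keys) {
    bool found = false;
    ToyHashTable::Bucket* b = ht->FindOrInsert(k, &found);
    ++b->agg;
  }
  return ht->num_probes();
}
```

In this sketch the first pattern costs one probe per input row plus one extra probe per distinct grouping key, so the saving grows with the number of output groups.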

      There is already a TODO in the partitioned-aggregation-node-ir.cc code for this, so I'm creating a JIRA to track the issue.

      This could speed up aggregations with large output size significantly, e.g. TPC-H query 13 (see IMPALA-2470).

            People

              Assignee: Tim Armstrong (tarmstrong)
              Reporter: Tim Armstrong (tarmstrong)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: