Currently the hash tables used for hash join and aggregation use 16 bytes per bucket and 24 bytes per additional duplicate of a key.
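One plausible layout consistent with those sizes is sketched below. This is an assumption for illustration; the field names and ordering are hypothetical, but an 8-byte pointer, a 4-byte hash, and a 1-byte boolean plus alignment padding do add up to the 16 bytes quoted above:

```cpp
#include <cstdint>

struct Entry;  // hypothetical chain node for an additional duplicate (24 bytes)

struct Bucket {
    Entry   *first;     // 8 bytes: head of the chain of rows in this bucket
    uint32_t hash;      // 4 bytes: cached hash of the key
    bool     occupied;  // 1 byte: whether the bucket is in use
                        // + 3 bytes of implicit padding for 8-byte alignment
};
static_assert(sizeof(Bucket) == 16, "16 bytes per bucket on LP64");
```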
There are some comments in the code that suggest folding the boolean values into the upper bits of the pointers: on amd64 only 48 bits of the address space are currently used, though that is apparently moving to 57 bits with 5-level paging (see https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf). That would reduce the bucket to 12 bytes of actual data.
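A minimal sketch of the folding, assuming user-space heap pointers never have bit 63 set (true under both the current 48-bit and the upcoming 57-bit canonical-address layouts):

```cpp
#include <cstdint>

// Tagged pointer: the boolean lives in bit 63, which user-space addresses
// on amd64 leave clear even with 5-level paging.
struct TaggedPtr {
    static constexpr uintptr_t kFlag = uintptr_t{1} << 63;

    uintptr_t bits = 0;

    void set(void *p, bool flag) {
        bits = reinterpret_cast<uintptr_t>(p) | (flag ? kFlag : 0);
    }
    void *ptr() const { return reinterpret_cast<void *>(bits & ~kFlag); }
    bool flag() const { return (bits & kFlag) != 0; }
};
```

Clearing the tag on every dereference costs one AND; that is the bit-manipulation overhead to weigh against the cache effects below.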
This would give us the opportunity to significantly reduce the memory requirements of joins and the pressure on caches, provided we can work out the implementation issues and the cost of the bit manipulation doesn't exceed the benefit (my intuition is that cache effects are way more important, but I could be wrong).
Here's a rough idea of what we could do:
- Implement folding of the booleans into the pointer and mark struct Bucket as packed so that additional padding doesn't just undo the work (see the packed-struct sketch after this list).
- Modify Hashtable to work with the new bucket structure. This needs a little thought: the bucket allocations must be a power-of-two size in bytes, but we also need the number of buckets to be a power of two for masking the hash to get the bucket number, and with 12-byte buckets those two constraints conflict. We could either leave wasted space at the end of the buffer or switch to a non-power-of-two number of buckets and use an alternative method of getting the bucket from the hash (see the reduction sketch after this list): https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
- Run benchmarks to see if it's beneficial. The effect probably depends on the data set size (a starting-point microbenchmark is sketched after this list).
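For the first item, a sketch of the packed 12-byte bucket (again with hypothetical field names; `__attribute__((packed))` is the GCC/Clang spelling, MSVC would need `#pragma pack`):

```cpp
#include <cstdint>

struct __attribute__((packed)) PackedBucket {
    uint64_t entry_and_flag;  // tagged pointer: boolean folded into bit 63
    uint32_t hash;            // cached hash of the key
};
static_assert(sizeof(PackedBucket) == 12,
              "packed, so padding doesn't undo the savings");
```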
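For the second item, the alternative reduction from the linked post maps a hash to [0, num_buckets) with one multiply and a shift instead of masking, so the bucket count no longer has to be a power of two:

```cpp
#include <cstdint>

// Lemire's multiplicative reduction. It keys off the high bits of the hash,
// so the hash must be well distributed over the full 32-bit range.
inline uint32_t BucketFor(uint32_t hash, uint32_t num_buckets) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(hash) * num_buckets) >> 32);
}
```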
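For the benchmarking item, a hypothetical microbenchmark (using google/benchmark, not part of the codebase) that isolates just the bucket-index computation; the real measurement would of course run the join itself across data set sizes:

```cpp
#include <benchmark/benchmark.h>
#include <cstdint>
#include <random>
#include <vector>

static std::vector<uint32_t> MakeHashes() {
    std::mt19937 rng(42);
    std::vector<uint32_t> h(1 << 20);
    for (auto &v : h) v = rng();
    return h;
}

static void BM_MaskReduction(benchmark::State &state) {
    const auto hashes = MakeHashes();
    const uint32_t mask = (1u << 16) - 1;  // power-of-two bucket count
    for (auto _ : state)
        for (uint32_t h : hashes) benchmark::DoNotOptimize(h & mask);
}
BENCHMARK(BM_MaskReduction);

static void BM_MultiplyReduction(benchmark::State &state) {
    const auto hashes = MakeHashes();
    const uint32_t buckets = 60000;  // arbitrary non-power-of-two count
    for (auto _ : state)
        for (uint32_t h : hashes)
            benchmark::DoNotOptimize(
                static_cast<uint32_t>(uint64_t{h} * buckets >> 32));
}
BENCHMARK(BM_MultiplyReduction);

BENCHMARK_MAIN();
```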