IMPALA-10595

Consider using a batched DuplicateNode for the hash table


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 4.0.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None
    • ghx-label-13

    Description

      hash-table.h's DuplicateNode structure is currently 24 bytes:

       

        /// Linked list of entries used for duplicates.
        struct DuplicateNode {
          /// Used for full outer and right {outer, anti, semi} joins. Indicates whether the
          /// row in the DuplicateNode has been matched.
          /// From an abstraction point of view, this is an awkward place to store this
          /// information.
          /// TODO: Fold this flag in the next pointer below.
          bool matched;
      
          /// padding to an 8 byte boundary
      
          /// Chain to next duplicate node, NULL when end of list.
          DuplicateNode* next;
          HtData htdata;
        };

      The matched boolean effectively occupies 8 bytes due to the 8-byte alignment of next and htdata. Apart from the space usage, this linked list is also not very cache-friendly during iteration. Multiple DuplicateNodes are allocated from a single chunk of memory, so there may be some incidental locality, but there are no guarantees. One way to make this more cache-friendly is to batch multiple duplicates into a single structure. For example, consider this structure:

       

        /// Linked list of batched entries used for duplicates.
        struct DuplicateNode {
          /// Used for full outer and right {outer, anti, semi} joins. matched[i] indicates
          /// whether the row in htdata[i] has been matched.
          /// From an abstraction point of view, this is an awkward place to store this
          /// information.
          bool matched[BATCH_SIZE];
      
          HtData htdata[BATCH_SIZE];
          /// Chain to next duplicate node, NULL when end of list.
          DuplicateNode* next;
        };

      This amortizes the matched flags and the next pointer over multiple data entries, reducing the per-duplicate overhead. If BATCH_SIZE exceeded 8, the matched flags could be packed further as individual bits in a single field if needed.
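      As a rough illustration of that bit-packing variant, here is a sketch with hypothetical names (plain uint64_t slots stand in for the 8-byte HtData union; none of this is taken from hash-table.h):

        #include <cstdint>

        /// Hypothetical bit-packed variant: with up to 16 slots per node, a single
        /// uint16_t holds every matched flag, so flag storage no longer grows with
        /// the batch size.
        template <int BATCH_SIZE>
        struct PackedDuplicateNode {
          static_assert(BATCH_SIZE <= 16, "all flags must fit in matched_bits");
          uint16_t matched_bits = 0;            // bit i set => slot i has been matched
          /// padded to the 8-byte alignment of the data slots
          uint64_t htdata[BATCH_SIZE];          // stand-in for HtData entries
          PackedDuplicateNode* next = nullptr;  // chain to next node, nullptr at end

          bool IsMatched(int i) const { return (matched_bits >> i) & 1; }
          void SetMatched(int i) { matched_bits |= static_cast<uint16_t>(1u << i); }
        };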

      For the batched struct proposed above (one bool per slot), the per-node sizes are:

      BATCH_SIZE=2 => 32 bytes

      BATCH_SIZE=3 => 40 bytes

      BATCH_SIZE=4 => 48 bytes
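
      For concreteness, here is a minimal sketch of the padding arithmetic behind those per-node sizes. It assumes HtData stays an 8-byte union; HtDataStandIn below is only a placeholder for the real union in hash-table.h:

        #include <cstdint>

        /// Placeholder for the 8-byte HtData union in hash-table.h.
        union HtDataStandIn {
          uint64_t flat_row;  // stand-in for BufferedTupleStream::FlatRowPtr
          void* tuple_row;    // stand-in for TupleRow*
        };

        /// Batched duplicate node as proposed above.
        template <int BATCH_SIZE>
        struct BatchedDuplicateNode {
          bool matched[BATCH_SIZE];          // padded up to the 8-byte alignment of htdata
          HtDataStandIn htdata[BATCH_SIZE];  // 8 bytes per duplicate
          BatchedDuplicateNode* next;        // 8-byte chain pointer
        };

        // The per-node sizes quoted above fall out of the padding rules:
        static_assert(sizeof(BatchedDuplicateNode<2>) == 32, "8 (flags + pad) + 16 + 8");
        static_assert(sizeof(BatchedDuplicateNode<3>) == 40, "8 (flags + pad) + 24 + 8");
        static_assert(sizeof(BatchedDuplicateNode<4>) == 48, "8 (flags + pad) + 32 + 8");
        static_assert(sizeof(BatchedDuplicateNode<6>) == 64, "exactly one 64-byte cache line");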

      Num Duplicates  BATCH_SIZE=1  BATCH_SIZE=2  BATCH_SIZE=3  BATCH_SIZE=4  BATCH_SIZE=5  BATCH_SIZE=6
                   1       0 bytes       0 bytes       0 bytes       0 bytes       0 bytes       0 bytes
                   2      48 bytes      32 bytes      40 bytes      48 bytes      56 bytes      64 bytes
                   3      72 bytes      64 bytes      40 bytes      48 bytes      56 bytes      64 bytes
                   4      96 bytes      64 bytes      80 bytes      48 bytes      56 bytes      64 bytes
                   5     120 bytes      96 bytes      80 bytes      96 bytes      56 bytes      64 bytes
                   6     144 bytes      96 bytes      80 bytes      96 bytes     112 bytes      64 bytes
                   7     168 bytes     128 bytes     120 bytes      96 bytes     112 bytes     128 bytes
                   8     192 bytes     128 bytes     120 bytes      96 bytes     112 bytes     128 bytes
                   9     216 bytes     160 bytes     120 bytes     144 bytes     112 bytes     128 bytes
                  10     240 bytes     160 bytes     160 bytes     144 bytes     112 bytes     128 bytes
                  11     264 bytes     192 bytes     160 bytes     144 bytes     168 bytes     128 bytes
                  12     288 bytes     192 bytes     160 bytes     144 bytes     168 bytes     128 bytes

      It seems like BATCH_SIZE=3 or 4 could make the duplicates code more cache-friendly without regressing memory usage or requiring any complicated logic. BATCH_SIZE=6 corresponds to 64 bytes, i.e. a single cache line.
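
      To make the iteration pattern concrete, here is a rough sketch of a probe-side walk over a batched chain, reusing the BatchedDuplicateNode/HtDataStandIn stand-ins from the sketch above. The end-of-chain convention (a null tuple_row marking the first unused slot) is only an assumption for the sketch, not how hash-table.h tracks duplicates today:

        /// Hypothetical walk over a batched duplicate chain; 'visit' stands in for
        /// whatever the probe side does with each duplicate row.
        template <int BATCH_SIZE, typename Visitor>
        void WalkDuplicates(BatchedDuplicateNode<BATCH_SIZE>* node, Visitor visit) {
          for (; node != nullptr; node = node->next) {
            for (int i = 0; i < BATCH_SIZE; ++i) {
              if (node->htdata[i].tuple_row == nullptr) return;  // first unused slot
              visit(node->htdata[i], &node->matched[i]);
            }
            // With BATCH_SIZE=3 or 4, each outer iteration touches one 40- or 48-byte
            // node instead of chasing three or four separate 24-byte nodes.
          }
        }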

      Of course, there are also ways to make the logic more complicated (e.g. use a small batch size for the first couple of nodes in a chain and then switch to larger batches).
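
      A hypothetical sketch of such a growth policy (nothing like this exists in the code today, and a real version would need a per-node capacity field or distinct node types, since the batch size would no longer be a single compile-time constant):

        /// Hypothetical growth policy: keep the first nodes of a chain small so keys
        /// with only a couple of duplicates waste little space, then switch to
        /// cache-line-sized nodes for long chains.
        inline int NextBatchCapacity(int nodes_allocated_so_far) {
          if (nodes_allocated_so_far < 2) return 3;  // 40-byte nodes first
          return 6;                                  // then 64-byte (one cache line) nodes
        }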

    People

      Assignee: Unassigned
      Reporter: Joe McDonnell