IMPALA-10595

Consider using a batched DuplicateNode for the hash table


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 4.0.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None
    • ghx-label-13

    Description

      hash-table.h's DuplicateNode structure is currently 24 bytes:

       

        /// Linked list of entries used for duplicates.
        struct DuplicateNode {
          /// Used for full outer and right {outer, anti, semi} joins. Indicates whether the
          /// row in the DuplicateNode has been matched.
          /// From an abstraction point of view, this is an awkward place to store this
          /// information.
          /// TODO: Fold this flag in the next pointer below.
          bool matched;
      
          /// padding to an 8 byte boundary
      
          /// Chain to next duplicate node, NULL when end of list.
          DuplicateNode* next;
          HtData htdata;
        };

      The matched boolean effectively occupies 8 bytes due to the 8-byte alignment of next and htdata. Apart from the space usage, this linked list is also not very cache-friendly during iteration. Multiple DuplicateNodes are allocated from a single chunk of memory, so there may be some incidental locality, but there are no guarantees. One way to make this more cache-friendly is to batch multiple duplicates into a single structure. For example, consider this structure:

       

        /// Linked list of batched entries used for duplicates.
        struct DuplicateNode {
          /// Used for full outer and right {outer, anti, semi} joins. matched[i] indicates
          /// whether the row in htdata[i] has been matched.
          /// From an abstraction point of view, this is an awkward place to store this
          /// information.
          bool matched[BATCH_SIZE];
      
          HtData htdata[BATCH_SIZE];
          /// Chain to next duplicate node, NULL when end of list.
          DuplicateNode* next;
        };

      This amortizes the matched flags and the next pointer over multiple data entries, reducing the per-duplicate overhead. If BATCH_SIZE exceeded 8, the matched flags could be packed further as individual bits in a single field if needed.
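      As a rough illustration of that bit-packing variant, here is a sketch with hypothetical names (plain uint64_t slots stand in for the 8-byte HtData union; none of this is taken from hash-table.h):

        #include <cstdint>

        /// Hypothetical bit-packed variant: with up to 16 slots per node, a single
        /// uint16_t holds every matched flag, so flag storage no longer grows with
        /// the batch size.
        template <int BATCH_SIZE>
        struct PackedDuplicateNode {
          static_assert(BATCH_SIZE <= 16, "all flags must fit in matched_bits");
          uint16_t matched_bits = 0;            // bit i set => slot i has been matched
          /// padded to the 8-byte alignment of the data slots
          uint64_t htdata[BATCH_SIZE];          // stand-in for HtData entries
          PackedDuplicateNode* next = nullptr;  // chain to next node, nullptr at end

          bool IsMatched(int i) const { return (matched_bits >> i) & 1; }
          void SetMatched(int i) { matched_bits |= static_cast<uint16_t>(1u << i); }
        };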

      For the batched struct proposed above (one bool per slot), the per-node sizes are:

      BATCH_SIZE=2 => 32 bytes

      BATCH_SIZE=3 => 40 bytes

      BATCH_SIZE=4 => 48 bytes
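
      For concreteness, here is a minimal sketch of the padding arithmetic behind those per-node sizes. It assumes HtData stays an 8-byte union; HtDataStandIn below is only a placeholder for the real union in hash-table.h:

        #include <cstdint>

        /// Placeholder for the 8-byte HtData union in hash-table.h.
        union HtDataStandIn {
          uint64_t flat_row;  // stand-in for BufferedTupleStream::FlatRowPtr
          void* tuple_row;    // stand-in for TupleRow*
        };

        /// Batched duplicate node as proposed above.
        template <int BATCH_SIZE>
        struct BatchedDuplicateNode {
          bool matched[BATCH_SIZE];          // padded up to the 8-byte alignment of htdata
          HtDataStandIn htdata[BATCH_SIZE];  // 8 bytes per duplicate
          BatchedDuplicateNode* next;        // 8-byte chain pointer
        };

        // The per-node sizes quoted above fall out of the padding rules:
        static_assert(sizeof(BatchedDuplicateNode<2>) == 32, "8 (flags + pad) + 16 + 8");
        static_assert(sizeof(BatchedDuplicateNode<3>) == 40, "8 (flags + pad) + 24 + 8");
        static_assert(sizeof(BatchedDuplicateNode<4>) == 48, "8 (flags + pad) + 32 + 8");
        static_assert(sizeof(BatchedDuplicateNode<6>) == 64, "exactly one 64-byte cache line");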

      Num Duplicates  BATCH_SIZE=1  BATCH_SIZE=2  BATCH_SIZE=3  BATCH_SIZE=4  BATCH_SIZE=5  BATCH_SIZE=6
                   1       0 bytes       0 bytes       0 bytes       0 bytes       0 bytes       0 bytes
                   2      48 bytes      32 bytes      40 bytes      48 bytes      56 bytes      64 bytes
                   3      72 bytes      64 bytes      40 bytes      48 bytes      56 bytes      64 bytes
                   4      96 bytes      64 bytes      80 bytes      48 bytes      56 bytes      64 bytes
                   5     120 bytes      96 bytes      80 bytes      96 bytes      56 bytes      64 bytes
                   6     144 bytes      96 bytes      80 bytes      96 bytes     112 bytes      64 bytes
                   7     168 bytes     128 bytes     120 bytes      96 bytes     112 bytes     128 bytes
                   8     192 bytes     128 bytes     120 bytes      96 bytes     112 bytes     128 bytes
                   9     216 bytes     160 bytes     120 bytes     144 bytes     112 bytes     128 bytes
                  10     240 bytes     160 bytes     160 bytes     144 bytes     112 bytes     128 bytes
                  11     264 bytes     192 bytes     160 bytes     144 bytes     168 bytes     128 bytes
                  12     288 bytes     192 bytes     160 bytes     144 bytes     168 bytes     128 bytes

      It seems like BATCH_SIZE=3 or 4 could make the duplicates code more cache-friendly without regressing memory usage or requiring any complicated logic. BATCH_SIZE=6 corresponds to 64 bytes, i.e. a single cache line.
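
      To make the iteration pattern concrete, here is a rough sketch of a probe-side walk over a batched chain, reusing the BatchedDuplicateNode/HtDataStandIn stand-ins from the sketch above. The end-of-chain convention (a null tuple_row marking the first unused slot) is only an assumption for the sketch, not how hash-table.h tracks duplicates today:

        /// Hypothetical walk over a batched duplicate chain; 'visit' stands in for
        /// whatever the probe side does with each duplicate row.
        template <int BATCH_SIZE, typename Visitor>
        void WalkDuplicates(BatchedDuplicateNode<BATCH_SIZE>* node, Visitor visit) {
          for (; node != nullptr; node = node->next) {
            for (int i = 0; i < BATCH_SIZE; ++i) {
              if (node->htdata[i].tuple_row == nullptr) return;  // first unused slot
              visit(node->htdata[i], &node->matched[i]);
            }
            // With BATCH_SIZE=3 or 4, each outer iteration touches one 40- or 48-byte
            // node instead of chasing three or four separate 24-byte nodes.
          }
        }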

      Of course, there are also ways to make the logic more complicated (e.g. use a small batch size for the first couple of nodes in a chain and then switch to larger batches).
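
      A hypothetical sketch of such a growth policy (nothing like this exists in the code today, and a real version would need a per-node capacity field or distinct node types, since the batch size would no longer be a single compile-time constant):

        /// Hypothetical growth policy: keep the first nodes of a chain small so keys
        /// with only a couple of duplicates waste little space, then switch to
        /// cache-line-sized nodes for long chains.
        inline int NextBatchCapacity(int nodes_allocated_so_far) {
          if (nodes_allocated_so_far < 2) return 3;  // 40-byte nodes first
          return 6;                                  // then 64-byte (one cache line) nodes
        }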

    People

      Assignee: Unassigned
      Reporter: Joe McDonnell