Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-11964

Avoid hash collision of partition and bucket in HybridHashTable in Blink SQL

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.10.0
    • Component/s: Table SQL / Runtime
    • Labels:
      None

      Description

      In HybridHashTable, first select the corresponding partition according to hashCode, and then select the bucket in the partition according to hashCode, using the same hashCode can easily cause hash collision.

      Consider doing some mix to hashCode when choosing bucket.

      Like JDK HashMap, we can just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds. (bucket use power-of-two masking).  Just like:  (hash ^ (hash >>> 16))

      In some cases, if a lot of conflicts occurred, this will lead to job hang, because hash join will degenerate to nested loop join.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                lzljs3620320 Jingsong Lee
                Reporter:
                lzljs3620320 Jingsong Lee
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: