Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2270

FNV hash does not handle empty strings correctly

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 1.3, Impala 1.4, Impala 2.0, Impala 2.1, Impala 2.2
    • Fix Version/s: Impala 2.3.0, Impala 2.2.8
    • Component/s: None
    • Labels:

      Description

      The Fnv64to32 hash function is pathologically bad when hashing empty strings. This hash function is used to partition data across nodes and can result in all the data being sent to one node.

      The code in question is

      static uint32_t FnvHash64to32(const void* data, int32_t bytes, uint32_t hash) {
          uint64_t hash_u64 = hash | ((uint64_t)hash << 32);
          hash_u64 = FnvHash64(data, bytes, hash_u64);
          return (hash_u64 >> 32) ^ (hash_u64 & 0xFFFFFFFF);
      }
      

      For an empty string (bytes == 0), FnvHash64 will return the original seed (hash_u64). This degenerates the folding logic to xoring the seed by itself and always returning 0.

      The result of this behavior is that any hash partition of the from: hash (...., empty col) will result in the computed hash being 0. The last column wipes out whatever hash was generated by the prior columns. There is some special logic to handle NULLs (RawValue::HashCombine32) which can probably be used for this case as well.

        Attachments

          Activity

            People

            • Assignee:
              tarmstrong Tim Armstrong
              Reporter:
              nong_impala_60e1 Nong Li
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: