Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2270

FNV hash does not handle empty strings correctly

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • Impala 1.3, Impala 1.4, Impala 2.0, Impala 2.1, Impala 2.2
    • Impala 2.3.0, Impala 2.2.8
    • None

    Description

      The Fnv64to32 hash function is pathologically bad when hashing empty strings. This hash function is used to partition data across nodes and can result in all the data being sent to one node.

      The code in question is

      static uint32_t FnvHash64to32(const void* data, int32_t bytes, uint32_t hash) {
          uint64_t hash_u64 = hash | ((uint64_t)hash << 32);
          hash_u64 = FnvHash64(data, bytes, hash_u64);
          return (hash_u64 >> 32) ^ (hash_u64 & 0xFFFFFFFF);
      }
      

      For an empty string (bytes == 0), FnvHash64 will return the original seed (hash_u64). This degenerates the folding logic to xoring the seed by itself and always returning 0.

      The result of this behavior is that any hash partition of the from: hash (...., empty col) will result in the computed hash being 0. The last column wipes out whatever hash was generated by the prior columns. There is some special logic to handle NULLs (RawValue::HashCombine32) which can probably be used for this case as well.

      Attachments

        Activity

          People

            tarmstrong Tim Armstrong
            nong_impala_60e1 Nong Li
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: