IMPALA-1159

fnv_hash UDF is initialized with the 32-bit offset basis


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: Impala 1.4
    • Fix Version/s: None
    • Component/s: Backend
    • Environment:
      Linux 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

      Description

      According to http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_math_functions.html
      the fnv_hash UDF implements the 64-bit FNV-1a variant.

      According to http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
      the algorithm should be seeded with the 64-bit FNV offset basis value: 14695981039346656037 (in hex, 0xcbf29ce484222325)

      When I implemented this, I did not obtain the same FNV-1a hashes as Impala.
      For example, impala-shell returns:

      +---------------------+
      | fnv_hash('hello')   |
      +---------------------+
      | 6414202926103426347 |
      +---------------------+
      

      whereas the expected value is -6615550055289275125 (the unsigned 64-bit FNV-1a hash 0xa430d84680aabd0b read as a signed BIGINT).
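For reference, the expected value can be reproduced with a minimal sketch of 64-bit FNV-1a. This is not Impala's code; the function names `fnv1a_64` and `to_signed_64` are illustrative. It assumes the standard 64-bit offset basis and prime from the FNV description, and that the unsigned result is reinterpreted as a signed 64-bit integer, as a BIGINT column would show it.

```python
# Standard FNV-1a 64-bit parameters (from the FNV description, not from Impala).
FNV64_OFFSET_BASIS = 0xCBF29CE484222325  # 14695981039346656037
FNV64_PRIME = 0x100000001B3              # 1099511628211
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes, seed: int = FNV64_OFFSET_BASIS) -> int:
    """Return the unsigned 64-bit FNV-1a hash of data."""
    h = seed
    for byte in data:
        h ^= byte                        # FNV-1a: XOR the byte first...
        h = (h * FNV64_PRIME) & MASK64   # ...then multiply, modulo 2**64
    return h


def to_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed one (BIGINT-style)."""
    return value - (1 << 64) if value >= (1 << 63) else value


if __name__ == "__main__":
    print(to_signed_64(fnv1a_64(b"hello")))  # -6615550055289275125
```

The sign conversion matters when comparing against impala-shell output, because the same 64 bits print as a negative number once the top bit is set.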

      Looking at the Impala unit tests:

      https://github.com/cloudera/Impala/blob/8567b51f8c38bd389a338c761242a316d8ffe5c8/be/src/exprs/expr-test.cc

      Excerpt:

      // Test fnv_hash
      string s("hello world");
      uint64_t expected = HashUtil::FnvHash64(s.data(), s.size(), HashUtil::FNV_SEED);
      TestValue("fnv_hash('hello world')", TYPE_BIGINT, expected);
      

      I see that the algorithm is seeded with HashUtil::FNV_SEED, the 32-bit offset basis,
      instead of FNV64_SEED.

      If I update my algorithm and seed it with the 32-bit offset basis, I obtain the same hashes as Impala.
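The effect of the seed can be illustrated with a small sketch that runs the same FNV-1a 64-bit loop from two starting values. The function name `fnv1a_64_seeded` is hypothetical, and the 32-bit offset basis 2166136261 (0x811c9dc5) is the standard FNV value, assumed here to be what HashUtil::FNV_SEED holds. According to this report, the 32-bit seed is the one that reproduces Impala's output.

```python
FNV64_PRIME = 0x100000001B3              # 1099511628211
FNV64_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit offset basis
FNV32_OFFSET_BASIS = 0x811C9DC5          # standard 32-bit offset basis (assumed FNV_SEED)
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64_seeded(data: bytes, seed: int) -> int:
    """64-bit FNV-1a loop started from an arbitrary seed (unsigned result)."""
    h = seed & MASK64
    for byte in data:
        h = ((h ^ byte) * FNV64_PRIME) & MASK64
    return h


def to_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed one (BIGINT-style)."""
    return value - (1 << 64) if value >= (1 << 63) else value


if __name__ == "__main__":
    standard = fnv1a_64_seeded(b"hello", FNV64_OFFSET_BASIS)
    seeded_32 = fnv1a_64_seeded(b"hello", FNV32_OFFSET_BASIS)
    print("64-bit basis:", to_signed_64(standard))
    print("32-bit basis:", to_signed_64(seeded_32))
```

Only the starting value differs between the two calls, so every hash produced with one seed disagrees with the other; there is no simple post-hoc correction that maps one set of hashes onto the other.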

      Because of backward compatibility, this may not be easy to fix. Alternatively, fnv_hash could be deprecated and replaced with a corrected UDF.

            People

            • Assignee: Unassigned
            • Reporter: Thierry Herrmann (thierry.herrmann_impala_6ef2)