Details
Bug
Status: Resolved
Minor
Resolution: Won't Fix
Impala 1.4
None
Linux 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Description
According to http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_math_functions.html
the fnv_hash UDF implements the 64-bit FNV-1a variant.
According to http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
the algorithm should be seeded with the 64-bit FNV offset basis: 14695981039346656037 (in hex, 0xcbf29ce484222325).
When I implemented this, I did not obtain the same FNV-1a hashes as Impala.
For example, impala-shell returns:
+---------------------+
| fnv_hash('hello')   |
+---------------------+
| 6414202926103426347 |
+---------------------+
whereas the standard 64-bit FNV-1a hash of 'hello', read as a signed 64-bit integer, is -6615550055289275125 (0xa430d84680aabd0b).
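For reference, a minimal Python sketch of the standard 64-bit FNV-1a computation that yields the expected value. The function names fnv1a_64 and as_signed_64 are illustrative and are not taken from Impala. The final conversion assumes the result is reinterpreted as a signed 64-bit integer, as a BIGINT would be.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325  # 14695981039346656037
FNV64_PRIME = 0x100000001B3              # 1099511628211
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """Standard 64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h = ((h ^ byte) * FNV64_PRIME) & MASK64  # keep the result in 64 bits
    return h


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    return value - (1 << 64) if value >= (1 << 63) else value


print(as_signed_64(fnv1a_64(b"hello")))  # -6615550055289275125
```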
Looking at the Impala unit tests, I found this excerpt:
  // Test fnv_hash
  string s("hello world");
  uint64_t expected = HashUtil::FnvHash64(s.data(), s.size(), HashUtil::FNV_SEED);
  TestValue("fnv_hash('hello world')", TYPE_BIGINT, expected);
I see that the algorithm is seeded with HashUtil::FNV_SEED, the 32-bit offset basis (0x811c9dc5),
instead of FNV64_SEED.
If I update my algorithm and seed it with the 32-bit offset basis, I obtain the same hashes as Impala.
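The seed mismatch can be reproduced with a short Python sketch. It assumes, as described above, that Impala runs the 64-bit FNV-1a loop (64-bit prime, XOR then multiply) but starts from the 32-bit offset basis 0x811c9dc5. The name fnv1a_64_seeded is hypothetical and only serves this illustration.

```python
FNV64_PRIME = 0x100000001B3              # 64-bit FNV prime
FNV64_OFFSET_BASIS = 0xCBF29CE484222325  # seed the FNV-1a 64 definition calls for
FNV32_OFFSET_BASIS = 0x811C9DC5          # assumed value of HashUtil::FNV_SEED
MASK64 = (1 << 64) - 1


def fnv1a_64_seeded(data: bytes, seed: int) -> int:
    """64-bit FNV-1a loop started from an arbitrary seed (hypothetical helper)."""
    h = seed
    for byte in data:
        h = ((h ^ byte) * FNV64_PRIME) & MASK64
    return h


# The 32-bit seed reproduces the value shown by impala-shell.
print(fnv1a_64_seeded(b"hello", FNV32_OFFSET_BASIS))  # 6414202926103426347
# The 64-bit seed gives the standard hash (unsigned form).
print(fnv1a_64_seeded(b"hello", FNV64_OFFSET_BASIS))  # 11831194018420276491
```

Only the starting value differs between the two calls; the per-byte step is identical, which is why changing the seed alone is enough to match Impala's output.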
Because of backward compatibility, this may not be easy to fix. Alternatively, could fnv_hash be deprecated and replaced with a corrected UDF?