[IMPALA-1159] fnv_hash UDF initialized with 32 bits offset basis - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: Impala 1.4
Fix Version/s: None
Component/s: Backend
Labels:
Environment:
Linux 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Target Version: Product Backlog

Description

According to http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_math_functions.html
the fnv_hash UDF implements the 64-bit FNV-1a variant.

According to http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
the algorithm should be seeded with the 64-bit FNV offset basis: 14695981039346656037 (in hex, 0xcbf29ce484222325).

When I implemented this, my FNV-1a hashes did not match Impala's.
For example, with impala-shell I obtain:

+---------------------+
| fnv_hash('hello')   |
+---------------------+
| 6414202926103426347 |
+---------------------+

whereas it should be -6615550055289275125 (the unsigned 64-bit hash interpreted as a signed BIGINT).
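
For reference, the expected value can be reproduced with a minimal Python sketch of 64-bit FNV-1a seeded with the standard 64-bit offset basis. The function names fnv1a_64 and to_signed_64 are illustrative and are not part of Impala; the signed conversion assumes the result is reported as a two's-complement BIGINT.

```python
FNV64_OFFSET_BASIS = 0xcbf29ce484222325  # 14695981039346656037
FNV64_PRIME = 0x100000001b3              # 1099511628211
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64  # keep the result within 64 bits
    return h


def to_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed (two's-complement) one."""
    return value - (1 << 64) if value >= (1 << 63) else value


unsigned_hash = fnv1a_64(b"hello")
print(hex(unsigned_hash))           # 0xa430d84680aabd0b
print(to_signed_64(unsigned_hash))  # -6615550055289275125
```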

By looking at the Impala unit tests:

https://github.com/cloudera/Impala/blob/8567b51f8c38bd389a338c761242a316d8ffe5c8/be/src/exprs/expr-test.cc

Excerpt:

// Test fnv_hash
string s("hello world");
uint64_t expected = HashUtil::FnvHash64(s.data(), s.size(), HashUtil::FNV_SEED);
TestValue("fnv_hash('hello world')", TYPE_BIGINT, expected);

I see that the algorithm is seeded with HashUtil::FNV_SEED, the 32-bit offset basis,
instead of FNV64_SEED.

If I update my algorithm and seed it with the 32-bit offset basis, I obtain the same hashes as Impala.
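
The seed mismatch can be illustrated with a small Python sketch that runs the same 64-bit FNV-1a loop with either seed. This assumes Impala's FnvHash64 performs the usual XOR-then-multiply step with the 64-bit FNV prime; the name fnv1a_64_seeded is hypothetical, and 2166136261 is the standard 32-bit FNV offset basis. Per the report, the first printed value should match Impala's output, while the second is the standard result.

```python
FNV64_PRIME = 0x100000001b3              # 1099511628211
FNV32_OFFSET_BASIS = 0x811c9dc5          # 2166136261, the 32-bit basis
FNV64_OFFSET_BASIS = 0xcbf29ce484222325  # 14695981039346656037, the 64-bit basis
MASK64 = (1 << 64) - 1


def fnv1a_64_seeded(data: bytes, seed: int) -> int:
    """64-bit FNV-1a loop starting from a caller-supplied seed (hypothetical helper)."""
    h = seed & MASK64
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def to_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed BIGINT."""
    return value - (1 << 64) if value >= (1 << 63) else value


# Same input and same loop; only the seed differs.
print(to_signed_64(fnv1a_64_seeded(b"hello", FNV32_OFFSET_BASIS)))
print(to_signed_64(fnv1a_64_seeded(b"hello", FNV64_OFFSET_BASIS)))
```

Because each step (XOR with a byte, then multiplication by an odd prime modulo 2^64) is reversible, two different seeds always give different hashes for the same input, so no input hashes identically under both.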

Because of backward compatibility, this may not be easy to fix. Alternatively, fnv_hash could be deprecated and replaced with a corrected UDF.

People

Assignee: Unassigned

Reporter: Thierry Herrmann

Votes: 0

Watchers: 7

Dates

Created: 16/Aug/14 22:49

Updated: 30/Oct/18 17:23

Resolved: 30/Oct/18 17:23