Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Version(s): 3.1.1, 4.0.0
Description
The comments in the vectorized hash computation refer to the MurmurHash implementation (the one using the constant 0x5bd1e995), while the non-vectorized code path refers to the Murmur3 one (using 0xcc9e2d51).
The comment here is wrong; the code below it actually calls Murmur3.hash32:
  /**
   * Batch compute the hash codes for all the serialized keys.
   *
   * NOTE: MAJOR MAJOR ASSUMPTION:
   *     We assume that HashCodeUtil.murmurHash produces the same result
   *     as MurmurHash.hash with seed = 0 (the method used by ReduceSinkOperator for
   *     UNIFORM distribution).
   */
  protected void computeSerializedHashCodes() {
    int offset = 0;
    int keyLength;
    byte[] bytes = output.getData();
    for (int i = 0; i < nonNullKeyCount; i++) {
      keyLength = serializedKeyLengths[i];
      hashCodes[i] = Murmur3.hash32(bytes, offset, keyLength, 0);
      offset += keyLength;
    }
  }
However, the Vector ReduceSink operator follows the wrong comment rather than the code when it hashes the null key:
System.arraycopy(nullKeyOutput.getData(), 0, nullBytes, 0, nullBytesLength);
nullKeyHashCode = HashCodeUtil.calculateBytesHashCode(nullBytes, 0, nullBytesLength);
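To illustrate why mixing the two functions matters, here is a minimal, self-contained sketch. It is not Hive's code: the class `HashMismatchSketch` and its methods are hypothetical, and the bodies are textbook versions of 32-bit MurmurHash2 (multiplier 0x5bd1e995) and 32-bit Murmur3 (constant 0xcc9e2d51), assumed here to stand in for the two code paths described above. Hashing the same bytes with seed 0 through each function will, in general, give different hash codes, so rows hashed on one path need not land where the other path expects them.

```java
import java.nio.charset.StandardCharsets;

public class HashMismatchSketch {

    // Reads four bytes starting at index i as a little-endian int.
    private static int readIntLE(byte[] d, int i) {
        return (d[i] & 0xff) | ((d[i + 1] & 0xff) << 8)
             | ((d[i + 2] & 0xff) << 16) | ((d[i + 3] & 0xff) << 24);
    }

    // Textbook 32-bit MurmurHash2 (the variant built around 0x5bd1e995).
    public static int murmur2(byte[] data, int offset, int length, int seed) {
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int i = offset;
        int remaining = length;
        while (remaining >= 4) {
            int k = readIntLE(data, i);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
            i += 4;
            remaining -= 4;
        }
        switch (remaining) {          // mix in the 1-3 trailing bytes
            case 3: h ^= (data[i + 2] & 0xff) << 16; // fall through
            case 2: h ^= (data[i + 1] & 0xff) << 8;  // fall through
            case 1: h ^= (data[i] & 0xff);
                    h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Textbook 32-bit Murmur3 (the variant built around 0xcc9e2d51).
    public static int murmur3(byte[] data, int offset, int length, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int nblocks = length / 4;
        for (int b = 0; b < nblocks; b++) {
            int k = readIntLE(data, offset + 4 * b);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        int tail = offset + 4 * nblocks;
        int k = 0;
        switch (length & 3) {         // mix in the 1-3 trailing bytes
            case 3: k ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: k ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1: k ^= (data[tail] & 0xff);
                    k *= c1;
                    k = Integer.rotateLeft(k, 15);
                    k *= c2;
                    h ^= k;
        }
        h ^= length;                  // final avalanche
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // Hypothetical serialized key; any non-empty byte sequence will do.
        byte[] key = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.printf("murmur2: %08x%n", murmur2(key, 0, key.length, 0));
        System.out.printf("murmur3: %08x%n", murmur3(key, 0, key.length, 0));
    }
}
```

Under these assumptions, the consistent choice is to hash the null key with the same function used for the non-null keys in computeSerializedHashCodes(), so that both kinds of key are distributed by one scheme.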