Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.13
-
None
Description
The hash32() algorithm processes blocks of 4 bytes. Trailing bytes of 1, 2 or 3 that are negative are not masked to unsigned leading to an error.
This test passes using data generated from the Python mmh3 library which calls the MurmurHash3 c++ code (modified for Python):
/** * Test to demonstrate the errors in * {@link MurmurHash3#hash32(byte[], int, int, int)} * if the final 1, 2, or 3 bytes are negative. */ @Test public void testHash32With1TrailingSignedByteIsInvalid() { // Generate test data: // import mmh3 // import numpy as np // mmh3.hash(np.uint8([-1])) // mmh3.hash(np.uint8([0, -1])) // mmh3.hash(np.uint8([0, 0, -1])) // mmh3.hash(np.uint8([-1, 0])) // mmh3.hash(np.uint8([-1, 0, 0])) // mmh3.hash(np.uint8([0, -1, 0])) Assert.assertNotEquals(-43192051, MurmurHash3.hash32(new byte[] {-1}, 0, 1, 0)); Assert.assertNotEquals(-582037868, MurmurHash3.hash32(new byte[] {0, -1}, 0, 1, 0)); Assert.assertNotEquals(922088087, MurmurHash3.hash32(new byte[] {0, 0, -1}, 0, 1, 0)); Assert.assertNotEquals(-1309567588, MurmurHash3.hash32(new byte[] {-1, 0}, 0, 1, 0)); Assert.assertNotEquals(-363779670, MurmurHash3.hash32(new byte[] {-1, 0, 0}, 0, 1, 0)); Assert.assertNotEquals(-225068062, MurmurHash3.hash32(new byte[] {0, -1, 0}, 0, 1, 0)); }
This test passes with assertEquals when the code is fixed to apply masking to the final 3 bytes:
case 3: k1 ^= (data[index + 2] & 0xff) << 16; case 2: k1 ^= (data[index + 1] & 0xff) << 8; case 1: k1 ^= (data[index] & 0xff);
Fixing this error will be a behavioural change.
It is recommended to leave this method alone and implement a new hash32x86 method that should match the MurmurHash3_x86_32 method from the c++ source code.
Attachments
Issue Links
- is related to
-
CODEC-264 murmur3.hash128() does not account for unsigned seed argument
- Resolved