Uploaded image for project: 'Commons Codec'
  1. Commons Codec
  2. CODEC-267

MurmurHash3.hash32() does not process trailing bytes as unsigned

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.13
    • 1.14
    • None

    Description

      The hash32() algorithm processes blocks of 4 bytes. Trailing bytes of 1, 2 or 3 that are negative are not masked to unsigned leading to an error.

      This test passes using data generated from the Python mmh3 library which calls the MurmurHash3 c++ code (modified for Python):

      /**
       * Test to demonstrate the errors in
       * {@link MurmurHash3#hash32(byte[], int, int, int)}
       * if the final 1, 2, or 3 bytes are negative.
       */
      @Test
      public void testHash32With1TrailingSignedByteIsInvalid() {
          // Generate test data:
          // import mmh3
          // import numpy as np
          // mmh3.hash(np.uint8([-1]))
          // mmh3.hash(np.uint8([0, -1]))
          // mmh3.hash(np.uint8([0, 0, -1]))
          // mmh3.hash(np.uint8([-1, 0]))
          // mmh3.hash(np.uint8([-1, 0, 0]))
          // mmh3.hash(np.uint8([0, -1, 0]))
          Assert.assertNotEquals(-43192051, MurmurHash3.hash32(new byte[] {-1}, 0, 1, 0));
          Assert.assertNotEquals(-582037868, MurmurHash3.hash32(new byte[] {0, -1}, 0, 1, 0));
          Assert.assertNotEquals(922088087, MurmurHash3.hash32(new byte[] {0, 0, -1}, 0, 1, 0));
          Assert.assertNotEquals(-1309567588, MurmurHash3.hash32(new byte[] {-1, 0}, 0, 1, 0));
          Assert.assertNotEquals(-363779670, MurmurHash3.hash32(new byte[] {-1, 0, 0}, 0, 1, 0));
          Assert.assertNotEquals(-225068062, MurmurHash3.hash32(new byte[] {0, -1, 0}, 0, 1, 0));
      }
      

      This test passes with assertEquals when the code is fixed to apply masking to the final 3 bytes:

              case 3:
                  k1 ^= (data[index + 2] & 0xff) << 16;
              case 2:
                  k1 ^= (data[index + 1] & 0xff) << 8;
              case 1:
                  k1 ^= (data[index] & 0xff);
      

      Fixing this error will be a behavioural change.

      It is recommended to leave this method alone and implement a new hash32x86 method that should match the MurmurHash3_x86_32 method from the c++ source code.

       

      Attachments

        Issue Links

          Activity

            People

              aherbert Alex Herbert
              claude Claude Warren
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: