Hadoop Common / HADOOP-6372

MurmurHash does not yield the same results as the reference C++ implementation when size % 4 >= 2

    Details

    • Type: Bug
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: 0.20.1
    • Fix Version/s: None
    • Component/s: util
    • Labels: None

      Description

      The last rounds of MurmurHash process the trailing bytes in reverse order. In the block that handles the remaining bytes, data[length - 3], data[length - 2] and data[length - 1] should be data[len_m + 2], data[len_m + 1] and data[len_m], respectively.
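The corrected tail handling can be sketched as follows. This is a minimal standalone Java sketch of MurmurHash 2.0 with forward tail indexing, not the actual Hadoop class or either attached patch; the class and variable names are illustrative:

```java
/** Hypothetical sketch of 32-bit MurmurHash 2.0 with corrected tail indexing. */
public class Murmur2Sketch {

    public static int hash(byte[] data, int length, int seed) {
        final int m = 0x5bd1e995;
        final int r = 24;

        int h = seed ^ length;
        int len4 = length >> 2;          // number of full 4-byte blocks

        for (int i = 0; i < len4; i++) {
            int i4 = i << 2;
            // Load a little-endian 32-bit block, as the reference C++ does.
            int k = ((data[i4 + 3] & 0xff) << 24)
                  | ((data[i4 + 2] & 0xff) << 16)
                  | ((data[i4 + 1] & 0xff) << 8)
                  |  (data[i4]     & 0xff);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Tail: index forward from lenM (length rounded down to a multiple
        // of 4), not backward from length. Indexing data[length - 3] /
        // data[length - 2] / data[length - 1] here reverses the byte order
        // whenever (length % 4) >= 2, which is the bug reported above.
        int lenM = len4 << 2;
        int left = length - lenM;
        if (left != 0) {
            if (left >= 3) h ^= (data[lenM + 2] & 0xff) << 16;
            if (left >= 2) h ^= (data[lenM + 1] & 0xff) << 8;
            h ^= (data[lenM] & 0xff);
            h *= m;
        }

        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }
}
```

Indexing forward from len_m mirrors the reference implementation's tail pointer; with a single trailing byte the two indexings coincide (data[length - 1] == data[len_m]), which is why the mismatch only appears when length % 4 >= 2.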

      1. murmur.patch
        4 kB
        Andrzej Bialecki
      2. HADOOP-6372.patch
        3 kB
        olivier gillet

        Issue Links

          Activity

          olivier gillet created issue -
          olivier gillet made changes -
          Link: This issue is a clone of HBASE-1979
          olivier gillet added a comment -

          The code is duplicated in both projects.

          olivier gillet made changes -
          Attachment HADOOP-6372.patch [ 12424839 ]
          Andrzej Bialecki added a comment -

          I confirm the bug, and the bugfix works as expected.

          Andrzej Bialecki added a comment -

          This patch includes the fix from Olivier, and updates the hash to MurmurHash 2.0A, which provides better handling of empty keys.

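The empty-key weakness mentioned above can be seen directly from the 2.0 structure: for a zero-length key the block loop and the tail are both skipped, so the result is just the finalization mix applied to seed ^ 0, and seed 0 always hashes to 0. A hypothetical sketch of that degenerate case (names illustrative; this is not the 2.0A code):

```java
public class EmptyKeyWeakness {
    // For an empty key, MurmurHash 2.0 degenerates to its finalization
    // mix applied to (seed ^ length) with length == 0, i.e. to the seed.
    public static int hash2OfEmptyKey(int seed) {
        final int m = 0x5bd1e995;
        int h = seed ^ 0;   // length is 0: no 4-byte blocks, no tail bytes
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }
}
```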
          Andrzej Bialecki made changes -
          Attachment murmur.patch [ 12424856 ]
          olivier gillet added a comment -

          It would be nice if the update to 2.0A was also done on HBase's side. Is it common to run a version of HBase not in sync with the matching version of Hadoop-common? If not, could the few classes using org.apache.hadoop.hbase.util.Hash use org.apache.hadoop.util.hash.Hash instead?

          dhruba borthakur added a comment -

          Does this mean that HDFS data blocks that were checksummed using this algorithm are now unreadable with this bug fix? How likely is this possibility?

          olivier gillet added a comment -

          I was not aware hashes from util.hash were used for HDFS checksumming. Could you point me to the code using them?

          dhruba borthakur added a comment -

          I was referring to the recent changes via HADOOP-6148 and HDFS-496.

          dhruba borthakur added a comment -

          But you are right: that was mostly a CRC32 computation change and has nothing to do with MurmurHash.

          David Rosenstrauch added a comment -

          I just ran into this problem as well. Are there any plans to release a fix?


            People

            • Assignee: Unassigned
            • Reporter: olivier gillet
            • Votes: 0
            • Watchers: 5


                Time Tracking

                • Estimated: 1h
                • Remaining: 1h
                • Logged: Not Specified
