HBase
  1. HBase
  2. HBASE-1979

MurmurHash does not yield the same results as the reference C++ implementation when size % 4 >= 2

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.20.3, 0.90.0
    • Component/s: util
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Last rounds of MurmurHash are done in reverse order. data[length - 3], data[length - 2] and data[length - 1] in the block processing the remaining bytes should be data[len_m +2], data[len_m + 1], data[len_m].

      1. HBASE-1979.patch
        0.6 kB
        olivier gillet

        Issue Links

          Activity

          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Patch Available Patch Available
          1h 15m 1 Andrew Purtell 13/Nov/09 13:02
          Patch Available Patch Available Resolved Resolved
          11d 6h 26m 1 stack 24/Nov/09 19:29
          Resolved Resolved Closed Closed
          1052d 10h 44m 1 stack 12/Oct/12 07:14
          stack made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          stack made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          stack added a comment -

          Commited to 0.20 branch.

          Show
          stack added a comment - Commited to 0.20 branch.
          Hide
          stack added a comment -

          OK. Sounds good then. Safe change.

          Show
          stack added a comment - OK. Sounds good then. Safe change.
          Hide
          Andrew Purtell added a comment -

          I checked references before making the commit on trunk. HRI.encodeRegionName explicitly uses JenkinsHash. I didn't see any generic use of Hash.hash(), or Hash.getInstance(). The only user of MurmurHash is PE at the moment.

          Show
          Andrew Purtell added a comment - I checked references before making the commit on trunk. HRI.encodeRegionName explicitly uses JenkinsHash. I didn't see any generic use of Hash.hash(), or Hash.getInstance(). The only user of MurmurHash is PE at the moment.
          Hide
          olivier gillet added a comment -

          The only use of the hash(...) function I found is in HRegionInfo.encodeRegionName (ryan, is that what you referred to?), but it seems to explicitly use JenkinsHash instead of Murmur.

          Show
          olivier gillet added a comment - The only use of the hash(...) function I found is in HRegionInfo.encodeRegionName (ryan, is that what you referred to?), but it seems to explicitly use JenkinsHash instead of Murmur.
          Hide
          ryan rawson added a comment -

          furthermore, we are going to have to keep both implementations so we can provide a migration path as we will need to restructure and rewrite the entire hbase data tree.

          Show
          ryan rawson added a comment - furthermore, we are going to have to keep both implementations so we can provide a migration path as we will need to restructure and rewrite the entire hbase data tree.
          Hide
          ryan rawson added a comment -

          there are no bloom filters in any installations right now AFAIK.

          Does this change the basic murmur hash function? Because it might break the encoded directory names of hbase!

          Show
          ryan rawson added a comment - there are no bloom filters in any installations right now AFAIK. Does this change the basic murmur hash function? Because it might break the encoded directory names of hbase!
          Hide
          Andrew Purtell added a comment -

          Thanks for the note Andrzej. HBase from 0.20.0 and up does not currently use bloom filters.

          Show
          Andrew Purtell added a comment - Thanks for the note Andrzej. HBase from 0.20.0 and up does not currently use bloom filters.
          Hide
          Andrzej Bialecki added a comment -

          Please note that this may cause a back-compat issue, as existing stored bloom filters will break (hash values are different after this patch, and this affects how bloom filters are populated / tested).

          Show
          Andrzej Bialecki added a comment - Please note that this may cause a back-compat issue, as existing stored bloom filters will break (hash values are different after this patch, and this affects how bloom filters are populated / tested).
          Andrew Purtell made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Andrew Purtell made changes -
          Hadoop Flags [Reviewed]
          Assignee olivier gillet [ pichenettes ]
          Fix Version/s 0.21.0 [ 12313607 ]
          Fix Version/s 0.20.3 [ 12314357 ]
          Hide
          Andrew Purtell added a comment -

          Committed to trunk. Tests running now. Will back out if any failure. Commit to branch pending open of 0.20.3. Thanks for the patch Olivier!

          Show
          Andrew Purtell added a comment - Committed to trunk. Tests running now. Will back out if any failure. Commit to branch pending open of 0.20.3. Thanks for the patch Olivier!
          olivier gillet made changes -
          Attachment HBASE-1979.patch [ 12424840 ]
          olivier gillet made changes -
          Field Original Value New Value
          Link This issue is cloned as HADOOP-6372 [ HADOOP-6372 ]
          olivier gillet created issue -

            People

            • Assignee:
              olivier gillet
              Reporter:
              olivier gillet
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 1h
                1h
                Remaining:
                Remaining Estimate - 1h
                1h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development