HBase
  1. HBase
  2. HBASE-1979

MurmurHash does not yield the same results as the reference C++ implementation when size % 4 >= 2

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.20.3, 0.90.0
    • Component/s: util
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Last rounds of MurmurHash are done in reverse order. data[length - 3], data[length - 2] and data[length - 1] in the block processing the remaining bytes should be data[len_m +2], data[len_m + 1], data[len_m].

      1. HBASE-1979.patch
        0.6 kB
        olivier gillet

        Issue Links

          Activity

          Hide
          Andrew Purtell added a comment -

          Committed to trunk. Tests running now. Will back out if any failure. Commit to branch pending open of 0.20.3. Thanks for the patch Olivier!

          Show
          Andrew Purtell added a comment - Committed to trunk. Tests running now. Will back out if any failure. Commit to branch pending open of 0.20.3. Thanks for the patch Olivier!
          Hide
          Andrzej Bialecki added a comment -

          Please note that this may cause a back-compat issue, as existing stored bloom filters will break (hash values are different after this patch, and this affects how bloom filters are populated / tested).

          Show
          Andrzej Bialecki added a comment - Please note that this may cause a back-compat issue, as existing stored bloom filters will break (hash values are different after this patch, and this affects how bloom filters are populated / tested).
          Hide
          Andrew Purtell added a comment -

          Thanks for the note Andrzej. HBase from 0.20.0 and up does not currently use bloom filters.

          Show
          Andrew Purtell added a comment - Thanks for the note Andrzej. HBase from 0.20.0 and up does not currently use bloom filters.
          Hide
          ryan rawson added a comment -

          there are no bloom filters in any installations right now AFAIK.

          Does this change the basic murmur hash function? Because it might break the encoded directory names of hbase!

          Show
          ryan rawson added a comment - there are no bloom filters in any installations right now AFAIK. Does this change the basic murmur hash function? Because it might break the encoded directory names of hbase!
          Hide
          ryan rawson added a comment -

          furthermore, we are going to have to keep both implementations so we can provide a migration path as we will need to restructure and rewrite the entire hbase data tree.

          Show
          ryan rawson added a comment - furthermore, we are going to have to keep both implementations so we can provide a migration path as we will need to restructure and rewrite the entire hbase data tree.
          Hide
          olivier gillet added a comment -

          The only use of the hash(...) function I found is in HRegionInfo.encodeRegionName (ryan, is that what you referred to?), but it seems to explicitly use JenkinsHash instead of Murmur.

          Show
          olivier gillet added a comment - The only use of the hash(...) function I found is in HRegionInfo.encodeRegionName (ryan, is that what you referred to?), but it seems to explicitly use JenkinsHash instead of Murmur.
          Hide
          Andrew Purtell added a comment -

          I checked references before making the commit on trunk. HRI.encodeRegionName explicitly uses JenkinsHash. I didn't see any generic use of Hash.hash(), or Hash.getInstance(). The only user of MurmurHash is PE at the moment.

          Show
          Andrew Purtell added a comment - I checked references before making the commit on trunk. HRI.encodeRegionName explicitly uses JenkinsHash. I didn't see any generic use of Hash.hash(), or Hash.getInstance(). The only user of MurmurHash is PE at the moment.
          Hide
          stack added a comment -

          OK. Sounds good then. Safe change.

          Show
          stack added a comment - OK. Sounds good then. Safe change.
          Hide
          stack added a comment -

          Commited to 0.20 branch.

          Show
          stack added a comment - Commited to 0.20 branch.

            People

            • Assignee:
              olivier gillet
              Reporter:
              olivier gillet
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 1h
                1h
                Remaining:
                Remaining Estimate - 1h
                1h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development