Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6
    • Component/s: None
    • Labels: None

      Description

      Yonik has ported an implementation of MurmurHash 3.0 and put it in the public domain: http://www.lucidimagination.com/blog/2011/09/15/murmurhash3-for-java/

      It's a port of https://sites.google.com/site/murmurhash/ which says:

      (I reserve the right to tweak the constants after people have had a chance to bang on it). Murmur3 has better performance than MurmurHash2, no repetition flaw, comes in 32/64/128-bit versions for both x86 and x64 platforms, and the 128-bit x64 version is blazing fast - over 5 gigabytes per second on my 3 gigahertz Core 2.

      In addition, the library of test code that I use to test MurmurHash (called SMHasher) has been released - it's still rough (and will only compile under VC++ at the moment), but it contains everything needed to verify hash functions of arbitrary output bit-lengths.

      Murmur3 and all future versions will be hosted on Google Code here - http://code.google.com/p/smhasher/ - you can access the codebase via the 'Source' tab at the top.

      See also http://code.google.com/p/smhasher/

      We should add support for it and hook into MinHash
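
      For reference, below is a minimal sketch of the 32-bit x86 variant, following the published MurmurHash3 algorithm; the method name and layout here are illustrative and not necessarily identical to Yonik's port or to the attached patch.

          // Sketch of MurmurHash3 x86_32 over a byte range (illustrative, not the committed code).
          public static int murmurhash3x8632(byte[] data, int offset, int len, int seed) {
            final int c1 = 0xcc9e2d51;
            final int c2 = 0x1b873593;

            int h1 = seed;
            int roundedEnd = offset + (len & 0xfffffffc); // round down to whole 4-byte blocks

            for (int i = offset; i < roundedEnd; i += 4) {
              // little-endian load of the next 4-byte block
              int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
              k1 *= c1;
              k1 = (k1 << 15) | (k1 >>> 17); // ROTL32(k1, 15)
              k1 *= c2;

              h1 ^= k1;
              h1 = (h1 << 13) | (h1 >>> 19); // ROTL32(h1, 13)
              h1 = h1 * 5 + 0xe6546b64;
            }

            // tail: the remaining 1-3 bytes, if any
            int k1 = 0;
            switch (len & 0x03) {
              case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
                // fall through
              case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
                // fall through
              case 1:
                k1 |= data[roundedEnd] & 0xff;
                k1 *= c1;
                k1 = (k1 << 15) | (k1 >>> 17); // ROTL32(k1, 15)
                k1 *= c2;
                h1 ^= k1;
            }

            // finalization: mix in the length and force avalanche
            h1 ^= len;
            h1 ^= h1 >>> 16;
            h1 *= 0x85ebca6b;
            h1 ^= h1 >>> 13;
            h1 *= 0xc2b2ae35;
            h1 ^= h1 >>> 16;

            return h1;
          }

      Hooking this into MinHash would then essentially mean seeding the function differently for each member of the hash family.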

      Attachments

      1. MAHOUT-862.patch (5 kB, Grant Ingersoll)

        Activity

        Grant Ingersoll added a comment -

        Here's a patch that adds MurmurHash3. Tests pass, but I'm not an expert in this stuff, so if someone else wants to take a peek, that would be great.

        Grant Ingersoll added a comment -

        I accidentally committed this when making a minor change to build-reuters.sh. Rather than roll back, I'm going to let it stick, and others can simply patch what I put in, as it is new functionality that doesn't change the old functionality.

        Grant Ingersoll added a comment -

        Committed revision 1196616.

        I'll leave open for a day or two so others can review.

        Ted Dunning added a comment -

        Just looked at the MH3 code. It definitely needs the ByteBuffer treatment. It also needs to have test cases. Yonik's word is usually good, but I don't like having code without good tests.
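
        As an illustration of the kind of sanity check being asked for here (not the SMHasher suite, and not the test that was eventually committed), a couple of JUnit-style checks might look like the sketch below; the class name is hypothetical, and murmurhash3x8632 refers to the 32-bit sketch in the description above, assumed to be statically imported.

            import static org.junit.Assert.assertEquals;

            import java.nio.charset.Charset;
            import org.junit.Test;

            public class MurmurHash3SanityTest {

              @Test
              public void emptyInputWithZeroSeedHashesToZero() {
                // With no 4-byte blocks and no tail, h1 stays equal to the seed,
                // and the finalization mix of 0 is 0.
                assertEquals(0, murmurhash3x8632(new byte[0], 0, 0, 0));
              }

              @Test
              public void offsetSliceMatchesCopiedSlice() {
                Charset utf8 = Charset.forName("UTF-8");
                byte[] whole = "the quick brown fox".getBytes(utf8);
                byte[] slice = "quick".getBytes(utf8);
                // Hashing a sub-range in place must equal hashing the copied-out bytes.
                assertEquals(murmurhash3x8632(slice, 0, slice.length, 42),
                             murmurhash3x8632(whole, 4, slice.length, 42));
              }
            }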

        Dawid Weiss added a comment -

        Are the speed gains on ByteBuffers a result of unsafe underlying buffer accesses? A simple array loop with predictable bounds should be optimized pretty much the same way (boundary checks up front, followed by unchecked accesses); I wonder where the gain comes from, then?

        Also, this could be rewritten to actually make use of Unsafe if it's available – I bet with larger memory chunks the speed gain would be noticeable.
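
        For what it's worth, a sketch of what "make use of Unsafe" could look like on a HotSpot-era JDK is below; sun.misc.Unsafe is an unsupported, VM-specific API, and the class and method names here are purely illustrative.

            import java.lang.reflect.Field;
            import sun.misc.Unsafe;

            final class UnsafeLongReader {
              private static final Unsafe UNSAFE;
              private static final long BYTE_ARRAY_BASE;

              static {
                try {
                  // Unsafe.getUnsafe() is restricted to bootstrap classes,
                  // so grab the singleton via reflection instead.
                  Field f = Unsafe.class.getDeclaredField("theUnsafe");
                  f.setAccessible(true);
                  UNSAFE = (Unsafe) f.get(null);
                  BYTE_ARRAY_BASE = UNSAFE.arrayBaseOffset(byte[].class);
                } catch (Exception e) {
                  throw new ExceptionInInitializerError(e);
                }
              }

              // Reads 8 bytes starting at index i in native byte order with no bounds
              // checking; the caller must guarantee that i + 8 <= data.length.
              static long getLong(byte[] data, int i) {
                return UNSAFE.getLong(data, BYTE_ARRAY_BASE + i);
              }
            }

        On x86 the native byte order is little-endian, which happens to match the load order MurmurHash3 expects.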

        Yonik Seeley added a comment -

        The test case I used is here:
        https://github.com/yonik/java_util/tree/master/test/util/hash

        Also, this could be rewritten to actually make use of Unsafe if it's available – I bet with larger memory chunks the speed gain would be noticeable.

        For larger memory chunks I'd go with the 64-bit variant of MurmurHash.

        Ted Dunning added a comment -

        Are speed gains on bytebuffers a result of unsafe underlying buffer accesses?

        No. The speed gains are largely because getLong works really well on ByteBuffers (certainly better than byte-by-byte shift-and-mask code). That then allows the JVM to do better loop optimizations (I think).
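
        To make the comparison concrete, here is a small sketch of the two load styles being discussed; the helper names are made up for illustration.

            import java.nio.ByteBuffer;
            import java.nio.ByteOrder;

            // Shift-and-mask: assemble a little-endian long from a byte[] one byte at a time.
            static long loadLongByShiftMask(byte[] data, int i) {
              return (data[i] & 0xffL)
                  | (data[i + 1] & 0xffL) << 8
                  | (data[i + 2] & 0xffL) << 16
                  | (data[i + 3] & 0xffL) << 24
                  | (data[i + 4] & 0xffL) << 32
                  | (data[i + 5] & 0xffL) << 40
                  | (data[i + 6] & 0xffL) << 48
                  | (data[i + 7] & 0xffL) << 56;
            }

            // ByteBuffer: a single absolute getLong call per 8-byte block.
            static long loadLongViaBuffer(ByteBuffer buf, int i) {
              return buf.order(ByteOrder.LITTLE_ENDIAN).getLong(i);
            }

        The second form gives the JIT one bounds check and one 8-byte read per block instead of eight separate reads plus shifts and ORs, which fits the explanation above.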

        Ted Dunning added a comment -

        The test case I used is here:
        https://github.com/yonik/java_util/tree/master/test/util/hash

        Excellent. We should use that test. My grump was that Mahout didn't have the test, not that I knew the test didn't exist (I knew little, really).

        Dawid Weiss added a comment -

        Yes, sure, 64-bit is fine, but still: I was wondering what kind of machine code is generated for accessing array indexes – if there is bounds checking, it will contribute to the overall time (compared to the C version, for example, or unsafe memory access).

        I'm not advocating to make it more complex by using Unsafe, I was just curious about the difference and sort of thinking aloud.

        Dawid Weiss added a comment -

        getLong works really well on ByteBuffers

        That's probably because it's an intrinsic reading in 8 bytes at a time without bounds checking, but I'd have to confirm that by looking at the jitted code dump (or in the OpenJDK code). Put it at the end of my todo list for tonight.

        Grant Ingersoll added a comment -

        I committed the test.

        Ted Dunning added a comment -

        That's probably because it's an intrinsic reading in 8 bytes at a time without bounds checking, but I'd have to confirm that by looking at the jitted code dump (or in openjdk code).

        I imagine that it is an intrinsic and that it has less bounds checking, but I am sure that it is safe from over-run (i.e. has sufficient bounds-checking).

        Dawid Weiss added a comment -

        Checked the source code of OpenJDK out of curiosity. Mapped buffers (for example, a LongBuffer view over a ByteBuffer) are implemented with a mix of various intrinsics. For example, in:

            public LongBuffer put(int i, long x) {
                Bits.putLongL(bb, ix(checkIndex(i)), x);
                return this;
            }
        

        checkIndex is an intrinsic and putLongL uses internal unchecked _put methods, so the sequence:

            static void putLongL(ByteBuffer bb, int bi, long x) {
                bb._put(bi + 7, long7(x));
                bb._put(bi + 6, long6(x));
                bb._put(bi + 5, long5(x));
                bb._put(bi + 4, long4(x));
                bb._put(bi + 3, long3(x));
                bb._put(bi + 2, long2(x));
                bb._put(bi + 1, long1(x));
                bb._put(bi    , long0(x));
            } 
        

        is pretty much index-checked once. Bits contains lots of other accesses to Unsafe and I bet this is most of the speedup over normal Java code.

        I didn't have the time to inspect assembly dumps to verify for sure, but the above should pretty much address the questions raised earlier in this thread. [Applies to Sun/Oracle HotSpot only; didn't check other VMs.]


          People

          • Assignee:
            Grant Ingersoll
          • Reporter:
            Grant Ingersoll
          • Votes:
            0
          • Watchers:
            0

