HBase
  1. HBase
  2. HBASE-4012

Further optimize byte comparison methods

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.92.0
    • Fix Version/s: 0.92.0
    • Component/s: util
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Guava uses some clever tricks with sun.misc.Unsafe to compare byte arrays about 100% faster than the naive byte-by-byte implementation: http://guava-libraries.googlecode.com/svn/trunk/guava/src/com/google/common/primitives/UnsignedBytes.java

      We should borrow this [Apache 2 licensed] code.

      1. 4012-v2.txt
        11 kB
        Ted Yu
      2. 4012.txt
        4 kB
        Ted Yu

        Issue Links

          Activity

          Hide
          stack added a comment -

          It is as though we should compare first 5 bytes with old method and if any over, then filp to Unsafe.

          Show
          stack added a comment - It is as though we should compare first 5 bytes with old method and if any over, then filp to Unsafe.
          Hide
          Lars Hofhansl added a comment -

          If Manukranth Kolloju's results are right I would expect the unsafe comparisons to be more efficient with many columns and/or versions, and potentially detrimental otherwise.

          Show
          Lars Hofhansl added a comment - If Manukranth Kolloju 's results are right I would expect the unsafe comparisons to be more efficient with many columns and/or versions, and potentially detrimental otherwise.
          Hide
          stack added a comment -

          I made HBASE-10775 to look at this again.

          Show
          stack added a comment - I made HBASE-10775 to look at this again.
          Hide
          stack added a comment -

          Sounds like this is worth a bit of retesting given Manukranth Kolloju and Lars Hofhansl findings.

          Show
          stack added a comment - Sounds like this is worth a bit of retesting given Manukranth Kolloju and Lars Hofhansl findings.
          Hide
          Manukranth Kolloju added a comment -

          [... I was hoping* to see parity ...]

          Show
          Manukranth Kolloju added a comment - [... I was hoping* to see parity ...]
          Hide
          Manukranth Kolloju added a comment -

          Yes, In the above paste, prefix ~ Common prefix. The actual rows can be arbitrarily long. It is reasonable to have very good performance in case of long prefixes. But I was curious if someone else also did study the short prefixes case. Especially where the common prefix was comparable to length of long. I was groping to see parity in those cases but instead there was a regression. As far as my simple benchmarks show.

          Show
          Manukranth Kolloju added a comment - Yes, In the above paste, prefix ~ Common prefix. The actual rows can be arbitrarily long. It is reasonable to have very good performance in case of long prefixes. But I was curious if someone else also did study the short prefixes case. Especially where the common prefix was comparable to length of long. I was groping to see parity in those cases but instead there was a regression. As far as my simple benchmarks show.
          Hide
          Lars Hofhansl added a comment -

          I did some - entirely unscientific - tests on this a while ago, and in my scenarios there was no measurable difference between the two implementations (naive byte[] compare and the unsafe code).

          Show
          Lars Hofhansl added a comment - I did some - entirely unscientific - tests on this a while ago, and in my scenarios there was no measurable difference between the two implementations (naive byte[] compare and the unsafe code).
          Hide
          stack added a comment -

          Manukranth Kolloju How do I read the above? If the keys have the first 50 bytes in common, then guava compare is 30% faster? If <10 characters in common, it is slower? Thanks.

          Show
          stack added a comment - Manukranth Kolloju How do I read the above? If the keys have the first 50 bytes in common, then guava compare is 30% faster? If <10 characters in common, it is slower? Thanks.
          Hide
          Manukranth Kolloju added a comment -

          I know that this Jira was long closed. I stumbled upon this and realized that we don't have it in 0.89-fb. Is it possible to have a regression in some cases. I tried to benchmark some scenarios where the comparisions are done on byte arrays with common prefixes. I found that as we the

          Prefix : 50, gain : 30.949085592296317
          Prefix : 45, gain : 29.72626255629741
          Prefix : 40, gain : 26.434751984314396
          Prefix : 35, gain : 24.259677810557402
          Prefix : 30, gain : 22.717778496708313
          Prefix : 25, gain : 14.16316408302645
          Prefix : 20, gain : 8.768630290777425
          Prefix : 15, gain : 1.2726417069570561
          Prefix : 10, gain : -17.46894837698785
          Prefix : 5, gain : -34.9383386755005
          Prefix : 0, gain : -117.8493223109523

          In most of the cases we will have considerable amount of data in each row key, So, the KVComparator.compareRows will mostly return 0, so this should help the cause. But did anyone see a regression by switching to this?

          Show
          Manukranth Kolloju added a comment - I know that this Jira was long closed. I stumbled upon this and realized that we don't have it in 0.89-fb. Is it possible to have a regression in some cases. I tried to benchmark some scenarios where the comparisions are done on byte arrays with common prefixes. I found that as we the Prefix : 50, gain : 30.949085592296317 Prefix : 45, gain : 29.72626255629741 Prefix : 40, gain : 26.434751984314396 Prefix : 35, gain : 24.259677810557402 Prefix : 30, gain : 22.717778496708313 Prefix : 25, gain : 14.16316408302645 Prefix : 20, gain : 8.768630290777425 Prefix : 15, gain : 1.2726417069570561 Prefix : 10, gain : -17.46894837698785 Prefix : 5, gain : -34.9383386755005 Prefix : 0, gain : -117.8493223109523 In most of the cases we will have considerable amount of data in each row key, So, the KVComparator.compareRows will mostly return 0, so this should help the cause. But did anyone see a regression by switching to this?
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2131 (See https://builds.apache.org/job/HBase-TRUNK/2131/)
          HBASE-4239 HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES

          tedyu :
          Files :

          • /hbase/trunk/CHANGES.txt
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2131 (See https://builds.apache.org/job/HBase-TRUNK/2131/ ) HBASE-4239 HBASE-4012 introduced duplicate variable Bytes.LONG_BYTES tedyu : Files : /hbase/trunk/CHANGES.txt /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          Hide
          stack added a comment -

          @Ted The comments in that issue are kinda funny. They are fine. I was just pointing them out.

          @Eric Thanks for checking Unsafe is still in 1.7 (ugh, oracle.* is inevitable I'd say).

          Show
          stack added a comment - @Ted The comments in that issue are kinda funny. They are fine. I was just pointing them out. @Eric Thanks for checking Unsafe is still in 1.7 (ugh, oracle.* is inevitable I'd say).
          Hide
          Eric Charles added a comment -

          sun.msc.Unsafe is still in jdk7 (http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/sun/misc/Unsafe.java)
          I tried to build hbase trunk with JDK build 1.7.0-ea-b143, and it's OK.

          There's always a risk it will be renamed one day "oracle.msc.Unsafe"...
          I also remember my surprise going from JDK1.4 to JDK1.5 and having a sun class related to encoding simply being removed, although I had been warned.

          Are there any metric tests that give indication on performance gain?

          Show
          Eric Charles added a comment - sun.msc.Unsafe is still in jdk7 ( http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/sun/misc/Unsafe.java ) I tried to build hbase trunk with JDK build 1.7.0-ea-b143, and it's OK. There's always a risk it will be renamed one day "oracle.msc.Unsafe"... I also remember my surprise going from JDK1.4 to JDK1.5 and having a sun class related to encoding simply being removed, although I had been warned. Are there any metric tests that give indication on performance gain?
          Show
          Ted Yu added a comment - I don't think there is a way to suppress the warning: http://webcache.googleusercontent.com/search?q=cache:UBSRBVvlv6AJ:bugs.sun.com/bugdatabase/view_bug.do%3Fbug_id%3D6476630+suppress+is+Sun+proprietary+API+and&cd=1&hl=en&ct=clnk&gl=us&client=firefox-a&source=www.google.com
          Hide
          stack added a comment -

          Interesting. It looks like we've now picked up the below on build:

          
          [INFO] --- maven-compiler-plugin:2.0.2:compile (default-compile) @ hbase ---
          [INFO] Compiling 527 source files to /home/hudson/hudson-slave/workspace/HBase-TRUNK/trunk/target/classes
          [WARNING] /home/hudson/hudson-slave/workspace/HBase-TRUNK/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[44,15] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
          
          [WARNING] /home/hudson/hudson-slave/workspace/HBase-TRUNK/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[1037,19] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
          
          [WARNING] /home/hudson/hudson-slave/workspace/HBase-TRUNK/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[1043,21] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
          
          [WARNING] /home/hudson/hudson-slave/workspace/HBase-TRUNK/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[1048,28] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
          
          
          Show
          stack added a comment - Interesting. It looks like we've now picked up the below on build: [INFO] --- maven-compiler-plugin:2.0.2:compile ( default -compile) @ hbase --- [INFO] Compiling 527 source files to /home/hudson/hudson-slave/workspace/HBase-TRUNK/trunk/target/classes [WARNING] /home/hudson/hudson-slave/workspace/HBase-TRUNK/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[44,15] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release [WARNING] /home/hudson/hudson-slave/workspace/HBase-TRUNK/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[1037,19] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release [WARNING] /home/hudson/hudson-slave/workspace/HBase-TRUNK/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[1043,21] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release [WARNING] /home/hudson/hudson-slave/workspace/HBase-TRUNK/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[1048,28] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #1996 (See https://builds.apache.org/job/HBase-TRUNK/1996/)
          HBASE-4012 Further optimize byte comparison methods (Ted Yu)

          tedyu :
          Files :

          • /hbase/trunk/CHANGES.txt
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #1996 (See https://builds.apache.org/job/HBase-TRUNK/1996/ ) HBASE-4012 Further optimize byte comparison methods (Ted Yu) tedyu : Files : /hbase/trunk/CHANGES.txt /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          Hide
          Ted Yu added a comment -

          Integrated to TRUNK.

          Thanks for the review Todd and Stack.

          Show
          Ted Yu added a comment - Integrated to TRUNK. Thanks for the review Todd and Stack.
          Hide
          stack added a comment -

          This looks fine to me. +1

          Show
          stack added a comment - This looks fine to me. +1
          Hide
          Ted Yu added a comment -

          Version 2 created a Comparer interface to accommodate the original compareTo() method.

          I verified through debugging TestBytes that UnsafeComparer was used.

          Show
          Ted Yu added a comment - Version 2 created a Comparer interface to accommodate the original compareTo() method. I verified through debugging TestBytes that UnsafeComparer was used.
          Hide
          Todd Lipcon added a comment -

          The current implementation won't work on non-sun JVMs. Note how the guava code uses a static inner class, and then the getBestComparator code catches any exceptions thrown trying to classload it. We need to do something similar.

          Show
          Todd Lipcon added a comment - The current implementation won't work on non-sun JVMs. Note how the guava code uses a static inner class, and then the getBestComparator code catches any exceptions thrown trying to classload it. We need to do something similar.
          Hide
          Ted Yu added a comment -

          Test suite got stuck in TestSplitTransactionOnCluster
          There was no other test failure thus far.

          Show
          Ted Yu added a comment - Test suite got stuck in TestSplitTransactionOnCluster There was no other test failure thus far.
          Hide
          Ted Yu added a comment -

          TestBytes passed.
          Running test suite.

          Show
          Ted Yu added a comment - TestBytes passed. Running test suite.
          Hide
          Todd Lipcon added a comment -

          was figuring I might work on this on Thursday at the contributor meetup. But, if you want to take a crack, feel free.

          Show
          Todd Lipcon added a comment - was figuring I might work on this on Thursday at the contributor meetup. But, if you want to take a crack, feel free.
          Hide
          Ted Yu added a comment -

          @Todd:
          If you don't have time, I can try coming up with a patch.

          Show
          Ted Yu added a comment - @Todd: If you don't have time, I can try coming up with a patch.
          Hide
          stack added a comment -

          I like this kinda comment:

                   * Compare 8 bytes at a time. Benchmarking shows comparing 8 bytes at a
                   * time is no slower than comparing 4 bytes at a time even on 32-bit.
                   * On the other hand, it is substantially faster on 64-bit.
          
          Show
          stack added a comment - I like this kinda comment: * Compare 8 bytes at a time. Benchmarking shows comparing 8 bytes at a * time is no slower than comparing 4 bytes at a time even on 32-bit. * On the other hand, it is substantially faster on 64-bit.
          Hide
          Andrew Purtell added a comment -

          +1

          Byte comparison is the top hotspot for all workloads I've profiled. Can get big wins here.

          Show
          Andrew Purtell added a comment - +1 Byte comparison is the top hotspot for all workloads I've profiled. Can get big wins here.

            People

            • Assignee:
              Ted Yu
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development