HBase
  1. HBase
  2. HBASE-5915

Improve performance of KeyValue comparison

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.94.1
    • Fix Version/s: None
    • Component/s: Performance
    • Labels:
      None

      Description

      Got bored tonight and poked around at making KeyValue comparison faster. I was able to get a ~23% speedup on a micro-benchmark I wrote by using Unsafe in a few more places. This should make some difference for real workloads, since this is usually one of the top CPU consumers.

      1. 5915-v2.txt
        12 kB
        Ted Yu
      2. hbase-5915.txt
        13 kB
        Todd Lipcon

        Issue Links

          Activity

          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Patch Available Patch Available
          8h 23m 1 Ted Yu 02/May/12 17:15
          Patch Available Patch Available Resolved Resolved
          1073d 8h 5m 1 Andrew Purtell 11/Apr/15 01:20
          Andrew Purtell made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Assignee Todd Lipcon [ tlipcon ]
          Resolution Fixed [ 1 ]
          Hide
          Andrew Purtell added a comment -

          Vladimir Rodionov later made this same optimization, which was committed

          Show
          Andrew Purtell added a comment - Vladimir Rodionov later made this same optimization, which was committed
          Hide
          Lars Hofhansl added a comment -

          I meant X86 is little endian (not Linux of course)

          Show
          Lars Hofhansl added a comment - I meant X86 is little endian (not Linux of course)
          Hide
          Lars Hofhansl added a comment -

          Patch looks good. Changing minWords * SIZEOF_LONG to minWords << 3, makes it less readable, and probably has no discernible difference (the compiler should be doing something like this anyway - if SIZEOF_LONG is final).

          "LexicographicalComparerHolder" is not the right name anymore.

          I'll try with some test workload next week. Linux is littleEndian, right? So the bytes would always need to be reversed. Maybe that explains that it is sometimes slower?

          Show
          Lars Hofhansl added a comment - Patch looks good. Changing minWords * SIZEOF_LONG to minWords << 3, makes it less readable, and probably has no discernible difference (the compiler should be doing something like this anyway - if SIZEOF_LONG is final). "LexicographicalComparerHolder" is not the right name anymore. I'll try with some test workload next week. Linux is littleEndian, right? So the bytes would always need to be reversed. Maybe that explains that it is sometimes slower?
          Harsh J made changes -
          Link This issue relates to HBASE-6200 [ HBASE-6200 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12525313/5915-v2.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.replication.TestReplication
          org.apache.hadoop.hbase.regionserver.TestHRegionOnCluster
          org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
          org.apache.hadoop.hbase.regionserver.TestSplitLogWorker

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1728//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1728//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1728//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525313/5915-v2.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.regionserver.TestHRegionOnCluster org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster org.apache.hadoop.hbase.regionserver.TestSplitLogWorker Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1728//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1728//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1728//console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          Before we commit this, we should run some actual workloads. There are two funky things I noticed about the patch:
          1) the test case always has the behavior that the first KV to be compared is in cache while the second one is not, which is not quite realistic in a real workload. Maybe it's better to time insertion of a few million KVs into a KeyValueSkipListSet instead.
          2) when I tried this patch with a YCSB workload, it actually took more CPU seconds than without the patch, which is odd given the microbenchmark. So something isn't aligning with reality. I didn't have time to investigate that.

          Show
          Todd Lipcon added a comment - Before we commit this, we should run some actual workloads. There are two funky things I noticed about the patch: 1) the test case always has the behavior that the first KV to be compared is in cache while the second one is not, which is not quite realistic in a real workload. Maybe it's better to time insertion of a few million KVs into a KeyValueSkipListSet instead. 2) when I tried this patch with a YCSB workload, it actually took more CPU seconds than without the patch, which is odd given the microbenchmark. So something isn't aligning with reality. I didn't have time to investigate that.
          Hide
          ramkrishna.s.vasudevan added a comment -

          @Ted
          Sorry Ted. I think its fine. Once i applied the patch it looked fine. Sorry for the confusion.

          Show
          ramkrishna.s.vasudevan added a comment - @Ted Sorry Ted. I think its fine. Once i applied the patch it looked fine. Sorry for the confusion.
          Hide
          Ted Yu added a comment -

          @Ram:
          The quoted snippet came from Todd.
          Can you elaborate what problem you saw ?

          Show
          Ted Yu added a comment - @Ram: The quoted snippet came from Todd. Can you elaborate what problem you saw ?
          Hide
          Ted Yu added a comment -

          TestKVComparisonPerformance without the patch:

          Run 99 took 99574us
          
          Avg: 98655us
          

          Here is the output with the patch:

          Run 99 took 79849us
          
          Avg: 79295us
          
          Show
          Ted Yu added a comment - TestKVComparisonPerformance without the patch: Run 99 took 99574us Avg: 98655us Here is the output with the patch: Run 99 took 79849us Avg: 79295us
          Hide
          ramkrishna.s.vasudevan added a comment -
          +              return lessThanUnsigned(Long.reverseBytes(lw),
          +                  Long.reverseBytes(rw)) ? -1 : 1;
          

          Patch v2 is different from V1. Was it intended Ted?

          Show
          ramkrishna.s.vasudevan added a comment - + return lessThanUnsigned( Long .reverseBytes(lw), + Long .reverseBytes(rw)) ? -1 : 1; Patch v2 is different from V1. Was it intended Ted?
          Ted Yu made changes -
          Attachment 5915-v2.txt [ 12525313 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12525264/hbase-5915.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.TestCheckTestClasses

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1726//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1726//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1726//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525264/hbase-5915.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.TestCheckTestClasses Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1726//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1726//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1726//console This message is automatically generated.
          Ted Yu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop Flags Reviewed [ 10343 ]
          Hide
          stack added a comment -

          It looks great Todd.

          This is funny (when wrong-endian) --> + return Long.reverseBytes(result);

          Nice --> + final int minWords = minLength >>> 3; // / SIZEOF_LONG;.... that should be faster.

          Show
          stack added a comment - It looks great Todd. This is funny (when wrong-endian) --> + return Long.reverseBytes(result); Nice --> + final int minWords = minLength >>> 3; // / SIZEOF_LONG;.... that should be faster.
          Hide
          Ted Yu added a comment -

          The following variable doesn't seem to be used:

          +        int triggerBoundsCheck = buf[off + SIZEOF_LONG - 1];
          

          What's its purpose ?

          Show
          Ted Yu added a comment - The following variable doesn't seem to be used: + int triggerBoundsCheck = buf[off + SIZEOF_LONG - 1]; What's its purpose ?
          Todd Lipcon made changes -
          Field Original Value New Value
          Attachment hbase-5915.txt [ 12525264 ]
          Hide
          Todd Lipcon added a comment -

          Here's the patch and micro-benchmark.

          Would be curious if people can poke holes in it (or show with a more realistic workload if it makes a difference)

          Show
          Todd Lipcon added a comment - Here's the patch and micro-benchmark. Would be curious if people can poke holes in it (or show with a more realistic workload if it makes a difference)
          Todd Lipcon created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development