HBase
  1. HBase
  2. HBASE-3855

Performance degradation of memstore because reseek is linear

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.4
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The scanner use reseek to find the next row (or next column) as part of a scan. The reseek code iterates over a Set to position itself at the right place. If there are many thousands of kvs that need to be skipped over, then the time-cost is very high. In this case, a seek would be far lesser in cost than a reseek.

      1. memstoreReseek.txt
        3 kB
        dhruba borthakur
      2. memstoreReseek2.txt
        3 kB
        dhruba borthakur

        Issue Links

          Activity

          Hide
          Jonathan Hsieh added a comment -

          Ouch!

          Yes, I think we'd like to get HBASE-4195 in o 0.90.7 as well – not for performance but for correctness reasons.

          Show
          Jonathan Hsieh added a comment - Ouch! Yes, I think we'd like to get HBASE-4195 in o 0.90.7 as well – not for performance but for correctness reasons.
          Hide
          stack added a comment -

          Thanks Gregory. Kick your Jon. He might be interested in this state of affairs given he's going for 0.90.7 (most of the rest of us have moved on...)

          Show
          stack added a comment - Thanks Gregory. Kick your Jon. He might be interested in this state of affairs given he's going for 0.90.7 (most of the rest of us have moved on...)
          Hide
          Gregory Chanan added a comment -

          Addendum: I closed because I thought it was confusing to have an issue be open when it was committed in 0.90.4 and never backed out [and no one is talking about backing it out]. Perhaps we should create a new bug that tracks backporting HBASE-4195 to 0.90 and, as Andrew suggests, put [0.90 branch] there?

          Show
          Gregory Chanan added a comment - Addendum: I closed because I thought it was confusing to have an issue be open when it was committed in 0.90.4 and never backed out [and no one is talking about backing it out] . Perhaps we should create a new bug that tracks backporting HBASE-4195 to 0.90 and, as Andrew suggests, put [0.90 branch] there?
          Hide
          Gregory Chanan added a comment -

          It looks like this was committed way back in 0.90.4. Closing.

          commit 4f4edbaa043952715d4eb9a40605154c6e41d179
          Author: Michael Stack <stack@apache.org>
          Date: Fri Jun 10 19:21:41 2011 +0000

          HBASE-3855 Performance degradation of memstore because reseek is linear

          git-svn-id: https://svn.apache.org/repos/asf/hbase/branches/0.90@1134419 13f79535-47bb-0310-9956-ffa450edef68

          Perhaps this was left open for HBASE-4195 (which never made it into 0.90).

          Show
          Gregory Chanan added a comment - It looks like this was committed way back in 0.90.4. Closing. commit 4f4edbaa043952715d4eb9a40605154c6e41d179 Author: Michael Stack <stack@apache.org> Date: Fri Jun 10 19:21:41 2011 +0000 HBASE-3855 Performance degradation of memstore because reseek is linear git-svn-id: https://svn.apache.org/repos/asf/hbase/branches/0.90@1134419 13f79535-47bb-0310-9956-ffa450edef68 Perhaps this was left open for HBASE-4195 (which never made it into 0.90).
          Hide
          Andrew Purtell added a comment -

          If this is an 0.90 branch only issue then I suggest we can just leave this open without a target fix version and change the title to be "[0.90 branch] ...". Someone can pick it up and push it through to completion if they find it compelling.

          Show
          Andrew Purtell added a comment - If this is an 0.90 branch only issue then I suggest we can just leave this open without a target fix version and change the title to be " [0.90 branch] ...". Someone can pick it up and push it through to completion if they find it compelling.
          Hide
          dhruba borthakur added a comment -

          Sorry Jonathan: I might not be working on this one anymore.

          Show
          dhruba borthakur added a comment - Sorry Jonathan: I might not be working on this one anymore.
          Hide
          Jonathan Hsieh added a comment -

          No response for half a week, going to punt so I can cut an RC.

          Show
          Jonathan Hsieh added a comment - No response for half a week, going to punt so I can cut an RC.
          Hide
          Jonathan Hsieh added a comment -

          Hey dhruba borthakur, do you plan on taking on this issue on or shall we punt on this for 0.90.7 release?

          Show
          Jonathan Hsieh added a comment - Hey dhruba borthakur , do you plan on taking on this issue on or shall we punt on this for 0.90.7 release?
          Hide
          ramkrishna.s.vasudevan added a comment -

          Not fixed in this 0.90.6. Hence moving it to 0.90.7.

          Show
          ramkrishna.s.vasudevan added a comment - Not fixed in this 0.90.6. Hence moving it to 0.90.7.
          Hide
          stack added a comment -

          OK. Moving to 0.90.5. I did not apply 4195 to the branch BECAUSE it does not apply over on the branch (which means I must have been dreaming yesterday when I thought I was testing 4195 on 0.90 – I must have been running it on TRUNK). Leaving this as open against 0.90.5 rather than against 0.92 since we don't seem to have the issue that caused the reopen in TRUNK (and 4195 improves on the original patch here anyways).

          Show
          stack added a comment - OK. Moving to 0.90.5. I did not apply 4195 to the branch BECAUSE it does not apply over on the branch (which means I must have been dreaming yesterday when I thought I was testing 4195 on 0.90 – I must have been running it on TRUNK). Leaving this as open against 0.90.5 rather than against 0.92 since we don't seem to have the issue that caused the reopen in TRUNK (and 4195 improves on the original patch here anyways).
          Hide
          Andrew Purtell added a comment -

          +1, sounds good Stack

          Show
          Andrew Purtell added a comment - +1, sounds good Stack
          Hide
          stack added a comment -

          @Andrew I ran tests with this hbase-4195 in place on 0.90. I ran 20 iterations and I did not see failure (Previous I'd see 1/5th fail). What you think of applying hbase-4195 to 0.90 and then closing this issue?

          Show
          stack added a comment - @Andrew I ran tests with this hbase-4195 in place on 0.90. I ran 20 iterations and I did not see failure (Previous I'd see 1/5th fail). What you think of applying hbase-4195 to 0.90 and then closing this issue?
          Hide
          Nicolas Liochon added a comment -

          Done, see HBASE-4195

          Show
          Nicolas Liochon added a comment - Done, see HBASE-4195
          Hide
          Ted Yu added a comment -

          @NKeywal:
          Thanks for the analysis.

          Please put the above analysis in a new JIRA.

          Show
          Ted Yu added a comment - @NKeywal: Thanks for the analysis. Please put the above analysis in a new JIRA.
          Hide
          Nicolas Liochon added a comment -

          I saw some points related to numIterReseek, used in the MemStoreScanner#getNext (line 690):

          679	    protected KeyValue getNext(Iterator it) {
          680	      KeyValue ret = null;
          681	      long readPoint = ReadWriteConsistencyControl.getThreadReadPoint();
          682	      //DebugPrint.println( " MS@" + hashCode() + ": threadpoint = " + readPoint);
          683	 
          684	      while (ret == null && it.hasNext()) {
          685	        KeyValue v = it.next();
          686	        if (v.getMemstoreTS() <= readPoint) {
          687	          // keep it.
          688	          ret = v;
          689	        }
          690	        numIterReseek--;
          691	        if (numIterReseek == 0) {
          692	          break;
          693	         }
          694	      }
          695	      return ret;
          696	    }

          This function is called by seek, reseek, and next. The numIterReseek is only usefull for reseek.

          There are some issues, I am not totally sure it's the root cause of the test case error, but it could explain partly the randomness of the error, and one point is for sure a bug.

          1) In getNext, numIterReseek is decreased, then compared to zero. The seek function sets numIterReseek to zero before calling getNext. It means that the value will be actually negative, hence the test will always fail, and the loop will continue. It is the expected behaviour, but it's quite smart.

          2) In "reseek", numIterReseek is not set between the loops on the two iterators. If the numIterReseek is equals to zero after the loop on the first one, the loop on the second one will never call seek, as numIterReseek will be negative.

          3) Still in "reseek", the test to call "seek" is (kvsetNextRow == null && numIterReseek == 0). In other words, if kvsetNextRow is not null when numIterReseek equals zero, numIterReseek will start to be negative at the next iteration and seek will never be called.

          4) You can have side effects if reseek ends with a numIterReseek > 0: the following calls to the "next" function will decrease numIterReseek to zero, and getNext will break instead of continuing the loop. As a result, later calls to next() may return null or not depending on how is configured the default value for numIterReseek.

          To check if the issue comes from point 4, you can set the numIterReseek to zero before returning in reseek:

                numIterReseek = 0;
                return (kvsetNextRow != null || snapshotNextRow != null);
              }

          On my env, on trunk, it seems to work, but as it's random I am not really sure. I also had to modify the test (I added a loop) to make it fails more often, the original test was working quite well here.

          Show
          Nicolas Liochon added a comment - I saw some points related to numIterReseek, used in the MemStoreScanner#getNext (line 690): 679 protected KeyValue getNext(Iterator it) { 680 KeyValue ret = null; 681 long readPoint = ReadWriteConsistencyControl.getThreadReadPoint(); 682 //DebugPrint.println( " MS@" + hashCode() + ": threadpoint = " + readPoint); 683 684 while (ret == null && it.hasNext()) { 685 KeyValue v = it.next(); 686 if (v.getMemstoreTS() <= readPoint) { 687 // keep it. 688 ret = v; 689 } 690 numIterReseek--; 691 if (numIterReseek == 0) { 692 break; 693 } 694 } 695 return ret; 696 } This function is called by seek, reseek, and next. The numIterReseek is only usefull for reseek. There are some issues, I am not totally sure it's the root cause of the test case error, but it could explain partly the randomness of the error, and one point is for sure a bug. 1) In getNext, numIterReseek is decreased, then compared to zero. The seek function sets numIterReseek to zero before calling getNext. It means that the value will be actually negative, hence the test will always fail, and the loop will continue. It is the expected behaviour, but it's quite smart. 2) In "reseek", numIterReseek is not set between the loops on the two iterators. If the numIterReseek is equals to zero after the loop on the first one, the loop on the second one will never call seek, as numIterReseek will be negative. 3) Still in "reseek", the test to call "seek" is (kvsetNextRow == null && numIterReseek == 0). In other words, if kvsetNextRow is not null when numIterReseek equals zero, numIterReseek will start to be negative at the next iteration and seek will never be called. 4) You can have side effects if reseek ends with a numIterReseek > 0: the following calls to the "next" function will decrease numIterReseek to zero, and getNext will break instead of continuing the loop. As a result, later calls to next() may return null or not depending on how is configured the default value for numIterReseek. To check if the issue comes from point 4, you can set the numIterReseek to zero before returning in reseek: numIterReseek = 0; return (kvsetNextRow != null || snapshotNextRow != null); } On my env, on trunk, it seems to work, but as it's random I am not really sure. I also had to modify the test (I added a loop) to make it fails more often, the original test was working quite well here.
          Hide
          stack added a comment -

          Odd. MemStore is same in branch as it is in trunk (Patch is same for both). Let me look at why test is failing.

          Show
          stack added a comment - Odd. MemStore is same in branch as it is in trunk (Patch is same for both). Let me look at why test is failing.
          Hide
          stack added a comment -

          I ran 50 tests on 0.90 and 20% failed. Odd that it fails on 0.90 and not on TRUNK. Let me look at the commit you single out above Andrew.

          Show
          stack added a comment - I ran 50 tests on 0.90 and 20% failed. Odd that it fails on 0.90 and not on TRUNK. Let me look at the commit you single out above Andrew.
          Hide
          stack added a comment -

          I ran 100 TestHRegion tests back to back and none failed for me (TRUNK). Let me try on 0.90 branch.

          Show
          stack added a comment - I ran 100 TestHRegion tests back to back and none failed for me (TRUNK). Let me try on 0.90 branch.
          Hide
          stack added a comment -

          @Andrew You think we are getting back bad answers from MemStore? Looking at patch, I fail to see how it'd return an incorrect answer but maybe I'm missing something?

          Show
          stack added a comment - @Andrew You think we are getting back bad answers from MemStore? Looking at patch, I fail to see how it'd return an incorrect answer but maybe I'm missing something?
          Hide
          Andrew Purtell added a comment -

          Do we call it for the test or this patch?

          Show
          Andrew Purtell added a comment - Do we call it for the test or this patch?
          Show
          Ted Yu added a comment - This happened in TRUNK as well: https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/lastCompletedBuild/testReport/org.apache.hadoop.hbase.regionserver/TestHRegion/testWritesWhileGetting/
          Hide
          Andrew Purtell added a comment -

          I have tracked down occasionally failing TestHRegion tests on 0.90 branch to this commit:

          commit 4f4edbaa043952715d4eb9a40605154c6e41d179
          Author: Michael Stack <stack@apache.org>
          Date:   Fri Jun 10 19:21:41 2011 +0000
          
              HBASE-3855 Performance degradation of memstore because reseek is linear
              
              git-svn-id: https://svn.apache.org/repos/asf/hbase/branches/0.90@1134419 13f
          

          Perhaps 1 out of 25 test runs fail on my dev laptop.

          The same problem may exist on trunk, but I have not checked yet for evidence of that.

          Failures are like:

          Tests run: 54, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 10.186 sec <<< FAILURE!
          testWritesWhileGetting(org.apache.hadoop.hbase.regionserver.TestHRegion) Time elapsed: 0.187 sec <<< FAILURE!
          junit.framework.AssertionFailedError: expected:<\x00\x00\x006> but was:<\x00\x00\x004>
          at org.apache.hadoop.hbase.HBaseTestCase.assertEquals(HBaseTestCase.java:685)
          at org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting(TestHRegion.java:2711)

          Show
          Andrew Purtell added a comment - I have tracked down occasionally failing TestHRegion tests on 0.90 branch to this commit: commit 4f4edbaa043952715d4eb9a40605154c6e41d179 Author: Michael Stack <stack@apache.org> Date: Fri Jun 10 19:21:41 2011 +0000 HBASE-3855 Performance degradation of memstore because reseek is linear git-svn-id: https://svn.apache.org/repos/asf/hbase/branches/0.90@1134419 13f Perhaps 1 out of 25 test runs fail on my dev laptop. The same problem may exist on trunk, but I have not checked yet for evidence of that. Failures are like: Tests run: 54, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 10.186 sec <<< FAILURE! testWritesWhileGetting(org.apache.hadoop.hbase.regionserver.TestHRegion) Time elapsed: 0.187 sec <<< FAILURE! junit.framework.AssertionFailedError: expected:<\x00\x00\x006> but was:<\x00\x00\x004> at org.apache.hadoop.hbase.HBaseTestCase.assertEquals(HBaseTestCase.java:685) at org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting(TestHRegion.java:2711)
          Hide
          stack added a comment -

          Committed to branch and Andrew's suggestion.

          Show
          stack added a comment - Committed to branch and Andrew's suggestion.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #1909 (See https://builds.apache.org/hudson/job/HBase-TRUNK/1909/)

          Show
          Hudson added a comment - Integrated in HBase-TRUNK #1909 (See https://builds.apache.org/hudson/job/HBase-TRUNK/1909/ )
          Hide
          stack added a comment -

          Committed to TRUNK. Thanks for the patch Dhruba.

          Show
          stack added a comment - Committed to TRUNK. Thanks for the patch Dhruba.
          Hide
          dhruba borthakur added a comment -

          A minor improvement from the previous patch
          1. invoke seek only if the getNext()call returned null and numIterReseek is zero.

          Show
          dhruba borthakur added a comment - A minor improvement from the previous patch 1. invoke seek only if the getNext()call returned null and numIterReseek is zero.
          Hide
          dhruba borthakur added a comment -

          I have a cf with three columns. I insert 100K puts on a single rowkey with random column values. all these kvs are cached in the memstore. A single scan to retrieve the values for two specified columnname takes about 4 millisecons of CPU without this patch and about 0.025 millieseconds of cpu with this patch. Unfortunately, I cannot share the test program because it is intertwined with quite a few non-opensource-able code.

          However, I think the patch will benefit all cases because it is better to do a O(logn) lookup rather then even a small number of sequential scans.

          Show
          dhruba borthakur added a comment - I have a cf with three columns. I insert 100K puts on a single rowkey with random column values. all these kvs are cached in the memstore. A single scan to retrieve the values for two specified columnname takes about 4 millisecons of CPU without this patch and about 0.025 millieseconds of cpu with this patch. Unfortunately, I cannot share the test program because it is intertwined with quite a few non-opensource-able code. However, I think the patch will benefit all cases because it is better to do a O(logn) lookup rather then even a small number of sequential scans.
          Hide
          stack added a comment -

          Patch looks good Dhruba. Any evidence of it being a general improvement that you could throw in here? Thanks.

          Show
          stack added a comment - Patch looks good Dhruba. Any evidence of it being a general improvement that you could throw in here? Thanks.
          Hide
          dhruba borthakur added a comment -

          This patch changes a reseek to a seek if the number of kvs that it has already skipped over is larger than a configured number (default 32).

          Show
          dhruba borthakur added a comment - This patch changes a reseek to a seek if the number of kvs that it has already skipped over is larger than a configured number (default 32).
          Hide
          Andrew Purtell added a comment -

          @dhruba I was talking with jgray. You might have a stop-gap that could go into 0.92?

          Show
          Andrew Purtell added a comment - @dhruba I was talking with jgray. You might have a stop-gap that could go into 0.92?
          Hide
          dhruba borthakur added a comment -

          Thanks Mikhail for your comments.

          let's keep this jira focussed only on memstore improvements. if we can do better while seeking on disk, can we open a new JIRA for that one (and link it to this one).

          My use-case for this jira is for the situation when the entire scan request can be satisfied from the memstore.

          Show
          dhruba borthakur added a comment - Thanks Mikhail for your comments. let's keep this jira focussed only on memstore improvements. if we can do better while seeking on disk, can we open a new JIRA for that one (and link it to this one). My use-case for this jira is for the situation when the entire scan request can be satisfied from the memstore.
          Hide
          Andrew Purtell added a comment -

          Marking as blocker for next major release.

          Show
          Andrew Purtell added a comment - Marking as blocker for next major release.
          Hide
          Todd Lipcon added a comment -

          You guys are writing a new HFile format? It would be nice to see a design doc and JIRA.

          Show
          Todd Lipcon added a comment - You guys are writing a new HFile format? It would be nice to see a design doc and JIRA.
          Hide
          Mikhail Bautin added a comment -

          When writing the scanner code for the new HFile format I thought about getting rid of linear scans by adding an array of key/value pair offsets at the beginning of each data block, so that seek/reseek can be implemented as a binary search. If we do have cases when the seek operation has to scan thousands of kvs, then I will certainly implement the binary search seek in the HFile V2 format.

          Show
          Mikhail Bautin added a comment - When writing the scanner code for the new HFile format I thought about getting rid of linear scans by adding an array of key/value pair offsets at the beginning of each data block, so that seek/reseek can be implemented as a binary search. If we do have cases when the seek operation has to scan thousands of kvs, then I will certainly implement the binary search seek in the HFile V2 format.

            People

            • Assignee:
              Unassigned
              Reporter:
              dhruba borthakur
            • Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development