HBASE-2670

MemStore should retain multiple KVs with the same timestamp when memstoreTS differs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.20.5, 0.90.0
    • Fix Version/s: 0.90.0
    • Component/s: regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      There appears to be a bug in HBASE-2248 as committed to trunk. See the following failing test:
      http://hudson.zones.apache.org/hudson/job/HBase-TRUNK/1296/testReport/junit/org.apache.hadoop.hbase/TestAcidGuarantees/testAtomicity/
      I think this is the same bug we saw early on in 2248 in the 0.20 branch; it looks like the fix didn't make it over.

      Attachments

      1. hbase-2670.txt (21 kB) Todd Lipcon
      2. hbase-2670.txt (7 kB) Todd Lipcon


          Activity

          HBase Review Board added a comment -

          Message from: stack@duboce.net

          On 2010-06-15 18:06:09, Ryan Rawson wrote:

          > src/main/java/org/apache/hadoop/hbase/KeyValue.java, line 1306

          > <http://review.hbase.org/r/180/diff/1/?file=1345#file1345line1306>

          >

          > the extra KVs will make it out to hfile, but considering the problem I just don't see a better solution right now. Perhaps collapsing extra version during the flush, but then the question is which extra versions?

          >

          > Maybe this will lead us to using an extra timestamp in every KeyValue on disk and in memory, but I hope not.

          > ...but then the question is which extra versions?

          For now, I'd think we'd keep most recently written.

          > Maybe this will lead us to using an extra timestamp in every KeyValue on disk and in memory

          Maybe. When we tease out deletes and how ts's are meant to work then we'll learn whether or not they are needed I'd imagine.

          • stack

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/180/#review235
          -----------------------------------------------------------

          HBase Review Board added a comment -

          Message from: "Ryan Rawson" <ryanobjc@gmail.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/180/#review235
          -----------------------------------------------------------

          Ship it!

          ok looks good. I'm also prepping a new way to do ICVs in a different issue (HBASE-2501).

          src/main/java/org/apache/hadoop/hbase/KeyValue.java
          <http://review.hbase.org/r/180/#comment1066>

          the extra KVs will make it out to hfile, but considering the problem I just don't see a better solution right now. Perhaps collapsing extra version during the flush, but then the question is which extra versions?

          Maybe this will lead us to using an extra timestamp in every KeyValue on disk and in memory, but I hope not.

          • Ryan
          Todd Lipcon added a comment -

          Committed to trunk.

          HBase Review Board added a comment -

          Message from: "Todd Lipcon" <todd@cloudera.com>

          On 2010-06-15 15:03:40, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/KeyValue.java, line 558

          > <http://review.hbase.org/r/180/diff/1/?file=1345#file1345line558>

          >

          > Is this going to be correct always? Cloning, we don't want the src memstoreTS?

          Yeah, this clone is necessary so that incrementColumnValue() doesn't end up creating a new memstore entry for every increment. We could put it in that code, but I think it's cleaner for it to just be part of clone, so that foo.compareTo(foo.clone()) == 0 holds as an invariant.
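A minimal sketch of the invariant being discussed, with simplified stand-in names (this is not HBase's actual KeyValue): clone() carries the source memstoreTS over, so a clone compares equal to its source, which is what lets an increment overwrite its entry in place rather than insert a new one.

```java
import java.util.Arrays;

// Simplified stand-in for KeyValue; names and fields are illustrative only.
class Kv implements Comparable<Kv> {
    final byte[] key;
    final long memstoreTS; // logical insertion timestamp; not written to HFiles

    Kv(byte[] key, long memstoreTS) {
        this.key = key.clone();
        this.memstoreTS = memstoreTS;
    }

    // Copy the src memstoreTS so that kv.compareTo(kv.clone()) == 0 holds.
    @Override
    public Kv clone() {
        return new Kv(key, memstoreTS);
    }

    @Override
    public int compareTo(Kv other) {
        int c = Arrays.compare(key, other.key);
        if (c != 0) return c;
        // Tie-break on memstoreTS: newer memstore entries sort first.
        return Long.compare(other.memstoreTS, memstoreTS);
    }
}
```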

          On 2010-06-15 15:03:40, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/KeyValue.java, line 1306

          > <http://review.hbase.org/r/180/diff/1/?file=1345#file1345line1306>

          >

          > What we going to do about N versions all of same r/f/q/ts but of different memstoreTS? We're not going to suppress them just yet? We're going to punt till hbase-1485? The multiple versions make it out to store files too?

          Yea, leaving that off since the problem already exists and it's a bit of a heavy change.

          On 2010-06-15 15:03:40, stack wrote:

          > src/test/java/org/apache/hadoop/hbase/TestAcidGuarantees.java, line 140

          > <http://review.hbase.org/r/180/diff/1/?file=1348#file1348line140>

          >

          > Prefix w/ 'TODO' to make this work-to-do more findable.

          ah, I mean that we don't need to verify at this point because the writer hasn't actually written any rows yet! will clarify

          • Todd

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/180/#review229
          -----------------------------------------------------------

          Todd Lipcon added a comment -

          New patch addresses comments from stack's review - it was just clarification of some comments in the code, so I'll go ahead and commit. Thanks Stack.

          HBase Review Board added a comment -

          Message from: stack@duboce.net

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/180/#review229
          -----------------------------------------------------------

          Ship it!

          +1

          I have some minor comments below and then some questions, but these should not get in the way of a commit.

          src/main/java/org/apache/hadoop/hbase/KeyValue.java
          <http://review.hbase.org/r/180/#comment1016>

          Is this going to be correct always? Cloning, we don't want the src memstoreTS?

          src/main/java/org/apache/hadoop/hbase/KeyValue.java
          <http://review.hbase.org/r/180/#comment1017>

          What we going to do about N versions all of same r/f/q/ts but of different memstoreTS? We're not going to suppress them just yet? We're going to punt till hbase-1485? The multiple versions make it out to store files too?

          src/test/java/org/apache/hadoop/hbase/TestAcidGuarantees.java
          <http://review.hbase.org/r/180/#comment1018>

          Prefix w/ 'TODO' to make this work-to-do more findable.

          src/test/java/org/apache/hadoop/hbase/TestAcidGuarantees.java
          <http://review.hbase.org/r/180/#comment1019>

          Nice

          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreScanner.java
          <http://review.hbase.org/r/180/#comment1020>

          Go ahead and remove then boss.

          • stack
          HBase Review Board added a comment -

          Message from: "Todd Lipcon" <todd@cloudera.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/180/
          -----------------------------------------------------------

          Review request for hbase and Ryan Rawson.

          Summary
          -------

          This changes the memstore comparator, improves the atomicity tests, and also fixes ICV to continue to have the right behavior even with the comparator change.

          This addresses bug hbase-2670.
          http://issues.apache.org/jira/browse/hbase-2670

          Diffs


          src/main/java/org/apache/hadoop/hbase/KeyValue.java 71284cf
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 2a0dcee
          src/test/java/org/apache/hadoop/hbase/MultithreadedTestUtil.java 7c062d7
          src/test/java/org/apache/hadoop/hbase/TestAcidGuarantees.java 75f3c8b
          src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 9833d76
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreScanner.java 0566af7

          Diff: http://review.hbase.org/r/180/diff

          Testing
          -------

          Unit tests.

          Thanks,

          Todd

          Todd Lipcon added a comment -

          Changing name of this JIRA to reflect the specific issue I'm solving with the patch. Will open a second JIRA for the other atomicity issue regarding reseeks.

          stack added a comment -

          > Can we discuss the client side dependencies issue over in HBASE-2714?

          Yes

          Todd Lipcon added a comment -

          ah, crap, I forgot that. Can we discuss the client side dependencies issue over in HBASE-2714? (this isn't a regression, we already use guava in a couple other places in the client).

          ryan rawson added a comment -

          Unfortunately, 'KeyValue' is part of the Client API.

          Todd Lipcon added a comment -

          > extra KeyValues in the flush that would have previously been combined

          I thought we determined this was fine, since we already have potential duplicate keyvalue issues between separate HFiles, and this isn't any better/worse?

          > lets not add new client dependencies

          I don't think this patch changes anything client side.

          Will work on the ICV thing this afternoon.

          ryan rawson added a comment -

          as per our discussion there are several unresolved issues with this patch:

          • apparently ICVs (how? why? where?)
          • extra KeyValues in the flush that would have previously been combined
          • lets not add new client dependencies
          Todd Lipcon added a comment -

          K, makes sense, I'll update the patch to ensure that the ICVs maintain equal memstoreTS (it seems in the current system they're getting double-inserted).

          ryan rawson added a comment -

          ICVs don't take part in this new concurrency system. They are not actually being 'updated in place'; rather, KeyValues are being 'overwritten'. Given there is only one row/column involved in an ICV, there is no multi-KeyValue coherency to worry about here. ICVs even use memstoreTS==0 (and are always included in every scan).

          In the future I think it might be possible to insert a new KV then delete the old one out of memstore... that shouldn't cause any issues with scanners, but we can do that later.
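The memstoreTS==0 special case described above can be sketched as a visibility predicate (a hypothetical helper modeling the described behavior, not HBase's actual code): ICV-style entries pass every scanner's read point, while MVCC-tracked writes only become visible once the reader's read point reaches their memstoreTS.

```java
// Hypothetical visibility predicate modeling the behavior described above.
class ReadPointVisibility {
    // A KV is visible if it predates the reader's read point, or if it is an
    // ICV-style entry with memstoreTS == 0, which every scan includes.
    static boolean isVisible(long kvMemstoreTS, long readPoint) {
        return kvMemstoreTS == 0 || kvMemstoreTS <= readPoint;
    }
}
```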

          Todd Lipcon added a comment -

          This patch causes issues with the incrementColumnValues() tests, which rely on updating a KV in-place in the memstore.

          Not sure what we should do here... on one hand it would be nice for incrementColumnValue to have proper semantics, on the other hand, we don't want to insert a new KV for every increment. Ryan, any thoughts?

          Todd Lipcon added a comment -

          It turns out that the majority of this issue was that the comparator used by MemStore didn't take into account the "memstore logical timestamp" (memstoreTS). Thus, a new writer could overwrite older entries if it did a write in the same millisecond. A concurrent reader would then see a partial row because the new entries would be invisible, thus "revealing" cells with a lower "real" TS.

          This patch adds two tests to TestMemStore that check for the correct behavior.

          I believe there is still a separate issue with multi-row scans and updateReaders(), but I'll open a separate JIRA with separate tests for that one.
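The overwrite described above can be modeled with a skip-list map, which is how the memstore stores its entries (this is a toy model with invented names, not HBase code): with a comparator that ignores memstoreTS, a second put of the same cell in the same millisecond replaces the first entry; adding the memstoreTS tie-break retains both KVs.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListMap;

class MemstoreModel {
    static final class Entry {
        final String key; final long ts; final long memstoreTS;
        Entry(String key, long ts, long memstoreTS) {
            this.key = key; this.ts = ts; this.memstoreTS = memstoreTS;
        }
    }

    // Buggy: two writes of the same cell in the same millisecond compare equal,
    // so the second one overwrites the first in the map.
    static final Comparator<Entry> WITHOUT_MEMSTORE_TS =
        Comparator.comparing((Entry e) -> e.key)
                  .thenComparing(Comparator.comparingLong((Entry e) -> e.ts).reversed());

    // Fixed: memstoreTS breaks the tie, so both KVs are retained.
    static final Comparator<Entry> WITH_MEMSTORE_TS =
        WITHOUT_MEMSTORE_TS.thenComparing(
            Comparator.comparingLong((Entry e) -> e.memstoreTS).reversed());

    static ConcurrentSkipListMap<Entry, Entry> memstore(Comparator<Entry> cmp) {
        return new ConcurrentSkipListMap<>(cmp);
    }
}
```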

          ryan rawson added a comment -

          On the subject of getScanner() failing, that is how it used to work
          before some of the 2248-related fixes went in. I don't think we want
          to work too hard to move the IOExceptions around; you are going to
          have to be able to handle the exception at any time.

          Todd Lipcon added a comment -

          Yep, this is in branch too. Anyone mind if I commit the TestAcidGuarantees case to the 20 branch?
          Going to continue to work on a fix and a more targeted test case that fails even on laptops.

          Todd Lipcon added a comment -

          We're not guaranteed to be between rows in trunk due to intra-row scanning, so we need a special flag to tell us when we should be between rows.

          I'm right now running TestAcidGuarantees against 0.20 to see if this exists in 0.20.5 rcs.

          ryan rawson added a comment -

          In this case the fix should be to reseek to firstOnRow of the peeked row. We
          are guaranteed to be between rows, so this works.

          (Quoting https://issues.apache.org/jira/browse/HBASE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876897#action_12876897)

          > ...in the scanner by way of peeking before reseeking. However, the peek takes
          > into account the old read point, so we get a situation where our peek sees
          > the old version of a column that has multiple versions, and we seek there.
          > Then when we seek forward from there, we see the rows with newer timestamps
          > because we're no longer restricted by the old read point.

          > ...following failing test:
          > http://hudson.zones.apache.org/hudson/job/HBase-TRUNK/1296/testReport/junit/org.apache.hadoop.hbase/TestAcidGuarantees/testAtomicity/
          > ...looks like the fix didn't make it over.

          Todd Lipcon added a comment -

          I think I'm understanding the issue here. Here's my theory:

          When we reseek, we reseek to the keyvalue that was previously the "next" in the scanner by way of peeking before reseeking. However, the peek takes into account the old read point, so we get a situation where our peek sees the old version of a column that has multiple versions, and we seek there. Then when we seek forward from there, we see the rows with newer timestamps because we're no longer restricted by the old read point.

          I'm working on a test case and patch.

          Todd Lipcon added a comment -

          Still not fixed after your recent patch, plus it broke two other tests in trunk. Specifically, getScanner() on the client now throws IOExceptions eagerly instead of on the first .next() call, so TestFSErrorsExposed failed. It's an easy test fix, but it seems strange to me that getScanner() could throw IOE before you even start to iterate on it?

          Todd Lipcon added a comment -

          This isn't fixed in trunk, see: http://hudson.zones.apache.org/hudson/job/HBase-TRUNK/1308/
          Specifically: http://hudson.zones.apache.org/hudson/job/HBase-TRUNK/1308/testReport/org.apache.hadoop.hbase/TestAcidGuarantees/testAtomicity/
          ryan rawson added a comment -

          #1 is the old memstore scanner - it would get an 'entire row' at a time and then chomp on its internal array. This made memstore scans really slow. The whole index hbase saga.

          #2 more or less works, and is what is implemented in HBASE-2616 (which is now in branch and trunk).

          The problem with #2, and you will see it in HRegionScanner, is we no longer update the read point "between rows". Since by the time we figure out we are on the next row we've already peek()ed the value off the scanner, we can't switch to a different read point, or else you will get the first KeyValue from one read point and the rest from another (sounds familiar?)

          As for why did I commit and not await review, this is because the hudson instance at hudson.hbase.org is one of the few machines that trips against this failure frequently. I haven't been able to repro on my dev environment, ever. And now we have 11 successful branch builds of 0.20, thus indicating that our bug is now fixed.

          Todd Lipcon added a comment -

          How about you post a patch for review instead of just committing? This stuff is bug prone, we should do code reviews.

          ryan rawson added a comment -

          I'm committing a fix to StoreScanner that will do essentially "#2" in your option list.

          Todd Lipcon added a comment -

          Actually, I misunderstood this a little bit. The above describes the issue when using intra-row scanning (see HBASE-2673). Without intra-row scanning, since updateReaders is synchronized, it should only be called when the StoreScanner is between rows.

          So I think the issue may be how updateReaders itself works. It uses peek() on the heap to find the next row it's going to, and then seeks to that one. updateReaders, though, is called by a different thread with a different readpoint set. So that seek pulls in values for the next row that are different from what will be read.

          I'll try to make a patch for this tomorrow.

          Todd Lipcon added a comment -

          The issue is that the memstore timestamps are lost once we flush the memstore to an HFile, and we immediately change over open scanners to scan from the HFile in updateReaders()

          I think the fix is one of the following:

          • in updateReaders, scan ahead the memstore reader until the next row, and cache those KVs internally. Then when we hit the end of the cache, do the actual reseek in the HFile at the beginning of the next row.
          • in updateReaders, simply mark a flag that we need to update as soon as we hit the next row. Then do the reseek lazily in next()
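The second option can be sketched as a flag that the flush thread sets and the reading thread checks lazily at row boundaries (names here are invented; the real StoreScanner/updateReaders logic is more involved):

```java
// Toy model of the "mark a flag, reseek lazily" option; names are invented.
class LazyReseekScanner {
    private boolean readersChanged = false;
    private int reseeks = 0; // stands in for reopening scanners over the new HFile

    // Flush thread: don't touch the scanner heap here, just note the change,
    // so the actual reseek happens under the reader's own read point.
    synchronized void updateReaders() {
        readersChanged = true;
    }

    // Reader thread: at the next row boundary, perform the deferred reseek.
    synchronized void nextRow() {
        if (readersChanged) {
            readersChanged = false;
            reseeks++;
        }
    }

    synchronized int reseekCount() {
        return reseeks;
    }
}
```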

            People

            • Assignee: Todd Lipcon
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 2