HBase / HBASE-29

HStore#get and HStore#getFull may not return expected values by timestamp when there is more than one MapFile

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 0.1.2, 0.2.0
    • Fix Version/s: None
    • Component/s: Client, regionserver
    • Labels: None

    Description

      Ok, this one is a little tricky. Let's say that you write a row with some value without a timestamp, thus meaning "right now". Then, the memcache gets flushed out to a MapFile. Then, you write another value to the same row, this time with a timestamp that is in the past, i.e., before the "now" timestamp of the first put.

      Some time later, but before there is a compaction, if you do a get for this row, and only ask for a single version, you will logically be expecting the latest version of the cell, which you would assume would be the one written at "now" time. Instead, you will get the value written into the "past" cell, because even though it is tagged as having happened in the past, it actually was written after the "now" cell, and thus when #get searches for satisfying values, it runs into the one most recently written first.
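
      To make the failure mode concrete, here is a minimal, self-contained model of the read path just described (the class and the maps are hypothetical stand-ins, not HBase types). The store consults the memcache first and then the MapFiles newest-first, short-circuiting once it has enough versions, so the backdated but later-written cell wins:

        import java.util.*;

        // Hypothetical model of the bug, not the HBase API. Each "file"
        // maps timestamp -> value, newest timestamp first.
        public class OutOfOrderGet {
          // Flushed MapFile: holds the cell written at "now" (ts = 1000).
          static NavigableMap<Long, String> mapFile =
              new TreeMap<>(Comparator.reverseOrder());
          // Memcache: holds the later write that used a past timestamp (ts = 500).
          static NavigableMap<Long, String> memcache =
              new TreeMap<>(Comparator.reverseOrder());

          public static void main(String[] args) {
            mapFile.put(1000L, "value-written-at-now");
            memcache.put(500L, "value-written-later-but-backdated");

            // HStore#get searched the memcache, then MapFiles newest-first,
            // and stopped once it had the requested number of versions, so a
            // one-version get returns the ts=500 cell instead of ts=1000.
            String got = memcache.firstEntry().getValue();
            System.out.println("got:      " + got);                             // backdated value
            System.out.println("expected: " + mapFile.firstEntry().getValue()); // "now" value
          }
        }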

      The result of this problem is inconsistent data results. Note that this problem only ever exists when there's an uncompacted HStore, because during compaction, these cells will all get sorted into the correct order by timestamp and such. In a way, this actually makes the problem worse, because then you could easily get inconsistent results from HBase about the same (unchanged) row depending on whether there's been a flush/compaction.

      The only solution I can think of for this problem at the moment is to scan all the MapFiles and the memcache for possible results, sort them, and then select the desired number of versions off the top. This is unfortunate because it means you never get the snazzy short-circuit logic except within a single MapFile or the memcache.
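
      A rough sketch of that merge-everything approach, with a hypothetical Cell type and plain lists standing in for the real MapFile and memcache readers: collect candidates from every source, sort by timestamp descending, and take the requested number of versions off the top.

        import java.util.*;

        // Hypothetical candidate cell; the real code deals in HStoreKey/value pairs.
        record Cell(long timestamp, byte[] value) {}

        class MergingGet {
          static List<Cell> get(List<List<Cell>> sources, int numVersions) {
            List<Cell> all = new ArrayList<>();
            for (List<Cell> source : sources) {
              all.addAll(source);  // no short-circuit: every source is read
            }
            // Newest timestamps first, regardless of which file a cell came from.
            all.sort(Comparator.comparingLong(Cell::timestamp).reversed());
            return all.subList(0, Math.min(numVersions, all.size()));
          }
        }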

      Attachments

        1. 29.patch
          18 kB
          Bryan Duxbury


          Activity

            Bryan Duxbury added a comment -

            This patch includes some new tests in TestGet2 to highlight this problem and fixes in HStore to take care of the issues.

            Bryan Duxbury added a comment -

            This patch passes all the tests locally.

            Bryan Duxbury added a comment -

            Trying Hudson.

            Michael Stack added a comment -

            Looking at the finalKey implementation, it looks like you think 'top' is the first half of the file, the half with the lowest keys, rather than the half with the higher keys.
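
            For readers unfamiliar with the convention under discussion: after a split, each daughter reads one half of the parent MapFile, and 'top' is the half with the higher keys. A toy illustration with plain Java collections (not the MapFile API) of why the finalKey of a top reference is the parent's last key rather than the midkey:

              import java.util.*;

              public class HalfFileConvention {
                public static void main(String[] args) {
                  NavigableSet<String> parentKeys =
                      new TreeSet<>(List.of("a", "b", "c", "d"));
                  String midKey = "c";
                  // bottom = the lower keys, up to but excluding the midkey
                  SortedSet<String> bottom = parentKeys.headSet(midKey);
                  // top = the midkey and everything above it
                  SortedSet<String> top = parentKeys.tailSet(midKey);
                  System.out.println("bottom finalKey: " + bottom.last()); // b
                  System.out.println("top finalKey:    " + top.last());   // d
                }
              }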

            Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12373187/2512-v2.patch
            against trunk revision r612314.

            @author +1. The patch does not contain any @author tags.

            javadoc +1. The javadoc tool did not generate any warning messages.

            javac +1. The applied patch does not generate any new compiler warnings.

            findbugs +1. The patch does not introduce any new Findbugs warnings.

            core tests +1. The patch passed core unit tests.

            contrib tests -1. The patch failed contrib unit tests.

            Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1607/testReport/
            Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1607/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
            Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1607/artifact/trunk/build/test/checkstyle-errors.html
            Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1607/console

            This message is automatically generated.

            Michael Stack added a comment -

            Retrying against Hudson. It's unlikely this patch caused the TTMR failure – it looks to be a problem going to HDFS – but just in case...

            Michael Stack added a comment -

            Retrying Hudson.

            Hadoop QA added a comment -

            +1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12373187/2512-v2.patch
            against trunk revision r612561.

            @author +1. The patch does not contain any @author tags.

            javadoc +1. The javadoc tool did not generate any warning messages.

            javac +1. The applied patch does not generate any new compiler warnings.

            findbugs +1. The patch does not introduce any new Findbugs warnings.

            core tests +1. The patch passed core unit tests.

            contrib tests +1. The patch passed contrib unit tests.

            Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1615/testReport/
            Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1615/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
            Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1615/artifact/trunk/build/test/checkstyle-errors.html
            Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1615/console

            This message is automatically generated.

            Michael Stack added a comment -

            Patch looks good, but it's going to kill performance, so I want to run a basic PE test before committing just to see how much it's going to cost us.

            Chatting with Bryan, scanners likely have the same issue. This patch doesn't address that.

            Michael Stack added a comment -

            I ran the PE test. The random read rate is halved with this patch in place. Let's discuss. Of note, the patch has rotted; it no longer applies against TRUNK.

            Bryan Duxbury added a comment -

            I understand that the performance degrades as a result of this patch, but I'm much more worried about the correctness of the data than the performance. Sure, it might make it faster to leave this out, but the first time someone happens to hit this problem, they'll go out of their minds trying to track it down.

            I think we can make some improvements that will make this patch a little faster. I will take another look and try to freshen up the patch.

            Bryan Duxbury added a comment -

            Putting this off until later, as the performance hit is pretty substantial, and I think there might be a solution that gives us both performance and correctness.

            Bryan Duxbury added a comment -

            Updated for new svn.

            Bryan Duxbury added a comment -

            We're not going to use the latest patch, seeing as how the performance suffers so greatly, so I'm cancelling it.

            Bryan Duxbury added a comment -

            This issue might actually also be contributing to the trouble we're seeing in HBASE-514. If the memcache can give an answer, it will, and it will be taken as a definitive answer for the entire row, which isn't guaranteed to be the case, especially when regions are being reassigned and such.

            Bryan Duxbury added a comment -

            I think we should make it a priority to get this fixed. Even if it performs worse, it's really unacceptable to give incorrect answers.

            However, I think there's a decent alternative to just getting slower wholesale. When we fixed getClosestRowBefore, we decided that it would always operate under the assumption that cells are added to a table in ascending timestamp order, which at least made it perform acceptably. Clearly, this issue is about situations where that assumption isn't true. So, what I think we should do is make the default get and getRow methods return the answer that assumes the MapFiles don't have any inherent ordering, and then add new methods getAscending and getRowAscending (names could change?) that assume MapFiles are sorted ascending and are faster as a result.

            With this approach, people can make the default choice of using get and getRow, pay the performance penalty, but get the right answer no matter what. Then, if people happen to have a use case that matches the always-ascending constraints, they can simply switch the method call and get the improved performance.
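
            A hedged sketch of what that API split might look like. The getAscending and getRowAscending names come from the comment above; the signatures here are invented for illustration:

              import java.util.Map;

              // Hypothetical client-facing interface, not the real HTable API.
              interface ProposedTableInterface {
                // Always correct: merges candidates across all MapFiles and the
                // memcache, sorts by timestamp, pays the performance penalty.
                byte[][] get(byte[] row, byte[] column, int numVersions);
                Map<byte[], byte[]> getRow(byte[] row);

                // Faster: assumes cells were inserted in ascending timestamp
                // order, so the store may short-circuit per file.
                byte[][] getAscending(byte[] row, byte[] column, int numVersions);
                Map<byte[], byte[]> getRowAscending(byte[] row);
              }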

            Michael Stack added a comment -

            Adding methods that return the right answer when versions are not inserted in ascending order sounds good to me, but I'd suggest that the default implementations do as they do now; we'd add documentation that explicitly states how they work, pointing at the new methods for applications to use if versions are not inserted in ascending order.

            Jim Kellerman added a comment -

            Does this need to get fixed for 0.2.0?

            As Hypertable doesn't allow out-of-order timestamps, and in the general case timestamps are assigned automatically (and hence are always increasing), it doesn't seem that important.

            Bryan Duxbury added a comment -

            I tend to agree at this point. Should we make it so you can't put values at non-increasing timestamps at all?

            Michael Stack added a comment -

            I'd say move it out of 0.2 but keep it alive at a low priority. It's going to come up again.

            Billy Pearson added a comment -

            I've been doing some updates in compactions and found the issue Bryan is talking about above in compactions as well.

            The way we compact now, we assume that the newest MapFiles/HStoreFiles will have newer timestamps than the versions in the older MapFiles.

            So currently we are keeping the x newest cells without considering timestamps. The rule is that the newest MapFiles get read first, which could push out values that have newer timestamps.

            If we decide to change max versions to keep the newest values based on timestamps, compactions will also need an update so we do not lose the correct versions.
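
            A small model of the difference Billy describes, with hypothetical types (fileAge is an invented field marking file recency; 0 = newest file). Trimming to one version by file order keeps the wrong cell when the newer-timestamped version lives in an older file:

              import java.util.*;

              public class CompactionTrim {
                record Cell(long timestamp, String value, int fileAge) {}

                public static void main(String[] args) {
                  List<Cell> cells = List.of(
                      new Cell(500L, "backdated", 0),  // newest file, older timestamp
                      new Cell(1000L, "current", 1));  // older file, newer timestamp

                  // Current behavior: newest file read first, its cell kept.
                  Cell byFileOrder = cells.stream()
                      .min(Comparator.comparingInt(Cell::fileAge)).get();
                  // Timestamp-aware behavior: the newest timestamp is kept.
                  Cell byTimestamp = cells.stream()
                      .max(Comparator.comparingLong(Cell::timestamp)).get();

                  System.out.println("file order keeps: " + byFileOrder.value()); // backdated
                  System.out.println("timestamp keeps:  " + byTimestamp.value()); // current
                }
              }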

            Michael Stack added a comment -

            I'm thinking each edit needs to carry two timestamps to solve this issue: the user-designated one, and another that records the actual insert time. The latter is used during major compactions when figuring cell TTL, and it's used to distinguish two edits of the same row/column/timestamp so we don't overwrite older-vintage edits.
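
            A sketch of what such a dual-timestamp edit might look like; the field names and the tie-breaking comparator are assumptions for illustration, not anything committed to HBase:

              import java.util.Comparator;

              // Hypothetical edit carrying both timestamps.
              record Edit(byte[] row, byte[] column,
                          long userTimestamp,  // orders versions for reads
                          long insertTime,     // breaks ties; drives TTL at major compaction
                          byte[] value) {}

              class EditComparator implements Comparator<Edit> {
                @Override
                public int compare(Edit a, Edit b) {
                  // Newest user timestamp first, as reads expect.
                  int byUserTs = Long.compare(b.userTimestamp(), a.userTimestamp());
                  if (byUserTs != 0) return byUserTs;
                  // Same row/column/user-timestamp: insert time keeps the two
                  // edits distinct so neither silently overwrites the other.
                  return Long.compare(b.insertTime(), a.insertTime());
                }
              }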

            Pranav Khaitan added a comment -

            I think the code for Store has changed since then and this issue doesn't exist anymore. In that case, we should consider closing this JIRA.

            Michael Stack added a comment -

            Agreed, Pranav. We don't have a getFull anymore, and this mostly works since we moved get to always use scanners. There are issues lurking, but we can open specific ones to address anything found rather than operate under this old/stale description.


            People

              Assignee: Unassigned
              Reporter: Bryan Duxbury
              Votes: 0
              Watchers: 3
