HBase / HBASE-29

HStore#get and HStore#getFull may not return expected values by timestamp when there is more than one MapFile

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 0.1.2, 0.2.0
    • Fix Version/s: None
    • Component/s: Client, regionserver
    • Labels: None

    Description

      Ok, this one is a little tricky. Let's say that you write a row with some value without a timestamp, thus meaning "right now". Then, the memcache gets flushed out to a MapFile. Then, you write another value to the same row, this time with a timestamp that is in the past, i.e., before the "now" timestamp of the first put.

      Some time later, but before there is a compaction, if you do a get for this row, and only ask for a single version, you will logically be expecting the latest version of the cell, which you would assume would be the one written at "now" time. Instead, you will get the value written into the "past" cell, because even though it is tagged as having happened in the past, it actually was written after the "now" cell, and thus when #get searches for satisfying values, it runs into the one most recently written first.
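
      To make the failure mode concrete, here is a minimal, self-contained model of the read path just described (the class and the maps are hypothetical stand-ins, not HBase types). The store consults the memcache first and then the MapFiles newest-first, short-circuiting once it has enough versions, so the backdated but later-written cell wins:

        import java.util.*;

        // Hypothetical model of the bug, not the HBase API. Each "file"
        // maps timestamp -> value, newest timestamp first.
        public class OutOfOrderGet {
          // Flushed MapFile: holds the cell written at "now" (ts = 1000).
          static NavigableMap<Long, String> mapFile =
              new TreeMap<>(Comparator.reverseOrder());
          // Memcache: holds the later write that used a past timestamp (ts = 500).
          static NavigableMap<Long, String> memcache =
              new TreeMap<>(Comparator.reverseOrder());

          public static void main(String[] args) {
            mapFile.put(1000L, "value-written-at-now");
            memcache.put(500L, "value-written-later-but-backdated");

            // HStore#get searched the memcache, then MapFiles newest-first,
            // and stopped once it had the requested number of versions, so a
            // one-version get returns the ts=500 cell instead of ts=1000.
            String got = memcache.firstEntry().getValue();
            System.out.println("got:      " + got);                             // backdated value
            System.out.println("expected: " + mapFile.firstEntry().getValue()); // "now" value
          }
        }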

      The result of this problem is inconsistent data results. Note that this problem only ever exists when there's an uncompacted HStore, because during compaction, these cells will all get sorted into the correct order by timestamp and such. In a way, this actually makes the problem worse, because then you could easily get inconsistent results from HBase about the same (unchanged) row depending on whether there's been a flush/compaction.

      The only solution I can think of for this problem at the moment is to scan all the MapFiles and the memcache for possible results, sort them, and then select the desired number of versions off the top. This is unfortunate because it means you never get the snazzy short-circuit logic except within a single MapFile or the memcache.
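
      A rough sketch of that merge-everything approach, with a hypothetical Cell type and plain lists standing in for the real MapFile and memcache readers: collect candidates from every source, sort by timestamp descending, and take the requested number of versions off the top.

        import java.util.*;

        // Hypothetical candidate cell; the real code deals in HStoreKey/value pairs.
        record Cell(long timestamp, byte[] value) {}

        class MergingGet {
          static List<Cell> get(List<List<Cell>> sources, int numVersions) {
            List<Cell> all = new ArrayList<>();
            for (List<Cell> source : sources) {
              all.addAll(source);  // no short-circuit: every source is read
            }
            // Newest timestamps first, regardless of which file a cell came from.
            all.sort(Comparator.comparingLong(Cell::timestamp).reversed());
            return all.subList(0, Math.min(numVersions, all.size()));
          }
        }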

      Attachments

        1. 29.patch
          18 kB
          Bryan Duxbury


          Activity

            Bryan Duxbury added a comment -

            This patch includes some new tests in TestGet2 to highlight this problem and fixes in HStore to take care of the issues.

            Bryan Duxbury added a comment -

            This patch passes all the tests locally.

            Bryan Duxbury added a comment -

            Trying Hudson.

            Michael Stack added a comment -

            Looking at the finalKey implementation, it looks like you think 'top' is the first half of the file, the half with the lowest keys, rather than the half with the higher keys.
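
            For readers unfamiliar with the convention under discussion: after a split, each daughter reads one half of the parent MapFile, and 'top' is the half with the higher keys. A toy illustration with plain Java collections (not the MapFile API) of why the finalKey of a top reference is the parent's last key rather than the midkey:

              import java.util.*;

              public class HalfFileConvention {
                public static void main(String[] args) {
                  NavigableSet<String> parentKeys =
                      new TreeSet<>(List.of("a", "b", "c", "d"));
                  String midKey = "c";
                  // bottom = the lower keys, up to but excluding the midkey
                  SortedSet<String> bottom = parentKeys.headSet(midKey);
                  // top = the midkey and everything above it
                  SortedSet<String> top = parentKeys.tailSet(midKey);
                  System.out.println("bottom finalKey: " + bottom.last()); // b
                  System.out.println("top finalKey:    " + top.last());   // d
                }
              }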

            Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12373187/2512-v2.patch
            against trunk revision r612314.

            @author +1. The patch does not contain any @author tags.

            javadoc +1. The javadoc tool did not generate any warning messages.

            javac +1. The applied patch does not generate any new compiler warnings.

            findbugs +1. The patch does not introduce any new Findbugs warnings.

            core tests +1. The patch passed core unit tests.

            contrib tests -1. The patch failed contrib unit tests.

            Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1607/testReport/
            Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1607/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
            Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1607/artifact/trunk/build/test/checkstyle-errors.html
            Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1607/console

            This message is automatically generated.

            Michael Stack added a comment -

            Retrying against Hudson. It's unlikely this patch caused the TTMR failure – it looks to be a problem going to HDFS – but just in case...

            Michael Stack added a comment -

            Retrying Hudson.

            Hadoop QA added a comment -

            +1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12373187/2512-v2.patch
            against trunk revision r612561.

            @author +1. The patch does not contain any @author tags.

            javadoc +1. The javadoc tool did not generate any warning messages.

            javac +1. The applied patch does not generate any new compiler warnings.

            findbugs +1. The patch does not introduce any new Findbugs warnings.

            core tests +1. The patch passed core unit tests.

            contrib tests +1. The patch passed contrib unit tests.

            Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1615/testReport/
            Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1615/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
            Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1615/artifact/trunk/build/test/checkstyle-errors.html
            Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1615/console

            This message is automatically generated.

            Michael Stack added a comment -

            Patch looks good, but it's going to kill performance, so I want to run a basic PE test before committing just to see how much it's going to cost us.

            Chatting with Bryan, scanners likely have the same issue. This patch doesn't address that.

            Michael Stack added a comment -

            I ran the PE test. The random read rate is halved with this patch in place. Let's discuss. Of note, the patch has rotted; it no longer applies against TRUNK.

            Bryan Duxbury added a comment -

            I understand that the performance degrades as a result of this patch, but I'm much more worried about the correctness of the data than the performance. Sure, it might make it faster to leave this out, but the first time someone happens to hit this problem, they'll go out of their minds trying to track it down.

            I think we can make some improvements that will make this patch a little faster. I will take another look and try to freshen up the patch.

            Bryan Duxbury added a comment -

            Putting this off until later, as the performance hit is pretty substantial, and I think there might be a solution that gives us both performance and correctness.

            Bryan Duxbury added a comment -

            Updated for new svn.

            Bryan Duxbury added a comment -

            We're not going to use the latest patch, seeing as how the performance suffers so greatly, so I'm cancelling it.

            Bryan Duxbury added a comment -

            This issue might actually also be contributing to the trouble we're seeing in HBASE-514. If the memcache can give an answer, it will, and it will be taken as a definitive answer for the entire row, which isn't guaranteed to be the case, especially when regions are being reassigned and such.

            Bryan Duxbury added a comment -

            I think we should make it a priority to get this fixed. Even if it performs worse, it's really unacceptable to give incorrect answers.

            However, I think there's a decent alternative to just getting slower wholesale. When we fixed getClosestRowBefore, we decided that it would always operate under the assumption that cells are added to a table in ascending timestamp order, which at least made it perform acceptably. Clearly, this issue is about situations where that assumption isn't true. So, what I think we should do is make the default get and getRow methods return the answer that assumes the MapFiles don't have any inherent ordering, and then add new methods getAscending and getRowAscending (names could change?) that assume MapFiles are sorted ascending and are faster as a result.

            With this approach, people can make the default choice of using get and getRow, pay the performance penalty, but get the right answer no matter what. Then, if people happen to have a use case that matches the always-ascending constraints, they can simply switch the method call and get the improved performance.
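
            A hedged sketch of what that API split might look like. The getAscending and getRowAscending names come from the comment above; the signatures here are invented for illustration:

              import java.util.Map;

              // Hypothetical client-facing interface, not the real HTable API.
              interface ProposedTableInterface {
                // Always correct: merges candidates across all MapFiles and the
                // memcache, sorts by timestamp, pays the performance penalty.
                byte[][] get(byte[] row, byte[] column, int numVersions);
                Map<byte[], byte[]> getRow(byte[] row);

                // Faster: assumes cells were inserted in ascending timestamp
                // order, so the store may short-circuit per file.
                byte[][] getAscending(byte[] row, byte[] column, int numVersions);
                Map<byte[], byte[]> getRowAscending(byte[] row);
              }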

            Michael Stack added a comment -

            Adding methods that return the right answer when versions are not inserted in ascending order sounds good to me, but I'd suggest that the default implementations do as they do now; we'd add documentation that explicitly states how they work, pointing at the new methods for applications to use if versions are not inserted in ascending order.

            Jim Kellerman added a comment -

            Does this need to get fixed for 0.2.0?

            As Hypertable doesn't allow out-of-order timestamps, and in the general case timestamps are assigned automatically (and hence are always increasing), it doesn't seem that important.

            Bryan Duxbury added a comment -

            I tend to agree at this point. Should we make it so you can't put values at non-increasing timestamps at all?

            Michael Stack added a comment -

            I'd say move it out of 0.2 but keep it alive at a low priority. It's going to come up again.

            Billy Pearson added a comment -

            I've been doing some updates in compactions and found the issue Bryan is talking about above in compactions as well.

            The way we compact now, we assume that the newest MapFiles/HStoreFiles will have newer timestamps than the versions in the older MapFiles.

            So currently we are keeping the x newest cells without considering timestamps. The rule is that the newest MapFiles get read first, which could push out values that have newer timestamps.

            If we decide to change max versions to keep the newest values based on timestamps, compactions will also need an update so we do not lose the correct versions.
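
            A small model of the difference Billy describes, with hypothetical types (fileAge is an invented field marking file recency; 0 = newest file). Trimming to one version by file order keeps the wrong cell when the newer-timestamped version lives in an older file:

              import java.util.*;

              public class CompactionTrim {
                record Cell(long timestamp, String value, int fileAge) {}

                public static void main(String[] args) {
                  List<Cell> cells = List.of(
                      new Cell(500L, "backdated", 0),  // newest file, older timestamp
                      new Cell(1000L, "current", 1));  // older file, newer timestamp

                  // Current behavior: newest file read first, its cell kept.
                  Cell byFileOrder = cells.stream()
                      .min(Comparator.comparingInt(Cell::fileAge)).get();
                  // Timestamp-aware behavior: the newest timestamp is kept.
                  Cell byTimestamp = cells.stream()
                      .max(Comparator.comparingLong(Cell::timestamp)).get();

                  System.out.println("file order keeps: " + byFileOrder.value()); // backdated
                  System.out.println("timestamp keeps:  " + byTimestamp.value()); // current
                }
              }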

            Michael Stack added a comment -

            I'm thinking each edit needs to carry two timestamps to solve this issue: the user-designated one, and another that records the actual insert time. The latter is used during major compactions when figuring cell TTL, and it's used to distinguish two edits of the same row/column/timestamp so we don't overwrite older-vintage edits.
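
            A sketch of what such a dual-timestamp edit might look like; the field names and the tie-breaking comparator are assumptions for illustration, not anything committed to HBase:

              import java.util.Comparator;

              // Hypothetical edit carrying both timestamps.
              record Edit(byte[] row, byte[] column,
                          long userTimestamp,  // orders versions for reads
                          long insertTime,     // breaks ties; drives TTL at major compaction
                          byte[] value) {}

              class EditComparator implements Comparator<Edit> {
                @Override
                public int compare(Edit a, Edit b) {
                  // Newest user timestamp first, as reads expect.
                  int byUserTs = Long.compare(b.userTimestamp(), a.userTimestamp());
                  if (byUserTs != 0) return byUserTs;
                  // Same row/column/user-timestamp: insert time keeps the two
                  // edits distinct so neither silently overwrites the other.
                  return Long.compare(b.insertTime(), a.insertTime());
                }
              }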

            Pranav Khaitan added a comment -

            I think the code for Store has changed since then and this issue doesn't exist anymore. In that case, we should consider closing this JIRA.

            Michael Stack added a comment -

            Agreed, Pranav. We don't have a getFull anymore, and this mostly works since we moved get to always use scanners. There are issues lurking, but we can open specific ones to address anything found rather than operate under this old/stale description.


            People

              Assignee: Unassigned
              Reporter: Bryan Duxbury
              Votes: 0
              Watchers: 3
