HBASE-3725

HBase increments from old value after delete and write to disk

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.90.1
    • Fix Version/s: 0.92.3
    • Component/s: io, regionserver
    • Labels:
      None

      Description

      Deleted row values are sometimes used as the starting point for new increments.

      To reproduce:
      Create a row "r". Set column "x" to some default value.
      Force HBase to flush that value to the file system (for example, by restarting the cluster).
      Delete the row.
      Call table.incrementColumnValue with "some_value".
      Get the row.
      The returned column value was incremented from the old value that existed before the row was deleted, instead of being initialized to "some_value".

      Code to reproduce:

      
      import java.io.IOException;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.HColumnDescriptor;
      import org.apache.hadoop.hbase.HTableDescriptor;
      import org.apache.hadoop.hbase.client.Delete;
      import org.apache.hadoop.hbase.client.Get;
      import org.apache.hadoop.hbase.client.HBaseAdmin;
      import org.apache.hadoop.hbase.client.HTableInterface;
      import org.apache.hadoop.hbase.client.HTablePool;
      import org.apache.hadoop.hbase.client.Increment;
      import org.apache.hadoop.hbase.client.Result;
      import org.apache.hadoop.hbase.util.Bytes;
      
      public class HBaseTestIncrement
      {
      	static String tableName  = "testIncrement";
      
      	static byte[] infoCF = Bytes.toBytes("info");
      
      	static byte[] rowKey = Bytes.toBytes("test-rowKey");
      
      	static byte[] newInc = Bytes.toBytes("new");
      	static byte[] oldInc = Bytes.toBytes("old");
      
      	/**
      	 * This code reproduces a bug with increment column values in HBase.
      	 * Usage: First run part one by passing '1' as the first arg.
      	 *        Then restart the HBase cluster so it flushes everything to disk.
      	 *        Run part two by passing '2' as the first arg.
      	 *
      	 * This will result in the old, deleted data being found and used for the increment calls.
      	 *
      	 * @param args
      	 * @throws IOException
      	 */
      	public static void main(String[] args) throws IOException
      	{
      		if("1".equals(args[0]))
      			partOne();
      		if("2".equals(args[0]))
      			partTwo();
      		if ("both".equals(args[0]))
      		{
      			partOne();
      			partTwo();
      		}
      	}
      
      	/**
      	 * Creates a table and increments a column value 10 times by 10 each time.
      	 * Results in a value of 100 for the column
      	 *
      	 * @throws IOException
      	 */
      	static void partOne() throws IOException
      	{
      
      		Configuration conf = HBaseConfiguration.create();
      
      
      		HBaseAdmin admin = new HBaseAdmin(conf);
      		HTableDescriptor tableDesc = new HTableDescriptor(tableName);
      		tableDesc.addFamily(new HColumnDescriptor(infoCF));
      		if(admin.tableExists(tableName))
      		{
      			admin.disableTable(tableName);
      			admin.deleteTable(tableName);
      		}
      		admin.createTable(tableDesc);
      
      		HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
      		HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
      
      		// Increment uninitialized column
      		for (int j = 0; j < 10; j++)
      		{
      			table.incrementColumnValue(rowKey, infoCF, oldInc, (long)10);
      			Increment inc = new Increment(rowKey);
      			inc.addColumn(infoCF, newInc, (long)10);
      			table.increment(inc);
      		}
      
      		Get get = new Get(rowKey);
      		Result r = table.get(get);
      		System.out.println("initial values: new " + Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + Bytes.toLong(r.getValue(infoCF, oldInc)));
      
      	}
      
      	/**
      	 * First deletes the data, then increments the column 10 times by 1 each time.
      	 *
      	 * Should result in a value of 10, but instead results in a value of 110.
      	 *
      	 * @throws IOException
      	 */
      	static void partTwo() throws IOException
      	{
      		Configuration conf = HBaseConfiguration.create();
      
      		HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
      		HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
      		
      		Delete delete = new Delete(rowKey);
      		table.delete(delete);
      
      
      		// Increment columns
      		for (int j = 0; j < 10; j++)
      		{
      			table.incrementColumnValue(rowKey, infoCF, oldInc, (long)1);
      			Increment inc = new Increment(rowKey);
      			inc.addColumn(infoCF, newInc, (long)1);
      			table.increment(inc);
      		}
      
      
      		Get get = new Get(rowKey);
      		Result r = table.get(get);
      		System.out.println("after delete values: new " + Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + Bytes.toLong(r.getValue(infoCF, oldInc)));
      
      	}
      }
      
      1. HBASE-3725-0.92-V6.patch
        5 kB
        ShiXing
      2. HBASE-3725-0.92-V5.patch
        4 kB
        ShiXing
      3. HBASE-3725-0.92-V4.patch
        7 kB
        ShiXing
      4. HBASE-3725-0.92-V3.patch
        7 kB
        ShiXing
      5. HBASE-3725-0.92-V2.patch
        7 kB
        ShiXing
      6. HBASE-3725-0.92-V1.patch
        7 kB
        ShiXing
      7. HBASE-3725-v3.patch
        4 kB
        Jonathan Gray
      8. HBASE-3725-Test-v1.patch
        2 kB
        Jonathan Gray
      9. HBASE-3725.patch
        2 kB
        Nathaniel Cook

        Activity

        Lars Hofhansl added a comment -

        @Ted: This is fixed in 0.94+ (including trunk)
        (0.94+ has lazy-seek, which made the special handling for increments - which causes the issue at hand - obsolete).

        Ted Yu added a comment -

        We need patches for all branches before integration.

        Ted Yu added a comment -

        In trunk, the getLastIncrement() call has been replaced with:

                  List<KeyValue> results = get(get, false);
        
        Ted Yu added a comment -

        How about renaming leftResults to remainingResults?

        Please prepare patch for trunk.

        Thanks

        ShiXing added a comment -

        @Ted,

        TestHRegion#testIncrementWithFlushAndDelete passed without that assignment.

        Because the iscan also reads from the memstore after I remove this code:

        List<KeyValue> fileResults = new ArrayList<KeyValue>();
        - iscan.checkOnlyStoreFiles();
        scanner = null;
        try {
        scanner = getScanner(iscan);
        

        And since there is no result in the memstore, the increment treats it as 0, which has the same effect as the delete.

        I added this case to TestHRegion#testIncrementWithFlushAndDelete in V6.

        ShiXing added a comment -

        @Ted, the reassignment is there because there is no interface to set the iscan back to reading both the memstore and the store files; at the beginning, the iscan is set to memstore only:

            // memstore scan
            iscan.checkOnlyMemStore();
        
        Ted Yu added a comment -

        Looking at existing code:

          private List<KeyValue> getLastIncrement(final Get get) throws IOException {
            InternalScan iscan = new InternalScan(get);
        

        iscan was assigned at the beginning. It looks like the assignment in the else block is redundant.

        TestHRegion#testIncrementWithFlushAndDelete passed without that assignment.

        ShiXing added a comment -

        @Ted

        I generated a region with 3 store files. Increments slow from 1810 tps to 1020 tps, a 43.6% slowdown.

        The tps numbers come from incrementing the same rowkey.

        The performance depends on how frequently the memstore is flushed to a store file. In the same test case, the latest patch performs the same as the original, because the incremented rowkey is always in the memstore and we never need to read a store file.

        The difference only shows for a rowKey whose value can't be found in the memstore: it needs one extra memstore read compared to 0.92 trunk, which reads only from the store files.

        Note that the original's high performance comes precisely from reading only the memstore.

        Ted Yu added a comment -

        @ShiXing:
        Can you give us performance numbers based on your latest patch?

        Thanks

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12537128/HBASE-3725-0.92-V5.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2408//console

        This message is automatically generated.

        ShiXing added a comment -

        In V5, if the increment does not find the data in the memstore, it reads it from both the store files and the memstore, compared to the original implementation (store files only). I think this will not slow down increments; it only spends a little more time on the extra memstore read.

        ShiXing added a comment -

        @stack, I generated a region with 3 store files. Increments slow from 1810 tps to 1020 tps, a 43.6% slowdown.

        I think we can fix the correctness by recreating the InternalScan when we can't get the data from the memstore: the second pass then reads from the memstore as well (so deletes are still seen), not just from the store files, as sketched below.
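
        A minimal sketch of that fallback, not the patch itself; runScan is a hypothetical helper standing in for the getScanner/next plumbing:

            InternalScan iscan = new InternalScan(get);
            iscan.checkOnlyMemStore();             // first pass: memstore only
            List<KeyValue> results = runScan(iscan);
            if (results.isEmpty()) {
                iscan = new InternalScan(get);     // fresh scan reads memstore + store files
                results = runScan(iscan);          // delete markers now mask flushed values
            }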

        stack added a comment -

        @ShiXing I'm reluctant to commit this to 0.92 if it slows increments radically, even though it is 'correct'. While it's correct, I think it will be rare for folks to be doing deletes on increments. Do you have any idea how much it slows increments? Thanks.

        Lars Hofhansl added a comment -

        I am not disagreeing, just a note of caution. We waited to fix this in 0.94 until we had lazy seek.

        ShiXing added a comment -

        Do you mean performance is more important than correctness?
        I think "correctness should always come first".

        Lars Hofhansl added a comment -

        0.92 does not have the lazy seek optimizations, so the performance implication of this would be significant.

        Lars Hofhansl added a comment -

        HBASE-3443 fixes this for 0.94 and 0.96.

        ShiXing added a comment -

        Trunk does not have this bug, since HRegion#getLastIncrement() has already been removed.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12535627/HBASE-3725-0.92-V4.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2351//console

        This message is automatically generated.

        ShiXing added a comment -

        Please review it.

        Ted Yu added a comment -

        Without the patch, the new test failed:

        Failed tests:   testIncrementWithFlushAndDelete(org.apache.hadoop.hbase.regionserver.TestHRegion): expected:<10> but was:<20>
        

        The test passed with patch.

        +1 on this correctness fix.

        @Xing:
        Can you prepare patch for trunk and run it through Hadoop QA ?

        Thanks

        ShiXing added a comment -

        But correctness is affected.

        I mean that the 0.92 trunk increment code produces wrong data in some cases. Once the incremented row's memstore has been flushed: if there is no Delete, nothing is affected; if there is a Delete, the data will undoubtedly be wrong.

        ShiXing added a comment -

        Is this a fix for 0.92 only ShiXing?

        yes

        Doing a get instead of a getLastIncrement is good because we'll handle deletes. Does it slow down increments doing a get rather than getLastIncrement?

        I think it will slow down increments a little; I have not tested the performance. But correctness is affected.
        In the test case, with a put, a flush, and then a delete, the increment reads the flushed data back out of the StoreFile: getLastIncrement reads the memstore first and returns empty (there is no row there, just one delete), then reads the StoreFile without merging in the delete. You can reproduce the bug with my test case or as described in the Description of this issue.

        junit.framework.AssertionFailedError: expected:<10> but was:<20>
        	at junit.framework.Assert.fail(Assert.java:50)
        	at junit.framework.Assert.failNotEquals(Assert.java:287)
        	at junit.framework.Assert.assertEquals(Assert.java:67)
        	at junit.framework.Assert.assertEquals(Assert.java:134)
        	at junit.framework.Assert.assertEquals(Assert.java:140)
        	at org.apache.hadoop.hbase.regionserver.TestHRegion.testIncrementWithFlushAndDelete(TestHRegion.java:3446)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        	at java.lang.reflect.Method.invoke(Method.java:597)
        	at junit.framework.TestCase.runTest(TestCase.java:168)
        	at junit.framework.TestCase.runBare(TestCase.java:134)
        	at junit.framework.TestResult$1.protect(TestResult.java:110)
        	at junit.framework.TestResult.runProtected(TestResult.java:128)
        	at junit.framework.TestResult.run(TestResult.java:113)
        	at junit.framework.TestCase.run(TestCase.java:124)
        	at junit.framework.TestSuite.runTest(TestSuite.java:243)
        	at junit.framework.TestSuite.run(TestSuite.java:238)
        	at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
        	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
        	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
        	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
        	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
        	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
        	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
        
        stack added a comment -

        Is this a fix for 0.92 only ShiXing?

        Doing a get instead of a getLastIncrement is good because we'll handle deletes. Does it slow down increments doing a get rather than getLastIncrement?

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12533945/HBASE-3725-0.92-V3.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2293//console

        This message is automatically generated.

        ShiXing added a comment -

        If the Delete and the next Increment occur within the same millisecond, that Increment will not take effect for subsequent Increments, because the Delete masks the Increment (see the sketch below).
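
        A minimal illustration of that timestamp collision, reusing the table and column names from the repro code above; the explicit-timestamp delete is only there to force both cells onto the same millisecond:

            long ts = System.currentTimeMillis();
            Delete delete = new Delete(rowKey);
            delete.deleteColumn(infoCF, oldInc, ts);   // tombstone at timestamp ts
            table.delete(delete);
            // If this increment's cell also lands on timestamp ts, the tombstone
            // sorts ahead of it and masks it for later reads and increments.
            table.incrementColumnValue(rowKey, infoCF, oldInc, 1L);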

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12533939/HBASE-3725-0.92-V2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2292//console

        This message is automatically generated.

        ShiXing added a comment -

        Changed the method name.

        ShiXing added a comment -

        Removed the getLastIncrement function.

        stack added a comment -

        @ShiXing Any chance of a patch? Thanks.

        ShiXing added a comment -

        Sorry, that was a Ctrl+Enter mistake.

        I think the fix could simply change the calls to getLastIncrement() into calls to get(); I see that 0.94 has already removed the getLastIncrement() function.

        ShiXing added a comment -

        Is there any progress?

        Vaibhav Puranik added a comment -

        We encountered the exact same issue in our GumGum production environment. We are running 0.90.5. I hope this gets fixed ASAP.

        stack added a comment -

        Moving out of 0.92.0. Pull it back in if you think different.

        Jonathan Gray added a comment -

        Please review.

        Jonathan Gray added a comment -

        This fixes the problem in the only simple way I could think of.

        A new configuration option is added "hbase.hregion.increment.supportdeletes" which defaults to true (because it is required for "correctness").

        When this option is true, then when the scan against StoreFiles is done, it will also include the MemStore. This should ensure correctness for cases where delete markers are present in the MemStore that need to apply to KVs in the StoreFiles.

        I made this a configuration option because it makes increment operations less optimal; increment workloads that do not need to support deletes can turn the option off and avoid the double scan of the MemStore.

        A potentially optimal and correct solution would be to use the old Get delete tracker, which retained delete information across files (for in-order file processing rather than one mega merge). Some work is going into re-integrating those, so if they do make it back into HBase, we could utilize them here.

        This should suffice for now.
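
        A sketch of opting out, assuming the option name from this patch; since it is read by the region server, in practice it belongs in the server-side hbase-site.xml, shown here via the Configuration API for brevity:

            Configuration conf = HBaseConfiguration.create();
            // Trade delete-correctness for increment speed (the patch defaults this to true).
            conf.setBoolean("hbase.hregion.increment.supportdeletes", false);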

        Jonathan Gray added a comment -

        Here is a small unit test which replicates the broken behavior using both the old incrementColumnValue as well as the new Increment (the bug exists in both).

        The fix for this is complicated. The issue is that we check the MemStore first and then we check the StoreFiles. Currently no information read from the MemStore is carried into the query of the StoreFiles.
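
        Roughly, the pre-fix read path looks like the following sketch (illustrative only; runScan and coversAllColumns are hypothetical stand-ins for the scanner plumbing):

            InternalScan iscan = new InternalScan(get);
            iscan.checkOnlyMemStore();               // phase 1: memstore only
            List<KeyValue> results = runScan(iscan);
            if (!coversAllColumns(results, get)) {
                iscan = new InternalScan(get);
                iscan.checkOnlyStoreFiles();         // phase 2: store files only;
                results.addAll(runScan(iscan));      // tombstones seen in phase 1 are lost
            }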

        I'll keep working on this since it is a correctness issue, hopefully the solution won't be super messy.

        Thanks again for finding this Nathaniel!

        Jonathan Gray added a comment -

        Hey Nathaniel. Thanks for posting the unit test!

        I will take a look at this sometime this week and try to get a fix out for it.

        Nathaniel Cook added a comment -

        Here is a patch with a simple unit test for this bug.

        As far as a fix for the bug goes, any help would be appreciated. I don't know enough about HBase's internals to even know where to begin.

        stack added a comment -

        Good one Nathaniel. Any chance of a patch, preferably one that includes your nice test above?


          People

          • Assignee:
            ShiXing
            Reporter:
            Nathaniel Cook
          • Votes:
            3
          • Watchers:
            19

            Dates

            • Created:
              Updated:
