HBase
  1. HBase
  2. HBASE-4487

The increment operation can release the rowlock before sync-ing the Hlog

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. There is a fundamental change to the group-commit behaviour: it batches transactions in HBase code before pushing it down to the wal.
      Show
      The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. There is a fundamental change to the group-commit behaviour: it batches transactions in HBase code before pushing it down to the wal.

      Description

      This allows for better throughput when there are hot rows.I have seen this change make a single row update improve from 400 increments/sec/server to 4000 increments/sec/server.

      1. 4487-v7.txt
        38 kB
        Ted Yu
      2. appendNoSync6.txt
        38 kB
        dhruba borthakur
      3. appendNoSync5.txt
        37 kB
        dhruba borthakur
      4. appendNoSync4.txt
        38 kB
        dhruba borthakur

        Issue Links

          Activity

          Hide
          dhruba borthakur added a comment -

          The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows.

          Introuced method HLog.appendNoSync() that returns a txid. The increment method then release the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only if all the transactions upto the one identified by that txid has been successfully sycned to HDFS.

          Show
          dhruba borthakur added a comment - The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. Introuced method HLog.appendNoSync() that returns a txid. The increment method then release the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only if all the transactions upto the one identified by that txid has been successfully sycned to HDFS.
          Hide
          Ted Yu added a comment -

          Nice feature.

          +      System.err.println("Version Mismatch between client and server" +
          +                         "... command aborted.");
          

          I think LOG should be used.

          Incrementer thread should be set as Daemon thread.

          For HLog.unflushedEntries, since it is used to generate txid, I wonder if there is a better name for it.

          +    // method is pretty heavyweight as far a locking is concerned. The
          

          Should be "as far as ..."

          +      // write out all accumulated Entries to hdfs.
          +      for (Entry e : pending) {
          +        writer.append(e);
          +      }
          

          I think exception handling should be added here. What if writer.append() raises exception ? The remaining Entries would be lost ?

          Show
          Ted Yu added a comment - Nice feature. + System .err.println( "Version Mismatch between client and server" + + "... command aborted." ); I think LOG should be used. Incrementer thread should be set as Daemon thread. For HLog.unflushedEntries, since it is used to generate txid, I wonder if there is a better name for it. + // method is pretty heavyweight as far a locking is concerned. The Should be "as far as ..." + // write out all accumulated Entries to hdfs. + for (Entry e : pending) { + writer.append(e); + } I think exception handling should be added here. What if writer.append() raises exception ? The remaining Entries would be lost ?
          Hide
          dhruba borthakur added a comment -

          Addressed Ted Yu's review comments. The code that does

                for (Entry e : pending) {
          +        writer.append(e);
          +      }
          

          does not catch exceptions, instead throws an exception to the caller if any of the edits fail to make it to HDFS. In fact, Hbase regionserver exits if an HDFS write/sync fails, this is expected behaviour.

          Show
          dhruba borthakur added a comment - Addressed Ted Yu's review comments. The code that does for (Entry e : pending) { + writer.append(e); + } does not catch exceptions, instead throws an exception to the caller if any of the edits fail to make it to HDFS. In fact, Hbase regionserver exits if an HDFS write/sync fails, this is expected behaviour.
          Hide
          Ted Yu added a comment -

          Wow, this is the quickest turnaround for code review I have ever seen
          Normally you can wait for other people's comments before making the next patch.

          I still see System.err.println() call.
          For my last comment, thanks for reminding me the current behavior. What I meant was that your change introduced some kind of buffering which would delay the bubbling of IOException.
          I guess this is Okay. We should document this in release notes.

          Can I ask my favorite question ? Test suite.

          Also, using https://reviews.apache.org/ would be convenient.

          Show
          Ted Yu added a comment - Wow, this is the quickest turnaround for code review I have ever seen Normally you can wait for other people's comments before making the next patch. I still see System.err.println() call. For my last comment, thanks for reminding me the current behavior. What I meant was that your change introduced some kind of buffering which would delay the bubbling of IOException. I guess this is Okay. We should document this in release notes. Can I ask my favorite question ? Test suite. Also, using https://reviews.apache.org/ would be convenient.
          Hide
          Ted Yu added a comment -

          I got the following from test suite run:

          testAppend(org.apache.hadoop.hbase.regionserver.wal.TestHLog)  Time elapsed: 0.037 sec  <<< ERROR!
          java.lang.NullPointerException
                  at org.apache.hadoop.hbase.regionserver.wal.TestHLog.testAppend(TestHLog.java:567)
          

          Please use my script on HBASE-4480 to reproduce the above.
          I can provide test output if needed.

          Show
          Ted Yu added a comment - I got the following from test suite run: testAppend(org.apache.hadoop.hbase.regionserver.wal.TestHLog) Time elapsed: 0.037 sec <<< ERROR! java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.wal.TestHLog.testAppend(TestHLog.java:567) Please use my script on HBASE-4480 to reproduce the above. I can provide test output if needed.
          Hide
          dhruba borthakur added a comment -

          All unit tests pass now (expect TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same test to fail on trunk, so these failures do not seem to be related to this patch.

          The one reference to System.err.println() is a printUsage() message that is needed only if u want to run the unit test as a standalone command line utility.

          There is a single test TestIncrement that creates a 100 threads and ensures that all the concurrent increments match the final expected result.

          There is a benchmark TestHLogBench that measures the performance of the appendNoSync call.

          Show
          dhruba borthakur added a comment - All unit tests pass now (expect TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same test to fail on trunk, so these failures do not seem to be related to this patch. The one reference to System.err.println() is a printUsage() message that is needed only if u want to run the unit test as a standalone command line utility. There is a single test TestIncrement that creates a 100 threads and ensures that all the concurrent increments match the final expected result. There is a benchmark TestHLogBench that measures the performance of the appendNoSync call.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2116/
          -----------------------------------------------------------

          Review request for hbase and Ted Yu.

          Summary
          -------

          The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. Introduced method HLog.appendNoSync() that returns a txid. The increment method then release the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only if all the transactions upto the one identified by that txid has been successfully sycned to HDFS.

          There is a single test TestIncrement that creates a 100 threads and ensures that all the concurrent increments match the final expected result. There is a benchmark TestHLogBench that measures the performance of the appendNoSync call.

          This addresses bug HBASE-4487.
          https://issues.apache.org/jira/browse/HBASE-4487

          Diffs


          /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1177401
          /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1177401
          /src/test/java/org/apache/hadoop/hbase/regionserver/TestIncrement.java PRE-CREATION
          /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogBench.java PRE-CREATION
          /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java 1177401
          /src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java 1177401

          Diff: https://reviews.apache.org/r/2116/diff

          Testing
          -------

          All unit tests pass now (expect TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same test to fail on trunk, so these failures do not seem to be related to this patch.

          Thanks,

          Dhruba

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2116/ ----------------------------------------------------------- Review request for hbase and Ted Yu. Summary ------- The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. Introduced method HLog.appendNoSync() that returns a txid. The increment method then release the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only if all the transactions upto the one identified by that txid has been successfully sycned to HDFS. There is a single test TestIncrement that creates a 100 threads and ensures that all the concurrent increments match the final expected result. There is a benchmark TestHLogBench that measures the performance of the appendNoSync call. This addresses bug HBASE-4487 . https://issues.apache.org/jira/browse/HBASE-4487 Diffs /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1177401 /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1177401 /src/test/java/org/apache/hadoop/hbase/regionserver/TestIncrement.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogBench.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java 1177401 /src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java 1177401 Diff: https://reviews.apache.org/r/2116/diff Testing ------- All unit tests pass now (expect TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same test to fail on trunk, so these failures do not seem to be related to this patch. Thanks, Dhruba
          Hide
          dhruba borthakur added a comment -
          Show
          dhruba borthakur added a comment - Review board: https://reviews.apache.org/r/2116/
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2116/#review2180
          -----------------------------------------------------------

          Ship it!

          Patch v6 passed all unit tests.
          Nice job Dhruba.

          • Ted

          On 2011-09-29 19:27:36, Dhruba Borthakur wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2116/

          -----------------------------------------------------------

          (Updated 2011-09-29 19:27:36)

          Review request for hbase and Ted Yu.

          Summary

          -------

          The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. Introduced method HLog.appendNoSync() that returns a txid. The increment method then release the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only if all the transactions upto the one identified by that txid has been successfully sycned to HDFS.

          There is a single test TestIncrement that creates a 100 threads and ensures that all the concurrent increments match the final expected result. There is a benchmark TestHLogBench that measures the performance of the appendNoSync call.

          This addresses bug HBASE-4487.

          https://issues.apache.org/jira/browse/HBASE-4487

          Diffs

          -----

          /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1177401

          /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1177401

          /src/test/java/org/apache/hadoop/hbase/regionserver/TestIncrement.java PRE-CREATION

          /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogBench.java PRE-CREATION

          /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java 1177401

          /src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java 1177401

          Diff: https://reviews.apache.org/r/2116/diff

          Testing

          -------

          All unit tests pass now (expect TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same test to fail on trunk, so these failures do not seem to be related to this patch.

          Thanks,

          Dhruba

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2116/#review2180 ----------------------------------------------------------- Ship it! Patch v6 passed all unit tests. Nice job Dhruba. Ted On 2011-09-29 19:27:36, Dhruba Borthakur wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2116/ ----------------------------------------------------------- (Updated 2011-09-29 19:27:36) Review request for hbase and Ted Yu. Summary ------- The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. Introduced method HLog.appendNoSync() that returns a txid. The increment method then release the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only if all the transactions upto the one identified by that txid has been successfully sycned to HDFS. There is a single test TestIncrement that creates a 100 threads and ensures that all the concurrent increments match the final expected result. There is a benchmark TestHLogBench that measures the performance of the appendNoSync call. This addresses bug HBASE-4487 . https://issues.apache.org/jira/browse/HBASE-4487 Diffs ----- /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1177401 /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1177401 /src/test/java/org/apache/hadoop/hbase/regionserver/TestIncrement.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogBench.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java 1177401 /src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java 1177401 Diff: https://reviews.apache.org/r/2116/diff Testing ------- All unit tests pass now (expect TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same test to fail on trunk, so these failures do not seem to be related to this patch. Thanks, Dhruba
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2116/#review2186
          -----------------------------------------------------------

          Looks good. I like the bench marking tool.

          This is like a group commit in HLog. We have a group committing going on inside in sync down in dfsclient. Does this group commit not 'group' enough.

          With this change, its no longer possible to sync each increment. We ok w/ that? We should at least release note this difference in increment behaviors and erring on the conservative side, I'd mark this an 'incompatible change' rather than an 'improvement' just so its more likely folks will know of this changed increment behavior.

          • Michael

          On 2011-09-29 19:27:36, Dhruba Borthakur wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2116/

          -----------------------------------------------------------

          (Updated 2011-09-29 19:27:36)

          Review request for hbase and Ted Yu.

          Summary

          -------

          The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. Introduced method HLog.appendNoSync() that returns a txid. The increment method then release the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only if all the transactions upto the one identified by that txid has been successfully sycned to HDFS.

          There is a single test TestIncrement that creates a 100 threads and ensures that all the concurrent increments match the final expected result. There is a benchmark TestHLogBench that measures the performance of the appendNoSync call.

          This addresses bug HBASE-4487.

          https://issues.apache.org/jira/browse/HBASE-4487

          Diffs

          -----

          /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1177401

          /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1177401

          /src/test/java/org/apache/hadoop/hbase/regionserver/TestIncrement.java PRE-CREATION

          /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogBench.java PRE-CREATION

          /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java 1177401

          /src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java 1177401

          Diff: https://reviews.apache.org/r/2116/diff

          Testing

          -------

          All unit tests pass now (expect TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same test to fail on trunk, so these failures do not seem to be related to this patch.

          Thanks,

          Dhruba

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2116/#review2186 ----------------------------------------------------------- Looks good. I like the bench marking tool. This is like a group commit in HLog. We have a group committing going on inside in sync down in dfsclient. Does this group commit not 'group' enough. With this change, its no longer possible to sync each increment. We ok w/ that? We should at least release note this difference in increment behaviors and erring on the conservative side, I'd mark this an 'incompatible change' rather than an 'improvement' just so its more likely folks will know of this changed increment behavior. Michael On 2011-09-29 19:27:36, Dhruba Borthakur wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2116/ ----------------------------------------------------------- (Updated 2011-09-29 19:27:36) Review request for hbase and Ted Yu. Summary ------- The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. Introduced method HLog.appendNoSync() that returns a txid. The increment method then release the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only if all the transactions upto the one identified by that txid has been successfully sycned to HDFS. There is a single test TestIncrement that creates a 100 threads and ensures that all the concurrent increments match the final expected result. There is a benchmark TestHLogBench that measures the performance of the appendNoSync call. This addresses bug HBASE-4487 . https://issues.apache.org/jira/browse/HBASE-4487 Diffs ----- /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1177401 /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1177401 /src/test/java/org/apache/hadoop/hbase/regionserver/TestIncrement.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogBench.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java 1177401 /src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java 1177401 Diff: https://reviews.apache.org/r/2116/diff Testing ------- All unit tests pass now (expect TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same test to fail on trunk, so these failures do not seem to be related to this patch. Thanks, Dhruba
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2011-09-29 22:31:37, Michael Stack wrote:

          > Looks good. I like the bench marking tool.

          >

          > This is like a group commit in HLog. We have a group committing going on inside in sync down in dfsclient. Does this group commit not 'group' enough.

          >

          > With this change, its no longer possible to sync each increment. We ok w/ that? We should at least release note this difference in increment behaviors and erring on the conservative side, I'd mark this an 'incompatible change' rather than an 'improvement' just so its more likely folks will know of this changed increment behavior.

          HRegion has:

            final Configuration conf;
          

          Can we pass new config parameter through conf which enables this new feature ?

          • Ted

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2116/#review2186
          -----------------------------------------------------------

          On 2011-09-29 19:27:36, Dhruba Borthakur wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2116/

          -----------------------------------------------------------

          (Updated 2011-09-29 19:27:36)

          Review request for hbase and Ted Yu.

          Summary

          -------

          The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. Introduced method HLog.appendNoSync() that returns a txid. The increment method then release the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only if all the transactions upto the one identified by that txid has been successfully sycned to HDFS.

          There is a single test TestIncrement that creates a 100 threads and ensures that all the concurrent increments match the final expected result. There is a benchmark TestHLogBench that measures the performance of the appendNoSync call.

          This addresses bug HBASE-4487.

          https://issues.apache.org/jira/browse/HBASE-4487

          Diffs

          -----

          /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1177401

          /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1177401

          /src/test/java/org/apache/hadoop/hbase/regionserver/TestIncrement.java PRE-CREATION

          /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogBench.java PRE-CREATION

          /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java 1177401

          /src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java 1177401

          Diff: https://reviews.apache.org/r/2116/diff

          Testing

          -------

          All unit tests pass now (expect TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same test to fail on trunk, so these failures do not seem to be related to this patch.

          Thanks,

          Dhruba

          Show
          jiraposter@reviews.apache.org added a comment - On 2011-09-29 22:31:37, Michael Stack wrote: > Looks good. I like the bench marking tool. > > This is like a group commit in HLog. We have a group committing going on inside in sync down in dfsclient. Does this group commit not 'group' enough. > > With this change, its no longer possible to sync each increment. We ok w/ that? We should at least release note this difference in increment behaviors and erring on the conservative side, I'd mark this an 'incompatible change' rather than an 'improvement' just so its more likely folks will know of this changed increment behavior. HRegion has: final Configuration conf; Can we pass new config parameter through conf which enables this new feature ? Ted ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2116/#review2186 ----------------------------------------------------------- On 2011-09-29 19:27:36, Dhruba Borthakur wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2116/ ----------------------------------------------------------- (Updated 2011-09-29 19:27:36) Review request for hbase and Ted Yu. Summary ------- The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. Introduced method HLog.appendNoSync() that returns a txid. The increment method then release the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only if all the transactions upto the one identified by that txid has been successfully sycned to HDFS. There is a single test TestIncrement that creates a 100 threads and ensures that all the concurrent increments match the final expected result. There is a benchmark TestHLogBench that measures the performance of the appendNoSync call. This addresses bug HBASE-4487 . https://issues.apache.org/jira/browse/HBASE-4487 Diffs ----- /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1177401 /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1177401 /src/test/java/org/apache/hadoop/hbase/regionserver/TestIncrement.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogBench.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java 1177401 /src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java 1177401 Diff: https://reviews.apache.org/r/2116/diff Testing ------- All unit tests pass now (expect TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same test to fail on trunk, so these failures do not seem to be related to this patch. Thanks, Dhruba
          Hide
          Lars Hofhansl added a comment -

          How does that relate to the logic in HLog.syncer, which explicitly syncs without holding the updateLock.
          If I read the patch correctly it adds new sync calls that are performed with the updateLock held (locking out all updates while the sync operation is in progress).

          Show
          Lars Hofhansl added a comment - How does that relate to the logic in HLog.syncer, which explicitly syncs without holding the updateLock. If I read the patch correctly it adds new sync calls that are performed with the updateLock held (locking out all updates while the sync operation is in progress).
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2116/#review2206
          -----------------------------------------------------------

          /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          <https://reviews.apache.org/r/2116/#comment5140>

          Am not sure. Just to clarify
          Is this check not needed here then?

          • ramkrishna

          On 2011-09-29 19:27:36, Dhruba Borthakur wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2116/

          -----------------------------------------------------------

          (Updated 2011-09-29 19:27:36)

          Review request for hbase and Ted Yu.

          Summary

          -------

          The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. Introduced method HLog.appendNoSync() that returns a txid. The increment method then release the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only if all the transactions upto the one identified by that txid has been successfully sycned to HDFS.

          There is a single test TestIncrement that creates a 100 threads and ensures that all the concurrent increments match the final expected result. There is a benchmark TestHLogBench that measures the performance of the appendNoSync call.

          This addresses bug HBASE-4487.

          https://issues.apache.org/jira/browse/HBASE-4487

          Diffs

          -----

          /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1177401

          /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1177401

          /src/test/java/org/apache/hadoop/hbase/regionserver/TestIncrement.java PRE-CREATION

          /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogBench.java PRE-CREATION

          /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java 1177401

          /src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java 1177401

          Diff: https://reviews.apache.org/r/2116/diff

          Testing

          -------

          All unit tests pass now (expect TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same test to fail on trunk, so these failures do not seem to be related to this patch.

          Thanks,

          Dhruba

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2116/#review2206 ----------------------------------------------------------- /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java < https://reviews.apache.org/r/2116/#comment5140 > Am not sure. Just to clarify Is this check not needed here then? ramkrishna On 2011-09-29 19:27:36, Dhruba Borthakur wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2116/ ----------------------------------------------------------- (Updated 2011-09-29 19:27:36) Review request for hbase and Ted Yu. Summary ------- The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. Introduced method HLog.appendNoSync() that returns a txid. The increment method then release the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only if all the transactions upto the one identified by that txid has been successfully sycned to HDFS. There is a single test TestIncrement that creates a 100 threads and ensures that all the concurrent increments match the final expected result. There is a benchmark TestHLogBench that measures the performance of the appendNoSync call. This addresses bug HBASE-4487 . https://issues.apache.org/jira/browse/HBASE-4487 Diffs ----- /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1177401 /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1177401 /src/test/java/org/apache/hadoop/hbase/regionserver/TestIncrement.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogBench.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java 1177401 /src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java 1177401 Diff: https://reviews.apache.org/r/2116/diff Testing ------- All unit tests pass now (expect TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same test to fail on trunk, so these failures do not seem to be related to this patch. Thanks, Dhruba
          Hide
          dhruba borthakur added a comment -

          > This is like a group commit in HLog. We have a group committing going on inside in sync down in dfsclient.

          The DFSClient write code path is pretty heavyweight (as it stands now), as shown in the code DFSClient.writeChunk(). It acquires a synchronized section, creates packets, stuffs crcs in it, acquires locks for the outgoing queue and queues the packet. And all this is done while holding the HLog.updateLock, which means it blocks out any other writes to the HLog for this time. Instead, this patch queues the write-data completely inside the hbase code and the HLog.updateLock is released much much earlier.

          > If I read the patch correctly it adds new sync calls that are performed with the updateLock held

          There are no additional code paths that this patch introduced where a HLog.sync occurs within the HLog.updateLock. The only pre-existing place where a sync calls occurs within the HLog.updateLock is in HLog.cleanupCurrentWriter. This code path is invoked only during a logroll; and during a logroll all currently executing transactions are anyways quiesced, so I see no impact to concurrency in this code path. Please let me know if you disagree.

          > if (this.closed) : Is this check not needed here then?

          I think this code is still needed, this check was not introduced by this patch. Any reasons why you think it might not be needed?

          > Can we pass new config parameter through conf which enables this new feature ?

          I can surely make this configurable. But I am doubtful if anybody would care to run with this feature switched off (since it is a performance fix with potentially no downsides for any usecase). But if you feel strongly about it, I will make it configurable. However, most of the common code in HLog will not be inside this configurable parameter, only the change in HRegion.increment() can be switched on/off via this new config. Please let me know.

          Show
          dhruba borthakur added a comment - > This is like a group commit in HLog. We have a group committing going on inside in sync down in dfsclient. The DFSClient write code path is pretty heavyweight (as it stands now), as shown in the code DFSClient.writeChunk(). It acquires a synchronized section, creates packets, stuffs crcs in it, acquires locks for the outgoing queue and queues the packet. And all this is done while holding the HLog.updateLock, which means it blocks out any other writes to the HLog for this time. Instead, this patch queues the write-data completely inside the hbase code and the HLog.updateLock is released much much earlier. > If I read the patch correctly it adds new sync calls that are performed with the updateLock held There are no additional code paths that this patch introduced where a HLog.sync occurs within the HLog.updateLock. The only pre-existing place where a sync calls occurs within the HLog.updateLock is in HLog.cleanupCurrentWriter. This code path is invoked only during a logroll; and during a logroll all currently executing transactions are anyways quiesced, so I see no impact to concurrency in this code path. Please let me know if you disagree. > if (this.closed) : Is this check not needed here then? I think this code is still needed, this check was not introduced by this patch. Any reasons why you think it might not be needed? > Can we pass new config parameter through conf which enables this new feature ? I can surely make this configurable. But I am doubtful if anybody would care to run with this feature switched off (since it is a performance fix with potentially no downsides for any usecase). But if you feel strongly about it, I will make it configurable. However, most of the common code in HLog will not be inside this configurable parameter, only the change in HRegion.increment() can be switched on/off via this new config. Please let me know.
          Hide
          ramkrishna.s.vasudevan added a comment -
          -    synchronized (this.updateLock) {
          -      if (this.closed) {
          -        return;
          -      }
          +    syncer(this.unflushedEntries.get()); // sync all pending items
          

          The patch shows that the above 'if' check is been removed. So it is needed.

          Show
          ramkrishna.s.vasudevan added a comment - - synchronized ( this .updateLock) { - if ( this .closed) { - return ; - } + syncer( this .unflushedEntries.get()); // sync all pending items The patch shows that the above 'if' check is been removed. So it is needed.
          Hide
          dhruba borthakur added a comment -

          Hi Ramakrishna, from my understanding, HLog.syncer() used to acquire the updateLock and then check this.closed, then used to release the updateLock and proceed with the sync. So, essentially the check was providing no protection against the concurrent execution of the body of the code in syncer() vs some other thread marking this.closed to true.

          But I agree with you, that even prior to this patch, there is a rare possibility that a region server shutdown (via HLog.close) can race with a HLog.sync() that is in progress and cause exceptions. If I can recreate that race in a test case, I will submit it as part of another JIRA, does that sound reasonable?

          Show
          dhruba borthakur added a comment - Hi Ramakrishna, from my understanding, HLog.syncer() used to acquire the updateLock and then check this.closed, then used to release the updateLock and proceed with the sync. So, essentially the check was providing no protection against the concurrent execution of the body of the code in syncer() vs some other thread marking this.closed to true. But I agree with you, that even prior to this patch, there is a rare possibility that a region server shutdown (via HLog.close) can race with a HLog.sync() that is in progress and cause exceptions. If I can recreate that race in a test case, I will submit it as part of another JIRA, does that sound reasonable?
          Hide
          dhruba borthakur added a comment -

          > I'd mark this an 'incompatible change' rather than an 'improvement' just so its more likely folks

          I could not find the place where to mark it as "incompatible". Can you pl tell me where to mark it as such?

          Show
          dhruba borthakur added a comment - > I'd mark this an 'incompatible change' rather than an 'improvement' just so its more likely folks I could not find the place where to mark it as "incompatible". Can you pl tell me where to mark it as such?
          Hide
          Andrew Purtell added a comment -

          I could not find the place where to mark it as "incompatible". Can you pl tell me where to mark it as such?

          This is a committer requirement Dhruba, don't worry about it, unless you disagree about the characterization. There is a checkbox to mark when resolving the issue and where the changelog line goes in CHANGES.txt is different.

          Show
          Andrew Purtell added a comment - I could not find the place where to mark it as "incompatible". Can you pl tell me where to mark it as such? This is a committer requirement Dhruba, don't worry about it, unless you disagree about the characterization. There is a checkbox to mark when resolving the issue and where the changelog line goes in CHANGES.txt is different.
          Hide
          Andrew Purtell added a comment -

          My comment lines up with Stack's on the code review, basically: This is a new code path for increments, and it means that one cannot guarantee that any individual increment has been committed to the WAL. For the increment case this walks back some representations we make about HBase for the sake of performance. I have mixed feelings about that. This should be configurable. Happy to agree the default is "enabled".

          I believe there is another JIRA somewhere about making different config examples for users depending on their risk tolerance or interest in performance. Enabling this and deferred flushing etc. makes sense in a "performance" profile; conversely turned off in a "strict" profile.

          Show
          Andrew Purtell added a comment - My comment lines up with Stack's on the code review, basically: This is a new code path for increments, and it means that one cannot guarantee that any individual increment has been committed to the WAL. For the increment case this walks back some representations we make about HBase for the sake of performance. I have mixed feelings about that. This should be configurable. Happy to agree the default is "enabled". I believe there is another JIRA somewhere about making different config examples for users depending on their risk tolerance or interest in performance. Enabling this and deferred flushing etc. makes sense in a "performance" profile; conversely turned off in a "strict" profile.
          Hide
          dhruba borthakur added a comment -

          > one cannot guarantee that any individual increment has been committed to the WAL.

          Even prior to this patch, individual increments were batched-synced-by-hdfs, so there was no guarantee that each increment operation resulted in a sync.

          From my understanding, the contract for hbase clients is that an increment operation is deemed complete only after it has been sycned to the transaction log. This contract is valid prior to this patch and this patch does not change this contract. The server does not return the RPC response back to the client until the increment operation has been sycned to the wal.

          Also, the increment operation does not follow the RWCC rules, thereby exposing uncommitted increment operations to readers. This patch does not change that behaviour either.

          is there something I am missing?

          Show
          dhruba borthakur added a comment - > one cannot guarantee that any individual increment has been committed to the WAL. Even prior to this patch, individual increments were batched-synced-by-hdfs, so there was no guarantee that each increment operation resulted in a sync. From my understanding, the contract for hbase clients is that an increment operation is deemed complete only after it has been sycned to the transaction log. This contract is valid prior to this patch and this patch does not change this contract. The server does not return the RPC response back to the client until the increment operation has been sycned to the wal. Also, the increment operation does not follow the RWCC rules, thereby exposing uncommitted increment operations to readers. This patch does not change that behaviour either. is there something I am missing?
          Hide
          Todd Lipcon added a comment -

          looking over the patch, I think I agree with Dhruba - dropping the row locks before syncing would only have effects on ACID if we were using row locks for visibility. But, we use RWCC/MVCC for visibility. So which guarantee are we losing?

          Show
          Todd Lipcon added a comment - looking over the patch, I think I agree with Dhruba - dropping the row locks before syncing would only have effects on ACID if we were using row locks for visibility. But, we use RWCC/MVCC for visibility. So which guarantee are we losing?
          Hide
          Andrew Purtell added a comment -

          Even prior to this patch, individual increments were batched-synced-by-hdfs, so there was no guarantee that each increment operation resulted in a sync.

          Point taken.

          Show
          Andrew Purtell added a comment - Even prior to this patch, individual increments were batched-synced-by-hdfs, so there was no guarantee that each increment operation resulted in a sync. Point taken.
          Hide
          Lars Hofhansl added a comment -

          @ramkrishna and @dhruba. I agree with ramkrishna that the if (this.closed) is still needed.
          True, there already was a race and that was the point of HBASE-4387.
          We want to sync without the updateLock held, while at the same make sure we don't sync unless we need to (i.e. HLog not closed) and if we fail (which will be rare) without the lock we retry with the lock.

          Show
          Lars Hofhansl added a comment - @ramkrishna and @dhruba. I agree with ramkrishna that the if (this.closed) is still needed. True, there already was a race and that was the point of HBASE-4387 . We want to sync without the updateLock held, while at the same make sure we don't sync unless we need to (i.e. HLog not closed) and if we fail (which will be rare) without the lock we retry with the lock.
          Hide
          ramkrishna.s.vasudevan added a comment -

          I feel that check will be helpful if at that point of calling syncer() the this.closed was true.
          But there is a race condition there.

          Show
          ramkrishna.s.vasudevan added a comment - I feel that check will be helpful if at that point of calling syncer() the this.closed was true. But there is a race condition there.
          Hide
          Ted Yu added a comment -

          Patch v7 adds check for closed in syncer(long txid).
          Hopefully v7 can make Ramkrishna happy.

          Show
          Ted Yu added a comment - Patch v7 adds check for closed in syncer(long txid). Hopefully v7 can make Ramkrishna happy.
          Hide
          ramkrishna.s.vasudevan added a comment -

          @Dhruba
          I was happy with v6 itself. .
          Just a suggestion it was.
          @Ted

          Show
          ramkrishna.s.vasudevan added a comment - @Dhruba I was happy with v6 itself. . Just a suggestion it was. @Ted
          Hide
          dhruba borthakur added a comment -

          v7 looks good to me, thanks for doing it Ted.

          Show
          dhruba borthakur added a comment - v7 looks good to me, thanks for doing it Ted.
          Hide
          stack added a comment -

          +1 on v7 (IIUC, Ram and Lars close issue is addressed by the call to sync if edits and then the close). From the back and forth above, my take away is this patch makes the grouping fatter but doesn't change current semantics – we are as 'broke' as we were before this patch. I no longer think this an incompatible change. We should release note the fatter grouping of increments on commit.

          Show
          stack added a comment - +1 on v7 (IIUC, Ram and Lars close issue is addressed by the call to sync if edits and then the close). From the back and forth above, my take away is this patch makes the grouping fatter but doesn't change current semantics – we are as 'broke' as we were before this patch. I no longer think this an incompatible change. We should release note the fatter grouping of increments on commit.
          Hide
          Jonathan Gray added a comment -

          +1 as well. And agree with your assessment above, Stack. Potential fatter grouping of increments and significant improvement of per-row throughput.

          Looking forward to getting this working for Put/MultiPut! Nice work, Dhruba.

          Show
          Jonathan Gray added a comment - +1 as well. And agree with your assessment above, Stack. Potential fatter grouping of increments and significant improvement of per-row throughput. Looking forward to getting this working for Put/MultiPut! Nice work, Dhruba.
          Hide
          Ted Yu added a comment -

          Integrated patch v7 to TRUNK.

          Thanks for the nice feature Dhruba.

          Thanks for the reviews from Jonathan, Stack, Lars, Todd, Andrew and Ramkrishna.

          Show
          Ted Yu added a comment - Integrated patch v7 to TRUNK. Thanks for the nice feature Dhruba. Thanks for the reviews from Jonathan, Stack, Lars, Todd, Andrew and Ramkrishna.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2274 (See https://builds.apache.org/job/HBase-TRUNK/2274/)
          HBASE-4487 The increment operation can release the rowlock before sync-ing
          the Hlog (dhruba borthakur)

          tedyu :
          Files :

          • /hbase/trunk/CHANGES.txt
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestIncrement.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogBench.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2274 (See https://builds.apache.org/job/HBase-TRUNK/2274/ ) HBASE-4487 The increment operation can release the rowlock before sync-ing the Hlog (dhruba borthakur) tedyu : Files : /hbase/trunk/CHANGES.txt /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestIncrement.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogBench.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
          Hide
          Lars Hofhansl added a comment -

          Is this too risky for 0.92? I guess we have to draw the line somewhere, but this seems like a great performance enhancement (along with HBASE-4528).

          Show
          Lars Hofhansl added a comment - Is this too risky for 0.92? I guess we have to draw the line somewhere, but this seems like a great performance enhancement (along with HBASE-4528 ).
          Hide
          dhruba borthakur added a comment -

          Hi Lars, this is a performance fix, so it is unlikely to be a candidate for 0.92. How about we make frequent releases (say 0.94? coming out sooner rather than later) rather than porting these to existing release-candidates?

          Show
          dhruba borthakur added a comment - Hi Lars, this is a performance fix, so it is unlikely to be a candidate for 0.92. How about we make frequent releases (say 0.94? coming out sooner rather than later) rather than porting these to existing release-candidates?
          Hide
          Ted Yu added a comment -

          Resolve for now.
          Can backport to 0.92 in another JIRA.

          Show
          Ted Yu added a comment - Resolve for now. Can backport to 0.92 in another JIRA.
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94-security #15 (See https://builds.apache.org/job/HBase-0.94-security/15/)
          HBASE-5782 Edits can be appended out of seqid order since HBASE-4487 (Revision 1327672)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          Show
          Hudson added a comment - Integrated in HBase-0.94-security #15 (See https://builds.apache.org/job/HBase-0.94-security/15/ ) HBASE-5782 Edits can be appended out of seqid order since HBASE-4487 (Revision 1327672) Result = SUCCESS larsh : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2783 (See https://builds.apache.org/job/HBase-TRUNK/2783/)
          HBASE-5782 Edits can be appended out of seqid order since HBASE-4487 (Revision 1327673)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2783 (See https://builds.apache.org/job/HBase-TRUNK/2783/ ) HBASE-5782 Edits can be appended out of seqid order since HBASE-4487 (Revision 1327673) Result = FAILURE larsh : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94 #129 (See https://builds.apache.org/job/HBase-0.94/129/)
          HBASE-5782 Edits can be appended out of seqid order since HBASE-4487 (Revision 1327672)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          Show
          Hudson added a comment - Integrated in HBase-0.94 #129 (See https://builds.apache.org/job/HBase-0.94/129/ ) HBASE-5782 Edits can be appended out of seqid order since HBASE-4487 (Revision 1327672) Result = FAILURE larsh : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #175 (See https://builds.apache.org/job/HBase-TRUNK-security/175/)
          HBASE-5782 Edits can be appended out of seqid order since HBASE-4487 (Revision 1327673)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-security #175 (See https://builds.apache.org/job/HBase-TRUNK-security/175/ ) HBASE-5782 Edits can be appended out of seqid order since HBASE-4487 (Revision 1327673) Result = FAILURE larsh : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          Hide
          ShiXing added a comment -

          @Ted Yu or @dhruba borthakur
          I have a question about the code in the trunk:

          try {
                Integer lid = getLock(lockid, row, true);
                this.updatesLock.readLock().lock();
                try {
                    xxxx
                } finally {
                  this.updatesLock.readLock().unlock();
                  releaseRowLock(lid);
                }
                if (writeToWAL) {
                  this.log.sync(txid); // sync the transaction log outside the rowlock
                }
          } finally {
                closeRegionOperation();
          }
          

          What will happen if

          this.log.sync(txid);

          failed?

          As I know, the RS will tell client that this op failed, and the client will retry. So MemStore and hlog on HDFS are inconsistent.

          Am I right? Or the RS exits if an HDFS write/sync fails as Dhruba said?

          Show
          ShiXing added a comment - @ Ted Yu or @ dhruba borthakur I have a question about the code in the trunk: try { Integer lid = getLock(lockid, row, true ); this .updatesLock.readLock().lock(); try { xxxx } finally { this .updatesLock.readLock().unlock(); releaseRowLock(lid); } if (writeToWAL) { this .log.sync(txid); // sync the transaction log outside the rowlock } } finally { closeRegionOperation(); } What will happen if this .log.sync(txid); failed? As I know, the RS will tell client that this op failed, and the client will retry. So MemStore and hlog on HDFS are inconsistent. Am I right? Or the RS exits if an HDFS write/sync fails as Dhruba said?
          Hide
          dhruba borthakur added a comment -

          If the hlog sync failed, you can invoke

          
                // if the wal sync was unsuccessful, remove keys from memstore
                if (!walSyncSuccessful) {
                  rollbackMemstore(batchOp, familyMaps, firstIndex, lastIndexExclusive);
                }
          

          Similar code exists inside doMiniBatchPut

          Show
          dhruba borthakur added a comment - If the hlog sync failed, you can invoke // if the wal sync was unsuccessful, remove keys from memstore if (!walSyncSuccessful) { rollbackMemstore(batchOp, familyMaps, firstIndex, lastIndexExclusive); } Similar code exists inside doMiniBatchPut
          Hide
          ShiXing added a comment -

          But now there is no code for hlog sync in Increment and Append

          Show
          ShiXing added a comment - But now there is no code for hlog sync in Increment and Append

            People

            • Assignee:
              dhruba borthakur
              Reporter:
              dhruba borthakur
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development