Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.99.0, 2.0.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We used to write COMPLETE_FLUSH event to WAL until it got removed in 0.96 in issue HBASE-7329.

      For secondary region replicas, it is important to share the data files with the primary region. So we should reintroduce the flush wal markers so that the secondary region replicas can pick up the newly flushed files from the WAL and start serving data from those.

      A design doc which explains the context a bit better can be found in HBASE-11183.

      1. hbase-11511_addendum.patch
        2 kB
        Enis Soztutar
      2. hbase-11511_v1.patch
        131 kB
        Enis Soztutar
      3. hbase-11511_v1.patch
        13 kB
        Enis Soztutar
      4. hbase-11511_v2.patch
        131 kB
        Enis Soztutar
      5. hbase-11511_v3.patch
        133 kB
        Enis Soztutar
      6. hbase-11511_v4.patch
        132 kB
        Enis Soztutar

        Issue Links

          Activity

          Hide
          Enis Soztutar added a comment -

          Closing this issue after 0.99.0 release.

          Show
          Enis Soztutar added a comment - Closing this issue after 0.99.0 release.
          Hide
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #5309 (See https://builds.apache.org/job/HBase-TRUNK/5309/)
          HBASE-11511 Write flush events to WAL (ADDENDUM to fix flaky test) (enis: rev 58727343f2b7ac4093828ca645084901d7ca568c)

          • hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          Show
          Hudson added a comment - FAILURE: Integrated in HBase-TRUNK #5309 (See https://builds.apache.org/job/HBase-TRUNK/5309/ ) HBASE-11511 Write flush events to WAL (ADDENDUM to fix flaky test) (enis: rev 58727343f2b7ac4093828ca645084901d7ca568c) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in HBase-1.0 #46 (See https://builds.apache.org/job/HBase-1.0/46/)
          HBASE-11511 Write flush events to WAL (ADDENDUM to fix flaky test) (enis: rev 07643580bb267a55f3e702bd801c07490d010213)

          • hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          Show
          Hudson added a comment - FAILURE: Integrated in HBase-1.0 #46 (See https://builds.apache.org/job/HBase-1.0/46/ ) HBASE-11511 Write flush events to WAL (ADDENDUM to fix flaky test) (enis: rev 07643580bb267a55f3e702bd801c07490d010213) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          Hide
          Enis Soztutar added a comment -

          Newly added test sometimes fails flakily. Although I could not reproduce it locally, I think it is because of a fs list ordering issue. Will commit the addendum shortly. Let's see whether this solves the issue.

          Example trace:
          https://builds.apache.org/job/hbase-trunk/5308/testReport/junit/org.apache.hadoop.hbase.regionserver/TestHRegionBusyWait/testFlushMarkers/

          Show
          Enis Soztutar added a comment - Newly added test sometimes fails flakily. Although I could not reproduce it locally, I think it is because of a fs list ordering issue. Will commit the addendum shortly. Let's see whether this solves the issue. Example trace: https://builds.apache.org/job/hbase-trunk/5308/testReport/junit/org.apache.hadoop.hbase.regionserver/TestHRegionBusyWait/testFlushMarkers/
          Hide
          Hudson added a comment -

          FAILURE: Integrated in HBase-1.0 #45 (See https://builds.apache.org/job/HBase-1.0/45/)
          HBASE-11511 Write flush events to WAL (enis: rev 4fa0e3274fbade789ae5801bf34ee532972a2ec4)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogUtil.java
          • hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlushContext.java
          • hbase-protocol/src/main/protobuf/WAL.proto
          Show
          Hudson added a comment - FAILURE: Integrated in HBase-1.0 #45 (See https://builds.apache.org/job/HBase-1.0/45/ ) HBASE-11511 Write flush events to WAL (enis: rev 4fa0e3274fbade789ae5801bf34ee532972a2ec4) hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogUtil.java hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlushContext.java hbase-protocol/src/main/protobuf/WAL.proto
          Hide
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #5308 (See https://builds.apache.org/job/HBase-TRUNK/5308/)
          HBASE-11511 Write flush events to WAL (enis: rev bbe29eb93cc819fbf0287aa2cb343649b72783bf)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
          • hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogUtil.java
          • hbase-protocol/src/main/protobuf/WAL.proto
          • hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlushContext.java
          Show
          Hudson added a comment - FAILURE: Integrated in HBase-TRUNK #5308 (See https://builds.apache.org/job/HBase-TRUNK/5308/ ) HBASE-11511 Write flush events to WAL (enis: rev bbe29eb93cc819fbf0287aa2cb343649b72783bf) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogUtil.java hbase-protocol/src/main/protobuf/WAL.proto hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlushContext.java
          Hide
          Enis Soztutar added a comment -

          Committed this to master and branch-1. Thanks Stack for review.

          Show
          Enis Soztutar added a comment - Committed this to master and branch-1. Thanks Stack for review.
          Hide
          Enis Soztutar added a comment -

          Attaching the final version that has been committed. Minor rebase.

          Show
          Enis Soztutar added a comment - Attaching the final version that has been committed. Minor rebase.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12655645/hbase-11511_v3.patch
          against trunk revision .
          ATTACHMENT ID: 12655645

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10062//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655645/hbase-11511_v3.patch against trunk revision . ATTACHMENT ID: 12655645 +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 6 new or modified tests. -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10062//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12655623/hbase-11511_v2.patch
          against trunk revision .
          ATTACHMENT ID: 12655623

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces the following lines longer than 100:
          + new java.lang.String[]

          { "Action", "FlushSeqId", "TableName", "EncodedRegionName", "StoreFlushes", }

          );
          + desc, sequenceId, false); // no sync. Sync is below where we do not hold the updates lock
          + long trx = log.appendNoSync(htd, info, key, WALEdit.createFlushWALEdit(info, f), sequenceId, false, null);

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655623/hbase-11511_v2.patch against trunk revision . ATTACHMENT ID: 12655623 +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified tests. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 lineLengths . The patch introduces the following lines longer than 100: + new java.lang.String[] { "Action", "FlushSeqId", "TableName", "EncodedRegionName", "StoreFlushes", } ); + desc, sequenceId, false); // no sync. Sync is below where we do not hold the updates lock + long trx = log.appendNoSync(htd, info, key, WALEdit.createFlushWALEdit(info, f), sequenceId, false, null); +1 site . The mvn site goal succeeds with this patch. +1 core tests . The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10060//console This message is automatically generated.
          Hide
          Enis Soztutar added a comment -

          v3 addresses Stack's and Ted's feedback. Thanks for reviews. I'll commit if hadoopqa passes.

          Show
          Enis Soztutar added a comment - v3 addresses Stack's and Ted's feedback. Thanks for reviews. I'll commit if hadoopqa passes.
          Hide
          stack added a comment -

          +1 then

          Show
          stack added a comment - +1 then
          Hide
          Enis Soztutar added a comment -

          Thanks Stack for taking a look.

          That is ok? What if an edit after the start flush edit was added?

          Good point. We writelock the updatesLock, then start the mvcc transaction, so it should be the case that sync() and sync(trx) should be the same thing I guess. But let me change it to be plain old sync() to be failure-proof.

          ... or are we still dealing with failure at this point?

          Indeed. The block above is a catch block for catching an ignoring any exception from writing the abort marker to WAL. The call to wal.abortCacheFlush() should still happen in the outer block, resulting in DroppedSnapshotException.

          Show
          Enis Soztutar added a comment - Thanks Stack for taking a look. That is ok? What if an edit after the start flush edit was added? Good point. We writelock the updatesLock, then start the mvcc transaction, so it should be the case that sync() and sync(trx) should be the same thing I guess. But let me change it to be plain old sync() to be failure-proof. ... or are we still dealing with failure at this point? Indeed. The block above is a catch block for catching an ignoring any exception from writing the abort marker to WAL. The call to wal.abortCacheFlush() should still happen in the outer block, resulting in DroppedSnapshotException.
          Hide
          stack added a comment -

          Before we would sync all outstanding edits. Now when we flush we flush just the start flush tx?

          • if (wal != null && !shouldSyncLog()) wal.sync();
            + if (wal != null)
            Unknown macro: {+ try { + wal.sync(trxId); // ensure that flush marker is sync'ed + } catch (IOException ioe) { + LOG.warn("Unexpected exception while log.sync(), ignoring. Exception: " + + StringUtils.stringifyException(ioe)); + }+ }

          That is ok? What if an edit after the start flush edit was added?

          Is this last line in the right location? Should it be inside the bracket?

          + }
          wal.abortCacheFlush(this.getRegionInfo().getEncodedNameAsBytes());

          ... or are we still dealing with failure at this point?

          Patch looks good otherwise.

          Show
          stack added a comment - Before we would sync all outstanding edits. Now when we flush we flush just the start flush tx? if (wal != null && !shouldSyncLog()) wal.sync(); + if (wal != null) Unknown macro: {+ try { + wal.sync(trxId); // ensure that flush marker is sync'ed + } catch (IOException ioe) { + LOG.warn("Unexpected exception while log.sync(), ignoring. Exception: " + + StringUtils.stringifyException(ioe)); + }+ } That is ok? What if an edit after the start flush edit was added? Is this last line in the right location? Should it be inside the bracket? + } wal.abortCacheFlush(this.getRegionInfo().getEncodedNameAsBytes()); ... or are we still dealing with failure at this point? Patch looks good otherwise.
          Hide
          Enis Soztutar added a comment -
          Show
          Enis Soztutar added a comment - RB here: https://reviews.apache.org/r/23460/
          Hide
          Enis Soztutar added a comment -

          Attaching rebased patch.

          Show
          Enis Soztutar added a comment - Attaching rebased patch.
          Hide
          Enis Soztutar added a comment -

          Here is a patch that writes START_FLUSH, COMMIT_FLUSH and ABORT_FLUSH events to WAL.

          I tried to take special care to the exception code paths to make sure that we do not cause hangs, etc on flush fail and region close.

          COMMIT_FLUSH is written after the store files are actually committed, and it does not happen atomically ( of course). This means that we can miss COMMIT_FLUSH events after START_FLUSH if the region server fails, etc. Corresponding HBASE-11512 might help with WAL tailers to pick up new files if we persist all the hfiles for the region at region open time to WAL.

          I am running the unit tests.

          Show
          Enis Soztutar added a comment - Here is a patch that writes START_FLUSH, COMMIT_FLUSH and ABORT_FLUSH events to WAL. I tried to take special care to the exception code paths to make sure that we do not cause hangs, etc on flush fail and region close. COMMIT_FLUSH is written after the store files are actually committed, and it does not happen atomically ( of course). This means that we can miss COMMIT_FLUSH events after START_FLUSH if the region server fails, etc. Corresponding HBASE-11512 might help with WAL tailers to pick up new files if we persist all the hfiles for the region at region open time to WAL. I am running the unit tests.

            People

            • Assignee:
              Enis Soztutar
              Reporter:
              Enis Soztutar
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development