HBase / HBASE-6065

Log for flush would append a non-sequential edit in the hlog, leading to possible data loss

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.1, 0.95.0
    • Component/s: wal
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      After completing a region flush, we append a log edit to the hlog file through HLog#completeCacheFlush.

      public void completeCacheFlush(final byte[] encodedRegionName,
          final byte[] tableName, final long logSeqId, final boolean isMetaRegion) {
        ...
        HLogKey key = makeKey(encodedRegionName, tableName, logSeqId,
            System.currentTimeMillis(), HConstants.DEFAULT_CLUSTER_ID);
        ...
      }
      

      When we make the hlog key, we use the seqId passed in as a parameter, which was generated earlier by HLog#startCacheFlush.
      As a result, we may append an edit with a lower seq id than the last edit already in the hlog file.

      If this is the last edit in the file, it may cause data loss, because:

      
      HRegion#replayRecoveredEditsIfAny {
      ...
      maxSeqId = Math.abs(Long.parseLong(fileName));
      if (maxSeqId <= minSeqId) {
        String msg = "Maximum sequenceid for this log is " + maxSeqId
            + " and minimum sequenceid for the region is " + minSeqId
            + ", skipped the whole file, path=" + edits;
        LOG.debug(msg);
        continue;
      }
      ...
      }
      

      We may skip the split log file, because we use the last edit's seq id as its file name and treat that seq id as the maximum seq id in the file.
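
      For context, the file name comes from HLogSplitter tracking the most recently seen seq id per region. A minimal sketch of that pre-fix behavior (mirroring the removed lines visible in the patch v2 diff quoted in the comments below); the method and field names are taken from that diff:

      void updateRegionMaximumEditLogSeqNum(Entry entry) {
        // Records the seq id of whatever entry was seen last for the region; that value
        // becomes the recovered.edits file name, so an out-of-order flush edit at the
        // tail of the hlog makes the name understate the file's real maximum seq id.
        regionMaximumEditLogSeqNum.put(entry.getKey().getEncodedRegionName(),
            entry.getKey().getLogSeqNum());
      }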

  Attachments

  1. HBASE-6065v2.patch
    1 kB
    chunhui shen
  2. HBASE-6065.patch
    0.8 kB
    chunhui shen


          Activity

          chunhui shen added a comment -

          In the patch, I call obtainSeqNum() for the flush log edit rather than using the seqId from the parameter (sketched below),
          so we can ensure the log seq ids are always sequential in the file.
          BTW, do we use the flush log edit anywhere?

          There is another solution: change the split log file's name to the real max seq id, rather than the last seq id.
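
          A minimal sketch of the first approach, assuming obtainSeqNum() atomically increments and returns the hlog's sequence counter (this is an illustration of the idea, not the attached patch):

          public void completeCacheFlush(final byte[] encodedRegionName,
              final byte[] tableName, final long logSeqId, final boolean isMetaRegion) {
            ...
            // Obtain a fresh sequence number at append time instead of reusing the
            // logSeqId handed out earlier by startCacheFlush, so the flush marker can
            // never carry a lower seq id than edits appended during the flush.
            long seq = obtainSeqNum();
            HLogKey key = makeKey(encodedRegionName, tableName, seq,
                System.currentTimeMillis(), HConstants.DEFAULT_CLUSTER_ID);
            ...
          }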

          Ted Yu added a comment -

          Can you provide a test that shows this defect?

          If logSeqId is not used by completeCacheFlush(), can we omit this parameter in the call?

          ramkrishna.s.vasudevan added a comment -

          So this applies to 0.94 and above only, right?

          ramkrishna.s.vasudevan added a comment -

          @Chunhui
          What type of data loss do you see here? Is it the edit with HBASE::CACHEFLUSH that gets missed?
          By design, that edit is there to mark the point up to which the flush has been done, and it is added as an entry in the HLog.
          Even while recovering we tend to skip this entry:

          // Check this edit is for me. Also, guard against writing the special
          // METACOLUMN info such as HBASE::CACHEFLUSH entries
          if (kv.matchingFamily(HLog.METAFAMILY) ||
              !Bytes.equals(key.getEncodedRegionName(), this.regionInfo.getEncodedNameAsBytes())) {
            skippedEdits++;
            continue;
          }

          Did you find any other type of data loss which I am not able to foresee here? Correct me if I am wrong.

          chunhui shen added a comment -

          Suppose region A is on regionserver B.
          The issue can be reproduced with the following steps:

          1. Put one row to region A (appends seq 1 to the hlog)
          2. Put one row to region A (appends seq 2 to the hlog)
          3. Region A starts a flush, which calls HLog#startCacheFlush (the current seq num in the hlog is 3)
          4. Put one row to region A (appends seq 4 to the hlog)
          5. Region A completes the flush, which calls HLog#completeCacheFlush (appends seq 3 to the hlog)
          6. Kill regionserver B.

          So the hlog file has four edits:
          seq 1
          seq 2
          seq 4
          seq 3

          When splitting this hlog file, we generate the recovered.edits file for region A, which is named 3. (For the naming, see HLogSplitter#splitLogFileToTemp.)

          Now, when replaying the recovered.edits file for region A, we will skip this file, causing data loss.
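
          To make the skip concrete, a rough sketch applying the check from HRegion#replayRecoveredEditsIfAny (quoted in the description) to this scenario; the exact minSeqId depends on what the flush persisted, but the file name 3 understates the real maximum of 4:

          // The recovered.edits file for region A is named "3" (the last seq id seen
          // during the split), yet it still contains the unflushed edit with seq id 4.
          long maxSeqId = Math.abs(Long.parseLong("3"));  // 3, parsed from the file name
          long minSeqId = 3;  // assumption: roughly the seq id already persisted by the flush
          if (maxSeqId <= minSeqId) {
            // 3 <= 3: the whole file is skipped, so the edit with seq id 4 is never replayed.
          }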

          chunhui shen added a comment -

          I have tried to write a test, but it is a little hard.

          We could also fix the issue with another solution (patch v2):
          In the current logic, we treat the last edit's seq id as the maximal seq id of the recovered.edits file, but that is wrong because we can't guarantee that the edits in the hlog are sequential.
          So we should change the logic for finding the maximal seq id of the recovered.edits file.
          We only need a small change to the method HLogSplitter#updateRegionMaximumEditLogSeqNum.

          ramkrishna.s.vasudevan added a comment -

          change the split log file's name to the real max seq id, rather than the last seq id

          I think this way should be fine, i.e. the recovered.edits file name should carry the real max seq id (I was thinking that is how it already worked). Otherwise the HBASE::CACHEFLUSH entry will carry a seq id which is not actually the flushed seq id.

          Ted Yu added a comment -

          updateRegionMaximumEditLogSeqNum() is called by two methods. I think we should make it synchronized.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12528556/HBASE-6065v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestAtomicOperation
          org.apache.hadoop.hbase.replication.TestReplication
          org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
          org.apache.hadoop.hbase.replication.TestMasterReplication

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1956//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1956//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1956//console

          This message is automatically generated.

          chunhui shen added a comment -

          updateRegionMaximumEditLogSeqNum() is called by two methods. I think we should make it synchronized

          In patch v2, I have already added synchronization:

            void updateRegionMaximumEditLogSeqNum(Entry entry) {
          -      regionMaximumEditLogSeqNum.put(entry.getKey().getEncodedRegionName(),
          -          entry.getKey().getLogSeqNum());
          +      synchronized (regionMaximumEditLogSeqNum) {
          +        Long currentMaxSeqNum=regionMaximumEditLogSeqNum.get(entry.getKey().getEncodedRegionName());
          +        if (currentMaxSeqNum == null
          +            || entry.getKey().getLogSeqNum() > currentMaxSeqNum) {
          +          regionMaximumEditLogSeqNum.put(entry.getKey().getEncodedRegionName(),
          +              entry.getKey().getLogSeqNum());
          +        }
          +      }
          
          Ted Yu added a comment -

          I ran TestAtomicOperation 3 times on MacBook and they passed.

          +1 on patch v2.

          ramkrishna.s.vasudevan added a comment -

          @Chunhui
          Thinking about this patch again:
          HBASE-5782 ensures that the seq ids should be in sequence. It is only the CACHEFLUSH entry that deviates from that ordering.
          I feel that instead of tracking the real max seq id we could just avoid adding this entry to regionMaximumEditLogSeqNum.
          This would ensure that when we finally create the recovered.edits file, we keep only the max entry. What do you think? It should also be simple, as sketched below.
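
          A rough sketch of that alternative (hypothetical; patch v2 is what got committed), assuming the flush marker is recognizable by its HLog.METAFAMILY family, as in the recovery code quoted above:

          void updateRegionMaximumEditLogSeqNum(Entry entry) {
            // Hypothetical: ignore HBASE::CACHEFLUSH meta edits entirely, so the
            // out-of-order flush marker can never become the recovered.edits file name.
            for (KeyValue kv : entry.getEdit().getKeyValues()) {
              if (kv.matchingFamily(HLog.METAFAMILY)) {
                return;
              }
            }
            regionMaximumEditLogSeqNum.put(entry.getKey().getEncodedRegionName(),
                entry.getKey().getLogSeqNum());
          }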

          chunhui shen added a comment -

          @ram
          I think both are OK.

          Since the current patch v2 fixes the issue, and in the future we might allow appending edits non-sequentially, we wouldn't need to change this logic again. I think patch v2 would be OK.

          Ted Yu added a comment -

          Will integrate patch v2 later today if there is no objection.

          Ted Yu added a comment -

          Integrated to 0.94 and trunk.

          Thanks for the patch Chunhui.

          Thanks for the review, Ramkrishna.

          Hudson added a comment -

          Integrated in HBase-0.94 #212 (See https://builds.apache.org/job/HBase-0.94/212/)
          HBASE-6065 Log for flush would append a non-sequential edit in the hlog, leading to possible data loss (Chunhui) (Revision 1342017)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
          Hudson added a comment -

          Integrated in HBase-TRUNK #2917 (See https://builds.apache.org/job/HBase-TRUNK/2917/)
          HBASE-6065 Log for flush would append a non-sequential edit in the hlog, leading to possible data loss (Chunhui) (Revision 1342019)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #15 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/15/)
          HBASE-6065 Log for flush would append a non-sequential edit in the hlog, leading to possible data loss (Chunhui) (Revision 1342019)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
          Hudson added a comment -

          Integrated in HBase-TRUNK #2940 (See https://builds.apache.org/job/HBase-TRUNK/2940/)
          HBASE-6118 Add a testcase for HBASE-6065 (Ashutosh) (Revision 1343338)

          Result = SUCCESS
          ramkrishna :
          Files :

          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          Hudson added a comment -

          Integrated in HBase-0.94 #222 (See https://builds.apache.org/job/HBase-0.94/222/)
          HBASE-6118 Add a testcase for HBASE-6065 (Ashutosh) (Revision 1343336)

          Result = FAILURE
          ramkrishna :
          Files :

          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #30 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/30/)
          HBASE-6118 Add a testcase for HBASE-6065 (Ashutosh) (Revision 1343338)

          Result = FAILURE
          ramkrishna :
          Files :

          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          Lars Hofhansl added a comment -

          Sorry for coming in late here. Was busy with HBaseCon, day after hack-a-thon and then traveling to Germany.

          Looking back through this issue, I agree with Ram. We should never write edits non-sequentially into the HLogs (or we should ignore the cache flush meta edit).
          I imagine there is other code that scans through an HLog and might get confused by non-sequential edits.

          Lars Hofhansl added a comment -

          Lastly, since this can lose data, I think this warrants a 0.94.1 release soon.

          Hudson added a comment -

          Integrated in HBase-0.94-security #33 (See https://builds.apache.org/job/HBase-0.94-security/33/)
          HBASE-6118 Add a testcase for HBASE-6065 (Ashutosh) (Revision 1343336)

          Result = FAILURE
          ramkrishna :
          Files :

          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java

            People

            • Assignee:
              chunhui shen
              Reporter:
              chunhui shen
            • Votes:
              0
              Watchers:
              7
