
HBASE-2643: Figure out how to deal with EOF when splitting logs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.89.20100621
    • Fix Version/s: 0.89.20100924, 0.90.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      When splitting the WAL and encountering EOF, it's not clear what to do. Initial discussion of this started in http://review.hbase.org/r/74/ - summarizing here for brevity:

      We can get an EOFException while splitting the WAL in the following cases:

      • The writer died after creating the file but before even writing the header (or crashed halfway through writing the header)
      • The writer died in the middle of flushing some data - sync() guarantees that we can see at least the last edit, but we may see half of an edit that was being written out when the RS crashed (especially for large rows)
      • The data was actually corrupted somehow (eg a length field got changed to be too long and thus points past EOF)

      Ideally we would know, when we see an EOF, whether it was really the last record; in that case we could simply drop that record (it wasn't synced, so we don't need to split it). A sketch of that policy follows the open questions below. Some open questions:

      • Currently we ignore empty files. Is it ok to ignore an empty log file if it's not the last one?
      • Similarly, do we ignore an EOF mid-record if it's not the last log file?
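
      Below is a minimal sketch, in Java, of the "drop the trailing partial record" policy described above. The WALReader/WALEdit types and the isLastLog parameter are hypothetical stand-ins for illustration, not HBase's actual classes:

      import java.io.EOFException;
      import java.io.IOException;
      import java.util.ArrayList;
      import java.util.List;

      public class SplitSketch {
        interface WALEdit {}
        interface WALReader { WALEdit next() throws IOException; }  // returns null at a clean end of file

        static List<WALEdit> readEdits(WALReader reader, boolean isLastLog) throws IOException {
          List<WALEdit> edits = new ArrayList<>();
          try {
            WALEdit edit;
            while ((edit = reader.next()) != null) {
              edits.add(edit);
            }
          } catch (EOFException e) {
            if (!isLastLog) {
              // EOF mid-record in an *older* log is suspicious: edits that
              // were sync()'d may be missing, so surface the error rather
              // than swallow it.
              throw e;
            }
            // Trailing partial record in the newest log: the write was never
            // sync()'d, so it is safe to drop it and keep the edits read so far.
          }
          return edits;
        }
      }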
      Attachments

      1. HBASE-2643.patch (4 kB, Nicolas Spiegelberg)
      2. ch03s02.html (4 kB, stack)


          Activity

          Todd Lipcon added a comment -

          Updated description to a summary of the issue at hand - refer back to original review for original discussion.

          Richard Lackey added a comment -

          There is an assumption that no RuntimeException will occur within splitLog. While such an exception is unlikely, it is possible. Should a RuntimeException occur, it will percolate up through HMaster, which will then fail to join the cluster.

          This is a general condition. The assumption in HDFS is that this cannot occur (or will be caught by an upper layer), which is to say that none of the lower layers catches Exception to prevent a RuntimeException, e.g. a NullPointerException, from percolating through. If the SequenceFile contains garbage (or has been corrupted), the odds of the underlying DataInputStream throwing a RuntimeException increase.

          The solution is to add a catch for Exception in splitLog and consider the log corrupt.
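
          A minimal sketch of that catch, with hypothetical names (parseAndSplit, markCorrupt); the real splitLog in HLog differs in detail:

          import java.io.IOException;

          public class SplitLogSketch {
            interface LogFile { String path(); }

            void splitLog(LogFile log) throws IOException {
              try {
                parseAndSplit(log);
              } catch (RuntimeException e) {
                // A RuntimeException here (e.g. a NullPointerException from a
                // garbage SequenceFile) almost certainly means a corrupt log.
                // Catch it so it cannot percolate up through HMaster.
                markCorrupt(log, e);
              }
            }

            void parseAndSplit(LogFile log) throws IOException { /* elided */ }

            void markCorrupt(LogFile log, RuntimeException cause) {
              System.err.println("Corrupt log " + log.path() + ": " + cause);
            }
          }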

          stack added a comment -

          Good stuff Richard. If there is a runtime exception while splitting logs, should we fail to start the cluster? It likely means data loss. Or, there is a flag in hbase now – it's in the splitlog stuff, IIRC – which says fail-if-any-error OR try-and-keep-going across errors. If this flag is set, we should catch the RuntimeException, log it, and not start the master. Otherwise, we should log the exception and then let the master proceed.

          Richard Lackey added a comment -

          Since the exception is likely to be the result of a corrupted log, I interpreted that to be within the realm of the flag setting – a generalization of the intent.

          It seems like adding the catch, logging the exception, and following the flag setting would give more predictable behavior. At a minimum it provides more documentation at startup, which should help isolate the source of the problem.

          Nicolas Spiegelberg added a comment -

          We have encountered this EOF problem in our test cluster this week. Is there a use case where an EOF could lead to data loss, instead of just indicating data truncation due to connection failure? HDFS throws a ChecksumException (an IOException) on corrupt disk data, so an EOF should only indicate application-level corruption. It seems like we should handle the EOF case differently from normal IOEs and proceed even when 'hbase.hlog.split.skip.errors' == false.
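
          A sketch of the special-casing proposed here; shouldContinue and skipErrors are hypothetical names, with skipErrors mirroring 'hbase.hlog.split.skip.errors':

          import java.io.EOFException;
          import java.io.IOException;

          class EofPolicySketch {
            static boolean shouldContinue(IOException e, boolean skipErrors) {
              if (e instanceof EOFException) {
                // An EOF means the file was truncated (the writer died
                // mid-edit); the tail was never sync()'d, so continuing
                // loses nothing that was durable.
                return true;
              }
              // Other IOEs (e.g. ChecksumException on corrupt disk data) are
              // only skipped when the operator explicitly opted in.
              return skipErrors;
            }
          }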

          Nicolas Spiegelberg added a comment -

          From IRC:

          [5:09pm] nspiegelberg: I'm just wondering if there's a reason for us to not continue through an EOF
          [5:09pm] St^Ack: I can't think of one
          [5:10pm] St^Ack: what you say is reasonable nspiegelberg
          [5:10pm] St^Ack: least till we learn otherwise
          [5:10pm] St^Ack: I have to go lads... pick up kid
          [5:10pm] kannan: i agree... we could check to see if the IOE was an EOF and continue on....

          Nicolas Spiegelberg added a comment -

          Changed the LOG from warn to info level after internal review

          stack added a comment -

          OK. Let's go with this (we keep going on EOF in any WAL during splits, even if 'hbase.hlog.split.skip.errors' == false). I wrote the 'decision' into the new WAL chapter in the hbase 'book'. Thanks for the patch Nicolas.

          stack added a comment -

          Here is what the doc of the EOF handling looks like currently (for Nicolas). To be improved.

          Jean-Daniel Cryans added a comment -

          Adding this to the latest 0.89

          Jean-Daniel Cryans added a comment -

          While testing 0.89.20100924, I got this:

          Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 33.948 sec <<< FAILURE!
          testEOFisIgnored(org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit)  Time elapsed: 0.179 sec  <<< ERROR!
          java.io.IOException: hdfs://localhost:62668/hbase/hlog/hlog.dat.0, pos=1012, edit=9
                  at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:165)
                  at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:137)
                  at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:122)
                  at org.apache.hadoop.hbase.regionserver.wal.HLog.parseHLog(HLog.java:1564)
                  at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1323)
                  at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1210)
                  at org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit.testEOFisIgnored(TestHLogSplit.java:317)
                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                  at java.lang.reflect.Method.invoke(Method.java:597)
                  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
                  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
                  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
                  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
                  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
                  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
                  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
                  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
                  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
                  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
                  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
                  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
                  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
                  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
                  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
                  at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
                  at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:59)
                  at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:115)
                  at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:140)
                  at org.apache.maven.surefire.Surefire.run(Surefire.java:109)
                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                  at java.lang.reflect.Method.invoke(Method.java:597)
                  at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:290)
                  at org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1017)
          Caused by: java.io.EOFException
                  at java.io.DataInputStream.readFully(DataInputStream.java:180)
                  at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
                  at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
                  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1937)
                  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1837)
                  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1883)
                  at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:135)
                  ... 35 more
          

          And looking at the code, it doesn't check whether the source of the IOE is an EOF. I don't see it failing on trunk; was it handled in the scope of another jira?
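
          One way to do that check, sketched with plain JDK calls (the trace above shows the EOFException arriving wrapped inside a generic IOException, so the cause chain has to be walked):

          import java.io.EOFException;

          class CauseCheckSketch {
            static boolean isEof(Throwable t) {
              // Walk the cause chain; getCause() returns null at the root.
              for (Throwable cur = t; cur != null; cur = cur.getCause()) {
                if (cur instanceof EOFException) {
                  return true;
                }
              }
              return false;
            }
          }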

          Nicolas Spiegelberg added a comment -

          @jd: This bug was introduced by HBASE-2889. It is pseudo-fixed in trunk, and I have a more permanent fix up on the JIRA: use reflection to maintain the original error type while adding pertinent info.
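
          A sketch of that reflection idea: rebuild an exception of the same concrete class with the file info appended to the message, falling back to the original if the class has no (String) constructor. Names here are illustrative, not the actual patch:

          import java.io.IOException;

          class AddFileInfoSketch {
            static IOException addFileInfo(IOException e, String path, long pos) {
              String msg = path + ", pos=" + pos + ": " + e.getMessage();
              try {
                // Build a new exception of the same class (an EOFException
                // stays an EOFException) so callers that test the type still work.
                IOException wrapped =
                    e.getClass().getConstructor(String.class).newInstance(msg);
                wrapped.initCause(e);
                return wrapped;
              } catch (ReflectiveOperationException noStringCtor) {
                // No usable (String) constructor: return the original untouched.
                return e;
              }
            }
          }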

          Jean-Daniel Cryans added a comment -

          Thanks Nicolas!


            People

            • Assignee: Nicolas Spiegelberg
            • Reporter: stack
            • Votes: 0
            • Watchers: 1
