Uploaded image for project: 'Phoenix Tephra'
  1. Phoenix Tephra
  2. TEPHRA-133

When the final transaction edit log is not cleanly closed, Tephra should be able to recover

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • 0.1.0, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0
    • 0.4.4, 0.6.3
    • core
    • None

    Description

      tephra service was restarting and noticed the following issue in the logs

      java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException
      	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294) ~[com.google.guava.guava-13.0.1.jar:na]
      	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281) ~[com.google.guava.guava-13.0.1.jar:na]
      	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[com.google.guava.guava-13.0.1.jar:na]
      	at org.apache.twill.internal.ServiceMain.doMain(ServiceMain.java:99) ~[org.apache.twill.twill-yarn-0.6.0-incubating-SNAPSHOT.jar:0.6.0-incubating-SNAPSHOT]
      	at org.apache.twill.internal.container.TwillContainerMain.main(TwillContainerMain.java:107) [org.apache.twill.twill-yarn-0.6.0-incubating-SNAPSHOT.jar:0.6.0-incubating-SNAPSHOT]
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_75]
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_75]
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_75]
      	at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_75]
      	at org.apache.twill.launcher.TwillLauncher.main(TwillLauncher.java:88) [launcher.a1032748-944f-4dcf-a81f-d4acd9b54331.jar:na]
      java.lang.RuntimeException: java.io.EOFException
      	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[com.google.guava.guava-13.0.1.jar:na]
      	at co.cask.tephra.persist.HDFSTransactionLog$LogReader.next(HDFSTransactionLog.java:151) ~[co.cask.tephra.tephra-core-0.6.2.jar:na]
      	at co.cask.tephra.TransactionManager.replayLogs(TransactionManager.java:571) ~[co.cask.tephra.tephra-core-0.6.2.jar:na]
      	at co.cask.tephra.TransactionManager.recoverState(TransactionManager.java:471) ~[co.cask.tephra.tephra-core-0.6.2.jar:na]
      	at co.cask.tephra.TransactionManager.doStart(TransactionManager.java:218) ~[co.cask.tephra.tephra-core-0.6.2.jar:na]
      	at com.google.common.util.concurrent.AbstractService.start(AbstractService.java:170) ~[com.google.guava.guava-13.0.1.jar:na]
      	at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na]
      	at co.cask.tephra.distributed.TransactionServiceThriftHandler.init(TransactionServiceThriftHandler.java:175) ~[co.cask.tephra.tephra-core-0.6.2.jar:na]
      	at co.cask.tephra.rpc.ThriftRPCServer.startUp(ThriftRPCServer.java:175) ~[co.cask.tephra.tephra-core-0.6.2.jar:na]
      	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
      	at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_75]
      Caused by: java.io.EOFException: null
      	at java.io.DataInputStream.readInt(DataInputStream.java:392) ~[na:1.7.0_75]
      	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2357) ~[hadoop-common-2.4.0.2.1.15.0-946.jar:na]
      	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2257) ~[hadoop-common-2.4.0.2.1.15.0-946.jar:na]
      	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2303) ~[hadoop-common-2.4.0.2.1.15.0-946.jar:na]
      	at co.cask.tephra.persist.HDFSTransactionLog$LogReader.next(HDFSTransactionLog.java:160) ~[co.cask.tephra.tephra-core-0.6.2.jar:na]
      	at co.cask.tephra.persist.HDFSTransactionLog$LogReader.next(HDFSTransactionLog.java:149) ~[co.cask.tephra.tephra-core-0.6.2.jar:na]
      	... 9 common frames omitted
      

      Took the the latest tx.log that was tried to to be replayed and tested locally to decode the transaction edits in them.

      It failed towards the end with the following exception

      java.lang.RuntimeException: java.io.EOFException
      	at com.google.common.base.Throwables.propagate(Throwables.java:160)
      	at co.cask.tephra.persist.HDFSTransactionLog$LogReader.next(HDFSTransactionLog.java:151)
      	at co.cask.tephra.persist.HDFSTransactionStateStorageTest.testTxEdits(HDFSTransactionStateStorageTest.java:86)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
      	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
      	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
      	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
      	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
      	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
      	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
      	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
      Caused by: java.io.EOFException
      	at java.io.DataInputStream.readInt(DataInputStream.java:392)
      	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2357)
      	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2257)
      	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2303)
      	at co.cask.tephra.persist.HDFSTransactionLog$LogReader.next(HDFSTransactionLog.java:161)
      	at co.cask.tephra.persist.HDFSTransactionLog$LogReader.next(HDFSTransactionLog.java:149)
      	... 31 more
      

      This is likely because the txlog was not closed properly causing the EOFException while calling next to decode. I noticed the position on the file before this Exception is close to the End of file.
      Attaching the problematic txlog.

      Attachments

        1. txlog.1443792213636
          6.29 MB
          Shankar Selvam

        Activity

          People

            gokulavasan Gokul Gunasekaran
            gsps1 Shankar Selvam
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: