Details
Description
tephra service was restarting and noticed the following issue in the logs
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294) ~[com.google.guava.guava-13.0.1.jar:na] at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281) ~[com.google.guava.guava-13.0.1.jar:na] at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[com.google.guava.guava-13.0.1.jar:na] at org.apache.twill.internal.ServiceMain.doMain(ServiceMain.java:99) ~[org.apache.twill.twill-yarn-0.6.0-incubating-SNAPSHOT.jar:0.6.0-incubating-SNAPSHOT] at org.apache.twill.internal.container.TwillContainerMain.main(TwillContainerMain.java:107) [org.apache.twill.twill-yarn-0.6.0-incubating-SNAPSHOT.jar:0.6.0-incubating-SNAPSHOT] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_75] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_75] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_75] at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_75] at org.apache.twill.launcher.TwillLauncher.main(TwillLauncher.java:88) [launcher.a1032748-944f-4dcf-a81f-d4acd9b54331.jar:na] java.lang.RuntimeException: java.io.EOFException at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[com.google.guava.guava-13.0.1.jar:na] at co.cask.tephra.persist.HDFSTransactionLog$LogReader.next(HDFSTransactionLog.java:151) ~[co.cask.tephra.tephra-core-0.6.2.jar:na] at co.cask.tephra.TransactionManager.replayLogs(TransactionManager.java:571) ~[co.cask.tephra.tephra-core-0.6.2.jar:na] at co.cask.tephra.TransactionManager.recoverState(TransactionManager.java:471) ~[co.cask.tephra.tephra-core-0.6.2.jar:na] at co.cask.tephra.TransactionManager.doStart(TransactionManager.java:218) ~[co.cask.tephra.tephra-core-0.6.2.jar:na] at com.google.common.util.concurrent.AbstractService.start(AbstractService.java:170) ~[com.google.guava.guava-13.0.1.jar:na] at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) ~[com.google.guava.guava-13.0.1.jar:na] at co.cask.tephra.distributed.TransactionServiceThriftHandler.init(TransactionServiceThriftHandler.java:175) ~[co.cask.tephra.tephra-core-0.6.2.jar:na] at co.cask.tephra.rpc.ThriftRPCServer.startUp(ThriftRPCServer.java:175) ~[co.cask.tephra.tephra-core-0.6.2.jar:na] at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na] at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_75] Caused by: java.io.EOFException: null at java.io.DataInputStream.readInt(DataInputStream.java:392) ~[na:1.7.0_75] at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2357) ~[hadoop-common-2.4.0.2.1.15.0-946.jar:na] at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2257) ~[hadoop-common-2.4.0.2.1.15.0-946.jar:na] at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2303) ~[hadoop-common-2.4.0.2.1.15.0-946.jar:na] at co.cask.tephra.persist.HDFSTransactionLog$LogReader.next(HDFSTransactionLog.java:160) ~[co.cask.tephra.tephra-core-0.6.2.jar:na] at co.cask.tephra.persist.HDFSTransactionLog$LogReader.next(HDFSTransactionLog.java:149) ~[co.cask.tephra.tephra-core-0.6.2.jar:na] ... 9 common frames omitted
Took the the latest tx.log that was tried to to be replayed and tested locally to decode the transaction edits in them.
It failed towards the end with the following exception
java.lang.RuntimeException: java.io.EOFException at com.google.common.base.Throwables.propagate(Throwables.java:160) at co.cask.tephra.persist.HDFSTransactionLog$LogReader.next(HDFSTransactionLog.java:151) at co.cask.tephra.persist.HDFSTransactionStateStorageTest.testTxEdits(HDFSTransactionStateStorageTest.java:86) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2357) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2257) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2303) at co.cask.tephra.persist.HDFSTransactionLog$LogReader.next(HDFSTransactionLog.java:161) at co.cask.tephra.persist.HDFSTransactionLog$LogReader.next(HDFSTransactionLog.java:149) ... 31 more
This is likely because the txlog was not closed properly causing the EOFException while calling next to decode. I noticed the position on the file before this Exception is close to the End of file.
Attaching the problematic txlog.