Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.14.0
Description
When testing DFS changelog implementation in FLINK-23279 and enabling it for all tests,
UnalignedCheckpointStressITCase crashed with the following exception
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 18.433 s <<< FAILURE! - in org.apache.flink.test.checkpointing.UnalignedCheckpointStressITCase [ERROR] runStressTest(org.apache.flink.test.checkpointing.UnalignedCheckpointStressITCase) Time elapsed: 17.663 s <<< ERROR! java.io.UncheckedIOException: java.nio.file.NoSuchFileException: /tmp/junit7860347244680665820/435237 d57439f2ceadfedba74dadd6fa/chk-16 at java.nio.file.FileTreeIterator.fetchNextIfNeeded(FileTreeIterator.java:88) at java.nio.file.FileTreeIterator.hasNext(FileTreeIterator.java:104) at java.util.Iterator.forEachRemaining(Iterator.java:115) at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:546) at org.apache.flink.test.checkpointing.UnalignedCheckpointStressITCase.discoverRetainedCheckpoint(UnalignedCheckpointStressITCase.java:288) at org.apache.flink.test.checkpointing.UnalignedCheckpointStressITCase.runAndTakeExternalCheckpoint(UnalignedCheckpointStressITCase.java:261) at org.apache.flink.test.checkpointing.UnalignedCheckpointStressITCase.runStressTest(UnalignedCheckpointStressITCase.java:157) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java :384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) Caused by: java.nio.file.NoSuchFileException: /tmp/junit7860347244680665820/435237d57439f2ceadfedba74 dadd6fa/chk-16 at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144) at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99) at java.nio.file.Files.readAttributes(Files.java:1737) at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219) at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276) at java.nio.file.FileTreeWalker.next(FileTreeWalker.java:372) at java.nio.file.FileTreeIterator.fetchNextIfNeeded(FileTreeIterator.java:84)
The referred checkpoint 16 was aborted and scheduled for deletion.
But the test does not wait for it to complete and proceeds to file listing.
I think this problem is also present in UnalignedCheckpointRescaleITCase (FLINK-22197) and probably in CoordinatedSourceRescaleITCase(FLINK-23577).
Patch to demonstrate it: https://github.com/rkhachatryan/flink/tree/f23647-demo
Corresponding failure
Attachments
Issue Links
- causes
-
FLINK-24938 Checkpoint cleaner is closed before checkpoints are discarded
- Closed
- links to