Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-10946

Resuming Externalized Checkpoint (rocks, incremental, scale up) end-to-end test failed on Travis

    XMLWordPrintableJSON

    Details

      Description

      The test failed 3 times in total during the overall night build, but succeeded 2 times after restart. It did not fail locally for me.

      Here is a travis build to run it 500 times (reproducable):

      https://travis-ci.org/azagrebin/flink/builds/457375100

      2018-11-20 11:59:54,673 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code ArtificalKeyedStateMapper_Avro -> ArtificalOperatorStateMapper (2/2) (e06b7022f2f2154f2a84206f068ff1fd).
      2018-11-20 11:59:54,701 INFO  org.apache.flink.streaming.api.operators.AbstractStreamOperator  - Could not complete snapshot 12 for operator ArtificalKeyedStateMapper_Avro -> ArtificalOperatorStateMapper (1/2).
      java.io.IOException: Cannot register Closeable, registry is already closed. Closing argument.
      	at org.apache.flink.util.AbstractCloseableRegistry.registerCloseable(AbstractCloseableRegistry.java:85)
      	at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:123)
      	at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:111)
      	at org.apache.flink.runtime.state.AsyncSnapshotCallable.toAsyncSnapshotFutureTask(AsyncSnapshotCallable.java:105)
      	at org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.doSnapshot(RocksIncrementalSnapshotStrategy.java:164)
      	at org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:128)
      	at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:496)
      	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:407)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1113)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1055)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:729)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:641)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:586)
      	at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:396)
      	at org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:292)
      	at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:200)
      	at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
      	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
      	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
      	at java.lang.Thread.run(Thread.java:748)
      2018-11-20 11:59:54,702 INFO  org.apache.flink.runtime.taskmanager.Task                     - Ensuring all FileSystem streams are closed for task Source: Custom Source -> Timestamps/Watermarks (2/2) (09528d6ab0e1ee87ed21e78139682b18) [CANCELED]
      2018-11-20 11:59:54,703 INFO  org.apache.flink.runtime.taskmanager.Task                     - Ensuring all FileSystem streams are closed for task Source: Custom Source -> Timestamps/Watermarks (1/2) (c98146380e7f559ca18a183f4c0ef12d) [CANCELED]
      2018-11-20 11:59:54,721 INFO  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - Deleting existing instance base directory /tmp/flink-io-71851605-f8f2-4d4e-83e7-9b69e0a879ef/job_41e7002f55a128f646117fc14cf858a1_op_StreamMap_7d23c6ceabda05a587f0217e44f21301__2_2__uuid_59f43f20-768f-4117-9a3f-4a101a32b1d2.
      2018-11-20 11:59:54,724 INFO  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - Deleting existing instance base directory /tmp/flink-io-71851605-f8f2-4d4e-83e7-9b69e0a879ef/job_41e7002f55a128f646117fc14cf858a1_op_StreamMap_7d23c6ceabda05a587f0217e44f21301__1_2__uuid_f3ce6abc-52dc-4fa1-820e-e015daca418c.
      2018-11-20 11:59:54,732 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task TumblingWindowOperator (1/2) (3ea121867a218e10137c9bfe9ef991b8).
      2018-11-20 11:59:54,732 INFO  org.apache.flink.runtime.taskmanager.Task                     - TumblingWindowOperator (1/2) (3ea121867a218e10137c9bfe9ef991b8) switched from RUNNING to CANCELING.
      2018-11-20 11:59:54,732 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code TumblingWindowOperator (1/2) (3ea121867a218e10137c9bfe9ef991b8).
      2018-11-20 11:59:54,769 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task TumblingWindowOperator (2/2) (7a117e9804f71c6a995e31616ee05ec9).
      2018-11-20 11:59:54,770 INFO  org.apache.flink.runtime.taskmanager.Task                     - TumblingWindowOperator (2/2) (7a117e9804f71c6a995e31616ee05ec9) switched from RUNNING to CANCELING.
      2018-11-20 11:59:54,770 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code TumblingWindowOperator (2/2) (7a117e9804f71c6a995e31616ee05ec9).
      2018-11-20 11:59:54,789 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task SemanticsCheckMapper -> Sink: Unnamed (1/2) (0a53c5fdd36dffde33ef032b6ffb5307).
      2018-11-20 11:59:54,789 INFO  org.apache.flink.runtime.taskmanager.Task                     - SemanticsCheckMapper -> Sink: Unnamed (1/2) (0a53c5fdd36dffde33ef032b6ffb5307) switched from RUNNING to CANCELING.
      2018-11-20 11:59:54,789 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code SemanticsCheckMapper -> Sink: Unnamed (1/2) (0a53c5fdd36dffde33ef032b6ffb5307).
      2018-11-20 11:59:54,813 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task SemanticsCheckMapper -> Sink: Unnamed (2/2) (a9b08ced9d0d0ed0fb130336340b1a7a).
      2018-11-20 11:59:54,813 INFO  org.apache.flink.runtime.taskmanager.Task                     - SemanticsCheckMapper -> Sink: Unnamed (2/2) (a9b08ced9d0d0ed0fb130336340b1a7a) switched from RUNNING to CANCELING.
      2018-11-20 11:59:54,813 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code SemanticsCheckMapper -> Sink: Unnamed (2/2) (a9b08ced9d0d0ed0fb130336340b1a7a).
      2018-11-20 11:59:54,824 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task SlidingWindowOperator (1/2) (0f6b13316bcfe13e46972d4dd0bc5939).
      2018-11-20 11:59:54,824 INFO  org.apache.flink.runtime.taskmanager.Task                     - SlidingWindowOperator (1/2) (0f6b13316bcfe13e46972d4dd0bc5939) switched from RUNNING to CANCELING.
      2018-11-20 11:59:54,824 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code SlidingWindowOperator (1/2) (0f6b13316bcfe13e46972d4dd0bc5939).
      2018-11-20 11:59:54,831 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task SlidingWindowOperator (2/2) (4b0e279fcf1ce9afb5833731a3844319).
      2018-11-20 11:59:54,831 INFO  org.apache.flink.runtime.taskmanager.Task                     - SlidingWindowOperator (2/2) (4b0e279fcf1ce9afb5833731a3844319) switched from RUNNING to CANCELING.
      2018-11-20 11:59:54,831 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code SlidingWindowOperator (2/2) (4b0e279fcf1ce9afb5833731a3844319).
      2018-11-20 11:59:54,844 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task SlidingWindowCheckMapper -> Sink: Unnamed (1/2) (f269b77a22ea33b41d58a276154fff75).
      2018-11-20 11:59:54,844 INFO  org.apache.flink.runtime.taskmanager.Task                     - SlidingWindowCheckMapper -> Sink: Unnamed (1/2) (f269b77a22ea33b41d58a276154fff75) switched from RUNNING to CANCELING.
      2018-11-20 11:59:54,844 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code SlidingWindowCheckMapper -> Sink: Unnamed (1/2) (f269b77a22ea33b41d58a276154fff75).
      2018-11-20 11:59:54,857 INFO  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - Deleting existing instance base directory /tmp/flink-io-71851605-f8f2-4d4e-83e7-9b69e0a879ef/job_41e7002f55a128f646117fc14cf858a1_op_WindowOperator_0b63e7dd9fb1948bf052174673e64274__1_2__uuid_f01e3e36-565e-4680-b7d8-1f30e6a64474.
      2018-11-20 11:59:54,742 INFO  org.apache.flink.streaming.api.operators.AbstractStreamOperator  - Could not complete snapshot 12 for operator ArtificalKeyedStateMapper_Kryo_and_Custom_Stateful (1/2).
      java.io.IOException: Cannot register Closeable, registry is already closed. Closing argument.
      	at org.apache.flink.util.AbstractCloseableRegistry.registerCloseable(AbstractCloseableRegistry.java:85)
      	at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:123)
      	at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:111)
      	at org.apache.flink.runtime.state.AsyncSnapshotCallable.toAsyncSnapshotFutureTask(AsyncSnapshotCallable.java:105)
      	at org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.doSnapshot(RocksIncrementalSnapshotStrategy.java:164)
      	at org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:128)
      	at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:496)
      	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:407)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1113)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1055)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:729)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:641)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:586)
      	at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:396)
      	at org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:292)
      	at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:200)
      	at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
      	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
      	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
      	at java.lang.Thread.run(Thread.java:748)
      2018-11-20 11:59:54,871 INFO  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - Deleting existing instance base directory /tmp/flink-io-71851605-f8f2-4d4e-83e7-9b69e0a879ef/job_41e7002f55a128f646117fc14cf858a1_op_StreamMap_5271c210329e73bd743f3227edfb3b71__1_2__uuid_0f79c71c-ffe0-4fb6-a16e-a7a7f1c8490d.
      2018-11-20 11:59:54,747 INFO  org.apache.flink.streaming.api.operators.AbstractStreamOperator  - Could not complete snapshot 12 for operator ArtificalKeyedStateMapper_Kryo_and_Custom_Stateful (2/2).
      java.io.IOException: Cannot register Closeable, registry is already closed. Closing argument.
      	at org.apache.flink.util.AbstractCloseableRegistry.registerCloseable(AbstractCloseableRegistry.java:85)
      	at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:123)
      	at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:111)
      	at org.apache.flink.runtime.state.AsyncSnapshotCallable.toAsyncSnapshotFutureTask(AsyncSnapshotCallable.java:105)
      	at org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.doSnapshot(RocksIncrementalSnapshotStrategy.java:164)
      	at org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:128)
      	at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:496)
      	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:407)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1113)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1055)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:729)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:641)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:586)
      	at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:396)
      	at org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:292)
      	at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:200)
      	at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
      	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
      	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
      	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
      	at java.lang.Thread.run(Thread.java:748)
      2018-11-20 11:59:54,875 INFO  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - Deleting existing instance base directory /tmp/flink-io-71851605-f8f2-4d4e-83e7-9b69e0a879ef/job_41e7002f55a128f646117fc14cf858a1_op_StreamMap_5271c210329e73bd743f3227edfb3b71__2_2__uuid_f09e971f-8542-4f0b-9abe-644d702f235f.
      2018-11-20 11:59:54,873 INFO  org.apache.flink.runtime.taskmanager.Task                     - ArtificalKeyedStateMapper_Avro -> ArtificalOperatorStateMapper (1/2) (3e5842c071706a82194c67143f6f4a60) switched from CANCELING to CANCELED.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                azagrebin Andrey Zagrebin
                Reporter:
                azagrebin Andrey Zagrebin
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: