Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-10377

Remove precondition in TwoPhaseCommitSinkFunction.notifyCheckpointComplete

    XMLWordPrintableJSON

    Details

      Description

      The precondition checkState(pendingTransactionIterator.hasNext(), "checkpoint completed, but no transaction pending"); in TwoPhaseCommitSinkFunction.notifyCheckpointComplete() seems too strict, because checkpoints can overtake checkpoints and will fail the precondition. In this case the commit was already performed by the first notification and subsumes the late checkpoint. I think the check can be removed.

      edit:
      As reported by a user on the user mailing list, TwoPhaseCommitSinkFunction#notifyCheckpointComplete can fail with the following exception:

      java.lang.RuntimeException: Error while confirming checkpoint
          at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1205)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.IllegalStateException: checkpoint completed, but no transaction pending
          at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
          at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.notifyCheckpointComplete(TwoPhaseCommitSinkFunction.java:267)
          at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointComplete(AbstractUdfStreamOperator.java:130)
          at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:822)
          at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1200)
          ... 5 more
      

      This can happen in the following scenario:

      1. savepoint is triggered
      2. checkpoint is triggered
      3. checkpoint completes (but it doesn't subsume the savepoint, because checkpoints subsume only other checkpoints).
      4. savepoint completes

      In this case, TwoPhaseCommitSinkFunction receives first notification that the later checkpoint completed, it commits both savepoint and the checkpoint. Later when savepoint notifyCheckpointComplete arrives, the above error will occur.

      Possible trivial fix is to remove that failing checkState.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                srichter Stefan Richter
                Reporter:
                srichter Stefan Richter
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m