Flink / FLINK-10377

Remove precondition in TwoPhaseCommitSinkFunction.notifyCheckpointComplete

Details

    Description

      The precondition checkState(pendingTransactionIterator.hasNext(), "checkpoint completed, but no transaction pending"); in TwoPhaseCommitSinkFunction.notifyCheckpointComplete() seems too strict: checkpoint notifications can overtake one another, and the late notification then fails the precondition. In that case the commit was already performed when the first notification arrived, which subsumes the late checkpoint. I think the check can be removed.

      edit:
      As reported by a user on the user mailing list, TwoPhaseCommitSinkFunction#notifyCheckpointComplete can fail with the following exception:

      java.lang.RuntimeException: Error while confirming checkpoint
          at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1205)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.IllegalStateException: checkpoint completed, but no transaction pending
          at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
          at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.notifyCheckpointComplete(TwoPhaseCommitSinkFunction.java:267)
          at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointComplete(AbstractUdfStreamOperator.java:130)
          at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:822)
          at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1200)
          ... 5 more
      

      This can happen in the following scenario:

      1. savepoint is triggered
      2. checkpoint is triggered
      3. checkpoint completes (but it doesn't subsume the savepoint, because checkpoints subsume only other checkpoints).
      4. savepoint completes

      In this case, TwoPhaseCommitSinkFunction first receives the notification that the later checkpoint completed, and it commits the pending transactions of both the savepoint and the checkpoint. When the savepoint's notifyCheckpointComplete arrives later, no transaction is pending anymore and the above error occurs.

      A possible trivial fix is to remove the failing checkState.
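
      To make the ordering problem concrete, here is a minimal sketch of the scenario. It is a simplified, hypothetical model and not Flink's actual implementation: the class PendingCommitSketch, the strict flag, the replay helper, and the ids 1 (savepoint) and 2 (checkpoint) are all assumptions made for illustration. It only mimics the idea that a completion notification commits every pending transaction up to the notified id.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Simplified, hypothetical model of the pending-transaction bookkeeping; not Flink's class. */
public class PendingCommitSketch {

    // Checkpoint (or savepoint) id -> name of the transaction awaiting commit.
    private final TreeMap<Long, String> pendingCommitTransactions = new TreeMap<>();
    // When true, mimic the precondition discussed in this issue.
    private final boolean strict;
    private int committed = 0;

    public PendingCommitSketch(boolean strict) {
        this.strict = strict;
    }

    /** Called when a snapshot is taken: the current transaction becomes pending. */
    public void snapshotState(long checkpointId) {
        pendingCommitTransactions.put(checkpointId, "txn-" + checkpointId);
    }

    /** Commits every pending transaction whose id is <= the completed id. */
    public void notifyCheckpointComplete(long checkpointId) {
        Iterator<Map.Entry<Long, String>> it =
                pendingCommitTransactions.headMap(checkpointId, true).entrySet().iterator();
        if (strict && !it.hasNext()) {
            throw new IllegalStateException("checkpoint completed, but no transaction pending");
        }
        while (it.hasNext()) {
            it.next();     // "commit" the transaction
            committed++;
            it.remove();   // it is no longer pending
        }
    }

    public int committedCount() {
        return committed;
    }

    /** Replays the scenario: savepoint 1 and checkpoint 2 are triggered, 2 completes first. */
    public static int replay(boolean strict) {
        PendingCommitSketch sink = new PendingCommitSketch(strict);
        sink.snapshotState(1L);             // 1. savepoint is triggered
        sink.snapshotState(2L);             // 2. checkpoint is triggered
        sink.notifyCheckpointComplete(2L);  // 3. checkpoint completes, commits both
        sink.notifyCheckpointComplete(1L);  // 4. late savepoint notification
        return sink.committedCount();
    }

    public static void main(String[] args) {
        System.out.println("without check: committed " + replay(false));
        try {
            replay(true);
        } catch (IllegalStateException e) {
            System.out.println("with check: " + e.getMessage());
        }
    }
}
```

      In this sketch, both transactions are committed by the first notification regardless of the flag. The late notification finds nothing left to commit, so it is a harmless no-op without the check and an IllegalStateException with it.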

            People

              srichter Stefan Richter
