Flink / FLINK-17350

StreamTask should always fail immediately on failures in synchronous part of a checkpoint

    Details

    • Release Note:
      Failures in the synchronous part of checkpointing (such as an exception thrown by an operator) now fail the Task (and the job) immediately, regardless of the configuration parameters. Since Flink 1.5, such failures could be ignored by setting `setTolerableCheckpointFailureNumber(...)` or its deprecated predecessor `setFailTaskOnCheckpointError(...)`. Both options now affect only asynchronous failures.
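
      The rule in the release note can be sketched as a small decision function. This is a toy model, not Flink's implementation: the class, enum, and method names below are hypothetical, the constructor argument only stands in for the value passed to `setTolerableCheckpointFailureNumber(...)`, and the failure counting is deliberately simplified.

```java
/** Toy model of the rule in the release note; not Flink's actual implementation. */
final class CheckpointFailurePolicySketch {

    /** Which part of the checkpoint the failure happened in. */
    enum Phase { SYNCHRONOUS, ASYNCHRONOUS }

    /** What happens in response to the failure. */
    enum Action { FAIL_TASK, TOLERATE }

    // Hypothetical stand-in for the value given to setTolerableCheckpointFailureNumber(...).
    private final int tolerableFailures;

    // Simplified counter: it only ever grows in this sketch.
    private int asyncFailuresSeen = 0;

    CheckpointFailurePolicySketch(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    Action onCheckpointFailure(Phase phase) {
        if (phase == Phase.SYNCHRONOUS) {
            // Synchronous failures fail the task no matter what the configuration says.
            return Action.FAIL_TASK;
        }
        // Asynchronous failures are still subject to the configured tolerance.
        asyncFailuresSeen++;
        return asyncFailuresSeen > tolerableFailures ? Action.FAIL_TASK : Action.TOLERATE;
    }

    public static void main(String[] args) {
        CheckpointFailurePolicySketch policy = new CheckpointFailurePolicySketch(2);
        System.out.println(policy.onCheckpointFailure(Phase.ASYNCHRONOUS)); // TOLERATE
        System.out.println(policy.onCheckpointFailure(Phase.SYNCHRONOUS));  // FAIL_TASK
    }
}
```

      The point of the sketch is that the tolerance setting is consulted only on the asynchronous branch.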

      Description

      This bug also affects the 1.5.x branch.

      As described in point 1 here: https://issues.apache.org/jira/browse/FLINK-17327?focusedCommentId=17090576&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17090576

      setTolerableCheckpointFailureNumber(...) and its deprecated predecessor setFailTaskOnCheckpointError(...) are implemented incorrectly. Since Flink 1.5 (https://issues.apache.org/jira/browse/FLINK-4809) they can cause operators (especially sinks with external state) to end up in an inconsistent state. This is true even if the options are not used, because of another issue: FLINK-17351

      Consider this combined with an intermittent failure of an external system: the sink reports an exception, the transaction is lost or aborted, and the sink is in a failed state. If, by coincidence, the sink then manages to accept further records, the exception can be lost, and all of the records in the failed checkpoints are lost forever as well.
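
      The scenario above can be illustrated with a toy model. Everything here is an assumption made for illustration: `ToyTransactionalSink`, `snapshot(...)`, and `run(...)` are hypothetical names rather than Flink classes, and the transaction handling is far simpler than a real transactional sink.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the data-loss scenario; all names are hypothetical, not Flink classes. */
final class LostRecordsSketch {

    /** A sink that buffers records in a transaction and commits it on each checkpoint. */
    static final class ToyTransactionalSink {
        private final List<String> openTransaction = new ArrayList<>();
        private final List<String> committed = new ArrayList<>();

        void write(String record) {
            openTransaction.add(record);
        }

        /** Stands in for the synchronous checkpoint part of the sink. */
        void snapshot(boolean externalSystemUp) {
            List<String> batch = new ArrayList<>(openTransaction);
            // The transaction is gone whether or not the external system accepts it.
            openTransaction.clear();
            if (!externalSystemUp) {
                throw new IllegalStateException("external system unavailable, transaction aborted");
            }
            committed.addAll(batch);
        }

        List<String> committed() {
            return committed;
        }
    }

    /** Runs two checkpoints, the first during an outage, and returns what was committed. */
    static List<String> run(boolean tolerateSyncFailure) {
        ToyTransactionalSink sink = new ToyTransactionalSink();
        sink.write("a");
        sink.write("b");
        try {
            sink.snapshot(false); // checkpoint 1 fails in its synchronous part
        } catch (IllegalStateException e) {
            if (!tolerateSyncFailure) {
                throw e; // fail immediately instead of continuing with a broken sink
            }
            // Failure tolerated: the exception is dropped and the sink keeps running.
        }
        sink.write("c");
        sink.snapshot(true); // checkpoint 2 succeeds
        return sink.committed();
    }

    public static void main(String[] args) {
        System.out.println(run(true)); // prints [c]
    }
}
```

      With the failure tolerated, only "c" is committed and "a" and "b" are silently missing. With `run(false)` the exception propagates, which corresponds to failing the task immediately.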

      For details please check FLINK-17327.

              People

              • Assignee: Piotr Nowojski (pnowojski)
              • Reporter: Piotr Nowojski (pnowojski)
              • Votes: 0
              • Watchers: 8