Flink / FLINK-36673

Operator is not properly handling failed deployments without savepoints

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Kubernetes Operator
    • Labels: None

    Description

      I noticed an issue after upgrading Flink Kubernetes Operator from 1.9 to 1.10.

      When I deploy a FlinkDeployment that fails during startup, I get a "ReconciliationException: Could not observe latest savepoint information" (the full stack trace is attached as stacktrace.txt).

      I think the issue was introduced here: https://github.com/apache/flink-kubernetes-operator/pull/871. AbstractFlinkService.getLastCheckpoint now throws a ReconciliationException when a savepoint is not available, and SnapshotObserver.observeLatestCheckpoint doesn't handle it properly. I think having no savepoint is completely normal in some situations (e.g. a brand-new job), so it shouldn't surface as a reconciliation failure.
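
To illustrate the behaviour I'd expect, here is a minimal, self-contained sketch. It does not use the operator's real classes: `SnapshotLookup`, the nested `ReconciliationException`, and the simplified `observeLatestCheckpoint` signature are all stand-ins I made up for this example. The only point is that a missing snapshot could be reported as an empty result rather than propagating as an exception.

```java
import java.util.Optional;

public class SnapshotObserverSketch {

    /** Hypothetical stand-in for the operator's ReconciliationException. */
    static class ReconciliationException extends RuntimeException {
        ReconciliationException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the service call that looks up the latest snapshot. */
    interface SnapshotLookup {
        String getLastCheckpoint(String jobId);
    }

    /**
     * Returns the latest snapshot path, or an empty Optional if none is available.
     * Assumption: in this sketch a ReconciliationException from the lookup only
     * ever means "no snapshot yet"; real code would have to tell that case apart
     * from genuine lookup failures.
     */
    static Optional<String> observeLatestCheckpoint(SnapshotLookup service, String jobId) {
        try {
            return Optional.ofNullable(service.getLastCheckpoint(jobId));
        } catch (ReconciliationException e) {
            // A brand-new or failed-at-startup job has nothing to report.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Simulates a job that failed during startup and never took a snapshot.
        SnapshotLookup noSnapshot = jobId -> {
            throw new ReconciliationException("Could not observe latest savepoint information");
        };
        // Simulates a job with an existing snapshot (the path is made up).
        SnapshotLookup withSnapshot = jobId -> "file:///tmp/savepoints/example";

        System.out.println(observeLatestCheckpoint(noSnapshot, "job-a").isPresent());
        System.out.println(observeLatestCheckpoint(withSnapshot, "job-b").isPresent());
    }
}
```

Swallowing the exception wholesale like this is probably too coarse for an actual fix, since it would also hide real errors. A more precise signal for "no snapshot exists" (for example an empty return value from the lookup itself) might be a better fit.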

      Attachments

        1. stacktrace.txt
          6 kB
          Yaroslav Tkachenko

          People

            Assignee: Unassigned
            Reporter: Yaroslav Tkachenko (sap1ens)
            Votes: 0
            Watchers: 2
