Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
I noticed an issue after upgrading the Flink Kubernetes Operator from 1.9 to 1.10.
When I deploy a FlinkDeployment that fails during startup, I get "ReconciliationException: Could not observe latest savepoint information" (the full stack trace is attached).
I think the issue was introduced in https://github.com/apache/flink-kubernetes-operator/pull/871. AbstractFlinkService.getLastCheckpoint now throws a ReconciliationException when no savepoint is available, and SnapshotObserver.observeLatestCheckpoint does not handle that exception, so it propagates and fails the reconciliation. I think having no savepoint is completely normal in some situations (e.g. a brand new job).
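To make the expected behavior concrete, here is a minimal, self-contained Java sketch. It does not use the operator's real classes: `SavepointObservationSketch`, its methods, the local `ReconciliationException` stand-in and the savepoint path are all hypothetical. It contrasts a lookup that throws when no savepoint exists (the behavior described above) with one that reports absence as an empty `Optional`, so the observer can treat a brand new job as a normal case rather than an error. This is one possible shape, not the actual fix.

```java
import java.util.Optional;

/**
 * Hypothetical sketch; none of these names are the operator's real API.
 * A null argument models "no savepoint has been taken yet".
 */
public class SavepointObservationSketch {

    /** Local stand-in for the operator's exception type, for illustration only. */
    static class ReconciliationException extends RuntimeException {
        ReconciliationException(String msg) {
            super(msg);
        }
    }

    /** Reported behavior: throws when no savepoint is available. */
    static String getLastCheckpointStrict(String latestSavepoint) {
        if (latestSavepoint == null) {
            throw new ReconciliationException("Could not observe latest savepoint information");
        }
        return latestSavepoint;
    }

    /** Alternative shape: absence is returned as an empty Optional instead of an exception. */
    static Optional<String> getLastCheckpointLenient(String latestSavepoint) {
        return Optional.ofNullable(latestSavepoint);
    }

    /** Observer that tolerates a job with no savepoint yet. */
    static String observeLatestCheckpoint(String latestSavepoint) {
        return getLastCheckpointLenient(latestSavepoint)
                .map(path -> "observed " + path)
                .orElse("no savepoint yet; nothing to record");
    }

    public static void main(String[] args) {
        // Brand new job: no savepoint, but observation still succeeds.
        System.out.println(observeLatestCheckpoint(null));
        // Job with a savepoint: the path is recorded.
        System.out.println(observeLatestCheckpoint("s3://bucket/savepoints/sp-1"));
        // The strict variant fails on the same input, mirroring the reported exception.
        try {
            getLastCheckpointStrict(null);
        } catch (ReconciliationException e) {
            System.out.println("strict lookup failed: " + e.getMessage());
        }
    }
}
```

Returning an empty `Optional` keeps "no savepoint" in the normal control flow; catching the exception inside the observer would be an equivalent alternative if the lookup's signature has to stay unchanged.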