[FLINK-36673] Operator is not properly handling failed deployments without savepoints - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Kubernetes Operator
Labels: None

Description

I noticed an issue after upgrading Flink Kubernetes Operator from 1.9 to 1.10.

When I deploy a FlinkDeployment that fails during startup, I get a "ReconciliationException: Could not observe latest savepoint information" (the full stack trace is attached as stacktrace.txt).

I think the issue was introduced in https://github.com/apache/flink-kubernetes-operator/pull/871. AbstractFlinkService.getLastCheckpoint now throws a ReconciliationException when no savepoint is available, and SnapshotObserver.observeLatestCheckpoint doesn't handle that exception properly. I think having no savepoint is completely normal in some situations (e.g. a brand new job), so it should not surface as a reconciliation error.
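To illustrate the behavior I would expect, here is a minimal, self-contained sketch. It is not the operator's actual code: `SnapshotLookup`, `SnapshotObserverSketch`, and the local `ReconciliationException` are hypothetical stand-ins I made up for this example, and the real method signatures in AbstractFlinkService and SnapshotObserver may differ. The idea is only that the observer treats "no snapshot available" as an empty result rather than letting the exception propagate.

```java
import java.util.Optional;

// Local stand-in for the operator's exception type (assumption, not the real class).
class ReconciliationException extends RuntimeException {
    ReconciliationException(String message) {
        super(message);
    }
}

// Hypothetical abstraction over whatever looks up the last checkpoint or savepoint.
interface SnapshotLookup {
    // Returns the snapshot path, or throws ReconciliationException when none can be observed.
    String getLastSnapshotPath();
}

class SnapshotObserverSketch {

    // Treats a missing snapshot as an empty result instead of a reconciliation failure.
    static Optional<String> observeLatestSnapshot(SnapshotLookup lookup) {
        try {
            return Optional.ofNullable(lookup.getLastSnapshotPath());
        } catch (ReconciliationException e) {
            // A job that failed during startup may never have produced a snapshot.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Brand new job that failed on startup: the lookup has nothing to return.
        SnapshotLookup failedStartup = () -> {
            throw new ReconciliationException("Could not observe latest savepoint information");
        };
        // Job that already has a snapshot (the path is made up for the example).
        SnapshotLookup healthyJob = () -> "s3://bucket/savepoint-1";

        System.out.println(observeLatestSnapshot(failedStartup).isPresent());
        System.out.println(observeLatestSnapshot(healthyJob).isPresent());
    }
}
```

Catching every ReconciliationException is deliberately coarse here. A real fix would probably need to tell "no snapshot exists yet" apart from "the lookup itself failed", so that genuine errors are still reported.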

Attachments

stacktrace.txt (6 kB), added 07/Nov/24 23:47 by Yaroslav Tkachenko

People

Assignee: Unassigned

Reporter: Yaroslav Tkachenko

Votes: 0

Watchers: 2

Dates

Created: 07/Nov/24 23:51

Updated: 09/Nov/24 04:41