Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: kubernetes-operator-1.6.1
Fix Version/s: None
None
Description
We run flink-kubernetes-operator in HA mode. When one of the operator pods restarts, leadership switches to another operator pod. After the switch, some FlinkDeployments stay in job state RECONCILING with lifecycle state STABLE (JOB_STATUS=RECONCILING, LIFECYCLE_STATE=STABLE) for a long time.
Running "kubectl describe flinkdeployment xxx" shows the following error, yet the flink-kubernetes-operator log contains no exceptions:
Status:
  Cluster Info:
    Flink - Revision: b6d20ed @ 2023-12-20T10:01:39+01:00
    Flink - Version: 1.14.0-GDC1.6.0
    Total - Cpu: 7.0
    Total - Memory: 30064771072
  Error: {"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Failed to load configuration","additionalMetadata":{},"throwableList":[{"type":"org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException","message":"java.lang.RuntimeException: Failed to load configuration","additionalMetadata":{}},{"type":"java.lang.RuntimeException","message":"Failed to load configuration","additionalMetadata":{}}]}
  Job Manager Deployment Status: READY
  Job Status:
    Job Id: cf44b5e73a1f263dd7d9f2c82be5216d
    Job Name: noah_stream_studio_1754211682_2218100380
    Savepoint Info:
      Last Periodic Savepoint Timestamp: 0
      Savepoint History:
    Start Time: 1705635107137
    State: RECONCILING
    Update Time: 1709272530741
  Lifecycle State: STABLE
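The Start Time and Update Time values above look like epoch milliseconds. Assuming that unit (the output does not state it), this short Python snippet converts both to UTC and computes the gap between them. The gap is roughly six weeks, which fits the "long time" described above, although the output does not show exactly when the RECONCILING state was entered. The names START_TIME_MS, UPDATE_TIME_MS and ms_to_utc are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the `kubectl describe` output; assumed to be epoch milliseconds.
START_TIME_MS = 1705635107137   # Job Status > Start Time
UPDATE_TIME_MS = 1709272530741  # Job Status > Update Time


def ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime using integer math (no float rounding)."""
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


start = ms_to_utc(START_TIME_MS)
update = ms_to_utc(UPDATE_TIME_MS)
elapsed_days = (UPDATE_TIME_MS - START_TIME_MS) / 86_400_000  # ms per day

print(start.isoformat(timespec="seconds"))   # 2024-01-19T03:31:47+00:00
print(update.isoformat(timespec="seconds"))  # 2024-03-01T05:55:30+00:00
print(f"{elapsed_days:.1f} days")            # 42.1 days
```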
Versions:
flink-kubernetes-operator: 1.6.1
flink: 1.14.0 / 1.15.2 (1200+ FlinkDeployments)
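With 1200+ FlinkDeployments, running kubectl describe on each one is impractical. The following is a minimal Python sketch that lists every deployment reporting a RECONCILING job state together with a STABLE lifecycle state, and flags those whose error mentions "Failed to load configuration". It assumes the JSON field names are the camelCase forms of the keys shown by kubectl describe (status.jobStatus.state, status.lifecycleState, status.error). The script name find_stuck.py and the helper find_stuck are hypothetical.

```python
import json
import sys
from typing import Iterable, Iterator, Tuple


def find_stuck(items: Iterable[dict]) -> Iterator[Tuple[str, str, str]]:
    """Yield (namespace, name, error) for FlinkDeployments whose job state is
    RECONCILING while the lifecycle state is STABLE.

    Field names (jobStatus.state, lifecycleState, error) are assumed from the
    `kubectl describe` output, not taken from the CRD schema.
    """
    for item in items:
        meta = item.get("metadata") or {}
        status = item.get("status") or {}
        job_state = (status.get("jobStatus") or {}).get("state")
        lifecycle = status.get("lifecycleState")
        if job_state == "RECONCILING" and lifecycle == "STABLE":
            yield meta.get("namespace", ""), meta.get("name", ""), status.get("error") or ""


def main(path: str) -> None:
    # `path` is a file written by `kubectl get flinkdeployments -A -o json`; "-" reads stdin.
    if path == "-":
        doc = json.load(sys.stdin)
    else:
        with open(path, encoding="utf-8") as fh:
            doc = json.load(fh)
    for ns, name, error in find_stuck(doc.get("items", [])):
        marker = "config-load-error" if "Failed to load configuration" in error else "-"
        print(f"{ns}/{name}\t{marker}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage: `kubectl get flinkdeployments -A -o json > fd.json`, then `python find_stuck.py fd.json`. Reading a saved file rather than calling kubectl from Python keeps the sketch free of cluster access and makes it easy to compare snapshots taken before and after a leader switch.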