Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Fixed
- 1.18.1
Fixes a bug where the leader election was unable to regain leadership after a lease token renewal caused a leadership loss. The fix required upgrading fabric8io:kubernetes-client from v6.6.2 to v6.9.0.
Description
FLINK-34007 revealed a bug in the k8s client v6.6.2, which we have been using since Flink 1.18. This bug was fixed for Flink 1.19 in FLINK-34007, which required updating the k8s client to v6.9.0.
This Jira issue is about finding a solution in Flink 1.18 for the very same problem FLINK-34007 covered. It's a dedicated Jira issue because we want to unblock the release of 1.19 by resolving FLINK-34007.
To summarize why the upgrade to v6.9.0 is desired: a bug in v6.6.2 might prevent the leadership-lost event from being forwarded to the client (#5463). An initial proposal, in which the release call was handled in Flink's KubernetesLeaderElector, didn't work because the leadership-lost event was triggered twice (see the FLINK-34007 PR comment).
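To illustrate why a leadership-lost event that fires twice is a problem for listener code, here is a minimal, self-contained sketch. It is not Flink's KubernetesLeaderElector and not the fabric8 API: the `LeaderCallbacks` interface, both handler classes, and `simulateDoubleLoss` are hypothetical names made up for this example. The guard shown is only one generic way to make a callback idempotent. This issue was resolved by upgrading the client, not by such a guard.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of duplicated leadership-lost events; not Flink or fabric8 code. */
public class LeadershipLossSketch {

    /** Hypothetical stand-in for a leader-election listener. */
    interface LeaderCallbacks {
        void onLeadershipGranted();
        void onLeadershipLost();
    }

    /** Naive handler: every lost event triggers a revocation, duplicates included. */
    static final class NaiveHandler implements LeaderCallbacks {
        final AtomicInteger revocations = new AtomicInteger();

        @Override
        public void onLeadershipGranted() {
            // Nothing to track in the naive variant.
        }

        @Override
        public void onLeadershipLost() {
            revocations.incrementAndGet();
        }
    }

    /** Guarded handler: revokes at most once per granted leadership. */
    static final class GuardedHandler implements LeaderCallbacks {
        private final AtomicBoolean isLeader = new AtomicBoolean(false);
        final AtomicInteger revocations = new AtomicInteger();

        @Override
        public void onLeadershipGranted() {
            isLeader.set(true);
        }

        @Override
        public void onLeadershipLost() {
            // Only the first lost event after a grant flips the flag and counts.
            if (isLeader.compareAndSet(true, false)) {
                revocations.incrementAndGet();
            }
        }
    }

    /** Simulates one grant followed by the lost event being delivered twice. */
    static void simulateDoubleLoss(LeaderCallbacks callbacks) {
        callbacks.onLeadershipGranted();
        callbacks.onLeadershipLost();
        callbacks.onLeadershipLost();
    }

    public static void main(String[] args) {
        NaiveHandler naive = new NaiveHandler();
        GuardedHandler guarded = new GuardedHandler();
        simulateDoubleLoss(naive);
        simulateDoubleLoss(guarded);
        System.out.println("naive revocations:   " + naive.revocations.get());
        System.out.println("guarded revocations: " + guarded.revocations.get());
    }
}
```

In this simulation the naive handler processes the revocation twice, while the guarded one processes it once. Whether such a guard would have been sufficient inside Flink is not something this sketch can show.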
Issue Links
- is caused by: FLINK-31997 Update to Fabric8 6.5.1+ in flink-kubernetes (Closed)
- split from: FLINK-34007 Flink Job stuck in suspend state after losing leadership in HA Mode (Resolved)
- links to