Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Fix Version/s: 1.17.0, 1.15.4, 1.16.2
Description
We're seeing a test failure in KubernetesHighAvailabilityRecoverFromSavepointITCase due to a deadlock:
2023-02-01T18:53:35.5540322Z "ForkJoinPool-1-worker-1" #14 daemon prio=5 os_prio=0 tid=0x00007f68ecb18000 nid=0x43dd1 waiting on condition [0x00007f68c1711000]
2023-02-01T18:53:35.5540900Z    java.lang.Thread.State: TIMED_WAITING (parking)
2023-02-01T18:53:35.5541272Z     at sun.misc.Unsafe.park(Native Method)
2023-02-01T18:53:35.5541932Z     - parking to wait for <0x00000000d14d7b60> (a java.util.concurrent.CompletableFuture$Signaller)
2023-02-01T18:53:35.5542496Z     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
2023-02-01T18:53:35.5543088Z     at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1709)
2023-02-01T18:53:35.5543672Z     at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
2023-02-01T18:53:35.5544240Z     at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1788)
2023-02-01T18:53:35.5544801Z     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
2023-02-01T18:53:35.5545632Z     at org.apache.flink.kubernetes.highavailability.KubernetesHighAvailabilityRecoverFromSavepointITCase.testRecoverFromSavepoint(KubernetesHighAvailabilityRecoverFromSavepointITCase.java:113)
2023-02-01T18:53:35.5546409Z     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
The build failure happens on 1.16. I'm adding 1.17 and 1.15 as fixVersions as well because it might be caused by recent changes that were introduced with FLINK-30462 and/or FLINK-30474.
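For readers unfamiliar with this kind of thread dump: the frames above show the test thread parked inside a timed CompletableFuture.get(...) on a future that had not completed yet. The following is a minimal, standalone sketch of that pattern only. It is not taken from the Flink code base and does not reproduce the underlying bug; the class name TimedGetDemo, the method observeWaiterState and the 30-second timeout are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; not part of Flink.
final class TimedGetDemo {

    /**
     * Starts a helper thread that blocks in a timed get() on a future that
     * nobody completes, then returns the state that thread is observed in.
     */
    static Thread.State observeWaiterState() throws InterruptedException {
        CompletableFuture<Void> neverCompleted = new CompletableFuture<>();

        Thread waiter = new Thread(() -> {
            try {
                // Illustrative timeout; the real test uses its own value.
                neverCompleted.get(30, TimeUnit.SECONDS);
            } catch (InterruptedException | ExecutionException | TimeoutException e) {
                // The demo interrupts the waiter once it has been observed.
            }
        }, "waiter");
        waiter.setDaemon(true);
        waiter.start();

        // Poll for up to 5 seconds until the waiter has parked.
        Thread.State state = waiter.getState();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (state != Thread.State.TIMED_WAITING && System.nanoTime() < deadline) {
            Thread.sleep(10);
            state = waiter.getState();
        }

        waiter.interrupt(); // release the helper thread
        return state;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(observeWaiterState());
    }
}
```

Running main should report the waiter as TIMED_WAITING, which matches the "TIMED_WAITING (parking)" line in the dump. A thread in this state is waiting for some other component to complete the future, so the cause of a hang usually has to be looked for in whatever was supposed to complete it.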
Issue Links
- is caused by: FLINK-30474 DefaultMultipleComponentLeaderElectionService triggers HA backend change even if it's not the leader (Closed)