Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Versions: 1.14.3, 1.15.0
Description
The lifecycle of ResourceManagerServiceImpl can lead to exceptions when ResourceManagerServiceImpl.deregisterApplication is called. The problem arises when the DispatcherResourceManagerComponent is shut down before the ResourceManagerServiceImpl gains leadership, or while it is starting the ResourceManager.
One problem is that deregisterApplication returns an exceptionally completed future if there is no leading ResourceManager.
Another problem is that even if there is a leading ResourceManager, it may not have been started yet. In that case, the ResourceManagerGateway.deregisterApplication call is discarded. The reason for this behaviour is that we create the ResourceManager in one Runnable and only start it in another. As a result, a deregisterApplication call can acquire the lock between the two.
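The interleaving described above can be reproduced with a small, self-contained model. This is only a sketch and not Flink's code: LifecycleRaceSketch and FakeResourceManager are hypothetical names, and a single-threaded executor stands in for the lock so that the ordering is deterministic.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the race; not Flink's actual classes.
class LifecycleRaceSketch {

    /** Stand-in for a ResourceManager that ignores calls until it is started. */
    static final class FakeResourceManager {
        private boolean started;
        private boolean deregistered;

        void start() {
            started = true;
        }

        void deregisterApplication() {
            if (!started) {
                return; // assumption for this model: the call is silently dropped
            }
            deregistered = true;
        }

        boolean isDeregistered() {
            return deregistered;
        }
    }

    /** Runs create, deregister, start in that order and reports whether deregistration took effect. */
    static boolean runInterleaving() throws Exception {
        // A single thread executes the tasks strictly in submission order.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        FakeResourceManager[] leader = new FakeResourceManager[1];
        try {
            // First Runnable: create the ResourceManager.
            executor.execute(() -> leader[0] = new FakeResourceManager());
            // The deregistration slips in between creation and start.
            executor.execute(() -> leader[0].deregisterApplication());
            // Second Runnable: start the ResourceManager, too late for the call above.
            CompletableFuture.runAsync(() -> leader[0].start(), executor)
                    .get(5, TimeUnit.SECONDS);
            return leader[0].isDeregistered();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("deregistered = " + runInterleaving()); // prints "deregistered = false"
    }
}
```

In this model the deregistration is lost without any error, which matches the discarded call described above.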
I suggest correcting the lifecycle and contract of ResourceManagerServiceImpl.deregisterApplication.
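One possible shape of such a contract is sketched here, purely as an illustration and not as the change that was actually merged. It assumes two things: the ResourceManager is created and started as a single step under the lock, and deregisterApplication completes normally when there is no leader. All class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of a tightened contract; names do not mirror Flink's implementation.
class ResourceManagerServiceSketch {

    /** Stand-in for a ResourceManager with an explicit start step. */
    static final class FakeResourceManager {
        private boolean started;
        private boolean deregistered;

        void start() {
            started = true;
        }

        CompletableFuture<Void> deregisterApplication() {
            if (!started) {
                CompletableFuture<Void> failed = new CompletableFuture<>();
                failed.completeExceptionally(new IllegalStateException("not started"));
                return failed;
            }
            deregistered = true;
            return CompletableFuture.completedFuture(null);
        }
    }

    private final Object lock = new Object();
    private FakeResourceManager leader; // guarded by lock

    /** Creates and starts the ResourceManager in one step, so no caller can observe it unstarted. */
    void grantLeadership() {
        synchronized (lock) {
            FakeResourceManager resourceManager = new FakeResourceManager();
            resourceManager.start();
            leader = resourceManager;
        }
    }

    /** Assumed contract: with no leader there is nothing to deregister, so complete normally. */
    CompletableFuture<Void> deregisterApplication() {
        synchronized (lock) {
            if (leader == null) {
                return CompletableFuture.completedFuture(null);
            }
            return leader.deregisterApplication();
        }
    }

    /** Test helper: reports whether the current leader processed a deregistration. */
    boolean leaderDeregistered() {
        synchronized (lock) {
            return leader != null && leader.deregistered;
        }
    }
}
```

With this shape, a caller that shuts down before leadership is granted gets a normally completed future, and a caller that arrives afterwards always reaches a started ResourceManager.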
Please note that due to this problem, the error reporting of this method has been suppressed. See FLINK-25885 for more details.
Issue Links
- is related to
  - FLINK-23240 ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure (Closed)
  - FLINK-25885 ClusterEntrypointTest.testWorkingDirectoryIsDeletedIfApplicationCompletes failed on azure (Closed)
- links to