Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.9.1
None
None
Description
I suspect there is a bug in the YARN deletion task service. Repro steps:
- First, set yarn.nodemanager.delete.debug-delay-sec=3600, so that when an app finishes, its binary/container folder is deleted only after 3600 seconds.
- While application App1 (a long-running service) is running on machine1, machine1 shuts down. ContainerManagerImpl#serviceStop() is called, which calls ContainerManagerImpl#cleanUpApplicationsOnNMShutDown, and an ApplicationFinishEvent is sent. Deletion tasks are then created, but they are stored in the DB and will only be picked up for execution 3600 seconds later.
- 100 seconds later, machine1 comes back, and the same app is assigned to run on this machine. Its containers are created and work well.
- The deletion tasks created in step 2 are later picked up and delete the containers created in step 3.
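
To make the suspected sequence easier to follow, here is a toy Python model of the steps above. It is only a sketch and not NodeManager code: `ToyNodeManager`, its methods, and the `appcache/<app id>` path layout are hypothetical names invented for illustration. The one assumption that matters is that a persisted deletion task identifies its target by a path that the relaunched app reuses.

```python
from dataclasses import dataclass, field

# Mirrors yarn.nodemanager.delete.debug-delay-sec=3600 from the report.
DEBUG_DELAY_SEC = 3600


@dataclass
class ToyNodeManager:
    """Toy model of the reported sequence (hypothetical, not Hadoop code)."""
    now: int = 0
    local_dirs: dict = field(default_factory=dict)   # path -> which run owns it
    state_store: list = field(default_factory=list)  # persisted (due_time, path)

    def launch(self, app_id, run_label):
        # Assumption: the directory is keyed only by the app id, so a
        # relaunch of the same app reuses the same path.
        path = f"appcache/{app_id}"
        self.local_dirs[path] = run_label
        return path

    def shutdown(self):
        # On shutdown every app is treated as finished; a delayed deletion
        # task is persisted for each of its directories.
        for path in self.local_dirs:
            self.state_store.append((self.now + DEBUG_DELAY_SEC, path))

    def advance(self, to):
        # Move the clock forward and run every persisted task that is due.
        self.now = to
        due = [task for task in self.state_store if task[0] <= self.now]
        self.state_store = [task for task in self.state_store if task[0] > self.now]
        deleted = []
        for _, path in due:
            if path in self.local_dirs:
                deleted.append((path, self.local_dirs.pop(path)))
        return deleted


if __name__ == "__main__":
    nm = ToyNodeManager()
    nm.launch("app1", "run-1")   # App1 running on machine1
    nm.shutdown()                # step 2: deletion task persisted, due at t=3600
    nm.advance(100)              # step 3: machine1 is back, nothing due yet
    nm.launch("app1", "run-2")   # same app assigned to this machine again
    print(nm.advance(3600))      # step 4: the stale task fires
```

In this model the task persisted at shutdown removes the directory that now belongs to the second run, which is the behavior described in the last step. Whether the real deletion tasks are keyed this way is exactly what the report is asking to be confirmed.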