Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
0.28.0, 1.0.0
None
Description
When the fetch process hangs indefinitely (for example, because of HDFS issues), the Mesos containerizer attempts to destroy the container and kill the executor once --executor_registration_timeout expires. However, this reliably fails for us. The launcher's destroy kills the executor and the container is destroyed, but the agent never learns that the executor has terminated. As a result, the task is left in the STAGING state forever.
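The lifecycle described above can be illustrated with a toy asyncio model. This is not Mesos code. It includes a fetch that never completes, a registration timeout that triggers a destroy, and an agent that only moves the task out of STAGING once it observes the executor's termination. The class, method, and flag names (ToyAgent, hung_fetch, destroy, launch, notify_on_destroy) are hypothetical. TASK_FAILED is an assumed terminal state chosen only for illustration. The model reproduces the reported symptom, not the actual root cause.

```python
import asyncio

# Toy model of the reported lifecycle. All names here are hypothetical and
# are NOT Mesos APIs; TASK_FAILED is an assumed terminal state for illustration.


class ToyAgent:
    def __init__(self, registration_timeout: float, notify_on_destroy: bool) -> None:
        self.registration_timeout = registration_timeout
        self.notify_on_destroy = notify_on_destroy
        self.task_state = "TASK_STAGING"

    async def hung_fetch(self) -> None:
        # Stands in for a fetcher that never returns (e.g. blocked on HDFS).
        await asyncio.Event().wait()

    async def destroy(self, fetch: asyncio.Task, terminated: asyncio.Future) -> None:
        # Kill the hanging fetch/executor and tear down the container.
        fetch.cancel()
        try:
            await fetch
        except asyncio.CancelledError:
            pass
        # Models the reported symptom: when notify_on_destroy is False the
        # agent is never told the executor terminated, so this future stays unresolved.
        if self.notify_on_destroy:
            terminated.set_result(None)

    async def launch(self, observe_for: float) -> str:
        terminated = asyncio.get_running_loop().create_future()
        fetch = asyncio.create_task(self.hung_fetch())
        # Wait for the fetch, but only up to the registration timeout.
        done, _ = await asyncio.wait({fetch}, timeout=self.registration_timeout)
        if not done:
            # Registration timeout expired while the fetch was still hanging.
            await self.destroy(fetch, terminated)
        try:
            # Bounded wait only so the demo finishes; in the bug nothing ever arrives.
            await asyncio.wait_for(terminated, observe_for)
            self.task_state = "TASK_FAILED"
        except asyncio.TimeoutError:
            pass  # Termination never observed: the task stays in TASK_STAGING.
        return self.task_state


async def main() -> None:
    for notify in (True, False):
        agent = ToyAgent(registration_timeout=0.05, notify_on_destroy=notify)
        print(f"notify_on_destroy={notify}: {await agent.launch(observe_for=0.1)}")


if __name__ == "__main__":
    asyncio.run(main())
```

Run as a script, the model prints TASK_FAILED when the destroy path reports the termination and TASK_STAGING when it does not. This mirrors the stuck task described in the report.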