Spark Standalone worker cleanup finds directories to remove with a `listFiles` call on the worker's work directory.
This includes both application directories and driver directories for applications submitted in cluster mode.
A directory is considered not to be part of a running app if the worker does not have an executor with a matching application ID.
If a driver is running on a node but all of its executors are on other nodes, the worker hosting the driver will always conclude that the driver directory is not part of a running app.
Consider a two-node Spark cluster with Worker A and Worker B, where each node has a single core available. We submit an application in cluster deploy mode; the driver starts running on Worker A while the executor starts on Worker B.
When Worker A's cleanup is triggered, it finds the driver directory in its work dir. Worker A checks its executor list and finds no matching entries, since it has no executors for this application. Worker A then removes the directory even though the driver may still be actively running.
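The faulty decision can be sketched as follows (a minimal simulation with hypothetical names; the real check lives in Worker.scala and matches directory names against the worker's executors):

```python
def dirs_to_clean(work_dirs, executor_app_ids):
    """Mimic the worker cleanup check: a directory survives only if
    some executor on this worker has an app ID matching the dir name."""
    running = set(executor_app_ids)
    return [d for d in work_dirs if d not in running]

# Worker A runs only the driver; its executor list is empty, so the
# driver directory is selected for deletion even though the app is live.
print(dirs_to_clean(["driver-20240101-0000"], []))
# -> ['driver-20240101-0000']
```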
I think this could be fixed by modifying line 432 to be
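The shape of such a fix, sketched in the same simulation style (an assumption, not the actual patch: the set of live IDs is extended to also cover drivers running on this worker):

```python
def dirs_to_clean_fixed(work_dirs, executor_app_ids, driver_ids):
    # Treat a directory as live if it matches either a running
    # executor's app ID or a running driver's ID on this worker.
    running = set(executor_app_ids) | set(driver_ids)
    return [d for d in work_dirs if d not in running]

# The driver directory on Worker A now survives cleanup.
print(dirs_to_clean_fixed(["driver-20240101-0000"], [], ["driver-20240101-0000"]))
# -> []
```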
I'll run a test and submit a PR soon.