SPARK-22976: Worker cleanup can remove running driver directories



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 2.3.0
    • Component/s: Deploy, Spark Core


      Spark Standalone worker cleanup finds directories to remove with a listFiles command

      This includes both application directories and driver directories from cluster mode submitted applications.

      A directory is considered to not be part of a running app if the worker does not have an executor with a matching ID.


            val appIds = executors.values.map(_.appId).toSet
            val isAppStillRunning = appIds.contains(appIdFromDir)
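
      For illustration, that check can be sketched as a small standalone snippet. The `ExecutorInfo` case class and `shouldRemove` helper are illustrative names, not the actual Worker.scala code; only the `appIds` / `isAppStillRunning` lines mirror the real logic:

```scala
// Hypothetical sketch of the worker cleanup decision.
case class ExecutorInfo(appId: String)

object CleanupSketch {
  // Returns true when the directory's ID matches no running executor's
  // appId, i.e. the worker would consider the directory removable.
  def shouldRemove(appIdFromDir: String,
                   executors: Map[String, ExecutorInfo]): Boolean = {
    val appIds = executors.values.map(_.appId).toSet
    val isAppStillRunning = appIds.contains(appIdFromDir)
    !isAppStillRunning
  }
}
```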

      If a driver has been started on a node, but all of the executors are on other nodes, the worker running the driver will always assume that the driver directory is not part of a running app.

      Consider a two-node Spark cluster with Worker A and Worker B, where each node has a single core available. We submit our application in cluster deploy mode; the driver begins running on Worker A while the executor starts on Worker B.

      Worker A has a cleanup triggered and finds that it has a directory for the application.

      Worker A checks its executor list and finds no entries matching this directory, since it has no corresponding executors for the application. Worker A then removes the directory even though the driver may still be actively running.
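
      The failure on Worker A can be shown with the same set-membership check (the driver ID below is made up for illustration):

```scala
// Worker A runs only the driver, so its executor map is empty. The
// driver directory is named after the driver ID, which can never appear
// in the executor-derived appIds set, so cleanup treats it as stale.
val executorsOnWorkerA = Map.empty[String, String] // executorId -> appId
val appIds = executorsOnWorkerA.values.toSet
val driverDirId = "driver-20180105-0001"           // illustrative ID
val isAppStillRunning = appIds.contains(driverDirId)
// isAppStillRunning is false, so Worker A deletes the running driver's dir
```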

      I think this could be fixed by modifying line 432 to be

            val appIds = executors.values.map(_.appId).toSet ++ drivers.values.map(_.driverId)
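
      With that change, a worker hosting only a driver keeps the directory. A rough sketch of the effect (IDs are illustrative, not taken from the real code):

```scala
// Sketch of the proposed fix: driver IDs are unioned into the set of
// "still running" IDs, so a driver-only worker no longer considers the
// driver directory removable.
val executorAppIds = Set.empty[String]            // no executors here
val driverIds      = Set("driver-20180105-0001")  // from drivers.values
val stillRunning   = executorAppIds ++ driverIds
val isAppStillRunning = stillRunning.contains("driver-20180105-0001")
// isAppStillRunning is now true, so cleanup skips the driver directory
```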

      I'll run a test and submit a PR soon.




            • Assignee:
              rspitzer Russell Spitzer