Details
- Type: Documentation
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: 1.2.0, 1.2.1
- Labels: None
Description
There is an error in the Cluster Launch Scripts section of http://spark.apache.org/docs/latest/spark-standalone.html.
The description of the property spark.worker.cleanup.enabled states that all directories under the worker's work dir are removed, regardless of whether the application is still running.
After checking the implementation, I found that only the directories of stopped applications are removed, so the description in the document is incorrect.
The relevant code is the WorkDirCleanup handler in Worker.scala:
case WorkDirCleanup =>
  // Spin up a separate thread (in a future) to do the dir cleanup; don't tie up worker actor
  val cleanupFuture = concurrent.future {
    val appDirs = workDir.listFiles()
    if (appDirs == null) {
      throw new IOException("ERROR: Failed to list files in " + appDirs)
    }
    appDirs.filter { dir =>
      // the directory is used by an application - check that the application is not running
      // when cleaning up
      val appIdFromDir = dir.getName
      val isAppStillRunning = executors.values.map(_.appId).contains(appIdFromDir)
      dir.isDirectory && !isAppStillRunning &&
        !Utils.doesDirectoryContainAnyNewFiles(dir, APP_DATA_RETENTION_SECS)
    }.foreach { dir =>
      logInfo(s"Removing directory: ${dir.getPath}")
      Utils.deleteRecursively(dir)
    }
  }
  cleanupFuture onFailure {
    case e: Throwable =>
      logError("App dir cleanup failed: " + e.getMessage, e)
  }
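For illustration, here is a minimal, standalone sketch (not the actual Worker code) of the rule the handler above applies: a directory under the work dir is removed only if no running executor belongs to that application and none of its files were modified within the retention window. The names WorkDirCleanupSketch, runningAppIds, and retentionSecs are made up for this example; in the real Worker the retention window is APP_DATA_RETENTION_SECS, which as far as I can tell is read from spark.worker.cleanup.appDataTtl.

// Standalone sketch of the cleanup-eligibility rule; illustrative names only.
import java.io.File

object WorkDirCleanupSketch {

  // True if any file under `dir` was modified within the last `retentionSecs` seconds.
  def containsRecentFiles(dir: File, retentionSecs: Long): Boolean = {
    val cutoff = System.currentTimeMillis() - retentionSecs * 1000
    def newer(f: File): Boolean =
      if (f.isDirectory) Option(f.listFiles()).getOrElse(Array.empty[File]).exists(newer)
      else f.lastModified() > cutoff
    newer(dir)
  }

  // Directories eligible for removal: stopped applications with no recent activity.
  def dirsToClean(workDir: File, runningAppIds: Set[String], retentionSecs: Long): Seq[File] = {
    Option(workDir.listFiles()).getOrElse(Array.empty[File]).toSeq.filter { dir =>
      dir.isDirectory &&
        !runningAppIds.contains(dir.getName) &&   // skip apps that are still running
        !containsRecentFiles(dir, retentionSecs)  // skip dirs with recent writes
    }
  }

  def main(args: Array[String]): Unit = {
    val workDir = new File("/tmp/spark-work")        // hypothetical work dir
    val running = Set("app-20150501120000-0001")     // hypothetical running app id
    dirsToClean(workDir, running, retentionSecs = 7 * 24 * 3600)
      .foreach(d => println(s"would remove: ${d.getPath}"))
  }
}

So the documentation should say that cleanup removes only the work directories of applications that are no longer running (and that have had no recent writes), rather than all application directories.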