Spark / SPARK-8966

Design a mechanism to ensure that temporary files created in tasks are cleaned up after failures


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Later
    • None
    • None
    • Component/s: Spark Core
    • None

    Description

      It's important to avoid leaking temporary files, such as spill files created by the external sorter. Individual operators should still make an effort to clean up their own files and perform their own error handling, but I think that we should add a safety-net mechanism that tracks file creation on a per-task basis and automatically cleans up leaked files.

      During tests, this mechanism should throw an exception when a leak is detected. In production deployments, it should log a warning and clean up the leak itself. This is similar to the TaskMemoryManager's leak detection and cleanup code.
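To make the two modes concrete, here is a minimal, language-neutral sketch in Python. Everything in it is hypothetical and for illustration only: `TaskTempFileTracker`, `TempFileLeakError`, the `strict` flag, and the method names are not Spark classes or APIs. It assumes the tracker is told about every temporary file a task creates, and that leaked files are deleted in both modes before the leak is reported.

```python
import logging
import os
import tempfile

log = logging.getLogger("temp_file_tracker")


class TempFileLeakError(RuntimeError):
    """Raised in strict (test) mode when a task leaves temp files behind."""


class TaskTempFileTracker:
    """Hypothetical per-task registry of temporary files (not a Spark class)."""

    def __init__(self, task_id, strict):
        self.task_id = task_id
        self.strict = strict  # assumed: True during tests, False in production
        self._paths = set()

    def create_temp_file(self, suffix=".spill"):
        """Create a temp file and remember it for end-of-task leak detection."""
        fd, path = tempfile.mkstemp(suffix=suffix)
        os.close(fd)
        self._paths.add(path)
        return path

    def release(self, path):
        """Called by an operator that cleans up its own file."""
        if os.path.exists(path):
            os.remove(path)
        self._paths.discard(path)

    def cleanup_on_task_end(self):
        """Safety net: delete anything still registered, then report the leak."""
        leaked = sorted(p for p in self._paths if os.path.exists(p))
        for path in leaked:
            os.remove(path)
        self._paths.clear()
        if leaked:
            message = (
                f"task {self.task_id} leaked {len(leaked)} temp file(s): {leaked}"
            )
            if self.strict:
                # Test mode: fail loudly so the leaking operator gets fixed.
                raise TempFileLeakError(message)
            # Production mode: the files are already gone, so only warn.
            log.warning(message)
        return leaked
```

An operator that calls `release` for each of its files never triggers the safety net; only files that are still registered when `cleanup_on_task_end` runs are treated as leaks.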

      We may be able to implement this via a convenience method that registers task completion handlers with TaskContext.
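One possible shape for such a convenience method is sketched below in Python. The `FakeTaskContext` class is a stand-in written for this example and is not Spark's TaskContext; `create_task_temp_file`, `run_task`, and the reverse-order listener execution are likewise assumptions made for illustration. The idea is only that creating the file and registering its cleanup happen in one call, so a failing task cannot skip the cleanup.

```python
import os
import tempfile


class FakeTaskContext:
    """Stand-in for a task context; hypothetical, not Spark's TaskContext."""

    def __init__(self):
        self._listeners = []

    def add_task_completion_listener(self, listener):
        """Register a zero-argument callable to run when the task ends."""
        self._listeners.append(listener)

    def mark_task_completed(self):
        # Design choice for this sketch: run listeners in reverse
        # registration order, then forget them.
        for listener in reversed(self._listeners):
            listener()
        self._listeners.clear()


def create_task_temp_file(context, suffix=".tmp"):
    """Hypothetical convenience method: create a file and register its cleanup."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)

    def delete_if_present():
        # The operator may already have removed the file; that is not an error.
        if os.path.exists(path):
            os.remove(path)

    context.add_task_completion_listener(delete_if_present)
    return path


def run_task(context, body):
    """Run a task body and fire completion listeners even if it raises."""
    try:
        return body(context)
    finally:
        context.mark_task_completed()
```

Because `run_task` fires the listeners in a `finally` block, a file created through `create_task_temp_file` is removed whether the task body returns normally or raises.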

      We might also explore techniques that will cause files to be cleaned up automatically when their file descriptors are closed (e.g. by calling unlink on an open file). These techniques should not be our last line of defense against file resource leaks, though, since they might be platform-specific and may clean up resources later than we'd like.
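The unlink-while-open technique can be sketched as follows in Python. This is a POSIX-only illustration: the helper name `open_anonymous_temp_file` is hypothetical, and on platforms that refuse to delete an open file (Windows, for example) the `os.unlink` call would raise instead. The file's directory entry disappears immediately, while its data stays readable and writable through the open handle until that handle is closed.

```python
import os
import tempfile


def open_anonymous_temp_file(directory=None):
    """Create a temp file, unlink it at once, and return (file, old_path).

    POSIX-only sketch: the storage is reclaimed by the OS when the
    returned file object is closed or the process exits.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Remove the directory entry while the descriptor is still open.
        os.unlink(path)
    except OSError:
        # Unlinking failed (e.g. unsupported platform): do not leak the fd.
        os.close(fd)
        raise
    return os.fdopen(fd, "w+b"), path
```

A caveat that matches the concern above: code that needs to reopen the file by path, or hand the path to another process, cannot use this approach, which is one more reason to treat it as a supplement to explicit tracking.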

          People

            Assignee: Unassigned
            Reporter: Josh Rosen (joshrosen)
            Votes: 0
            Watchers: 7