Spark / SPARK-35570

Shuffle file leak with external shuffle service enabled


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: Block Manager, Shuffle
    • Labels: None

    Description

      Unlike RDD blocks, shuffle files are not cleaned up by the external shuffle service. Shuffle file cleanup relies mainly on live executors responding to requests from the ContextCleaner. Once an executor exits, the shuffle files it leaves behind are not cleaned up until the application exits. For streaming or other long-running applications, the disk may eventually run out of space.

      I'm confused that shuffle files are left behind like this while the lifecycle of RDD blocks is handled properly. What is the difference between the two cases?
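      The two cleanup paths contrasted above can be sketched as a toy Python model. Everything in it (ToyCluster and its methods) is hypothetical and is not Spark's API or implementation. It only encodes the behavior the reporter describes, assuming shuffle cleanup requests reach live executors only, while RDD blocks of executors that have already exited can still be removed through the external shuffle service.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical toy model of the cleanup behavior reported in this issue.

    Not Spark code: it only mimics the two cleanup paths the reporter describes.
    """
    # executor id -> shuffle ids whose files sit on that executor's local disk
    shuffle_files: dict = field(default_factory=dict)
    # executor id -> RDD block ids stored on that executor's local disk
    rdd_blocks: dict = field(default_factory=dict)
    # ids of executors that are still running
    alive: set = field(default_factory=set)

    def add_executor(self, exec_id):
        self.alive.add(exec_id)
        self.shuffle_files[exec_id] = set()
        self.rdd_blocks[exec_id] = set()

    def executor_exits(self, exec_id):
        # The executor is gone, but its files stay on disk.
        self.alive.discard(exec_id)

    def clean_shuffle(self, shuffle_id):
        # Assumption from the report: only live executors act on the request.
        for exec_id in self.alive:
            self.shuffle_files[exec_id].discard(shuffle_id)

    def clean_rdd(self, rdd_id):
        # Assumption from the report: the external shuffle service can also
        # delete RDD blocks belonging to executors that have already exited.
        for blocks in self.rdd_blocks.values():
            blocks.discard(rdd_id)

    def leaked_shuffles(self):
        # Shuffle files still on disk after cleanup was requested.
        return {e: set(s) for e, s in self.shuffle_files.items() if s}


cluster = ToyCluster()
cluster.add_executor("exec-1")
cluster.add_executor("exec-2")
cluster.shuffle_files["exec-1"].add(0)
cluster.shuffle_files["exec-2"].add(0)
cluster.rdd_blocks["exec-1"].add(7)

cluster.executor_exits("exec-1")  # exec-1 exits before cleanup runs
cluster.clean_shuffle(0)
cluster.clean_rdd(7)

print(cluster.leaked_shuffles())   # {'exec-1': {0}}
print(cluster.rdd_blocks["exec-1"])  # set()
```

      In this model, a shuffle leaks exactly when its executor exits before the shuffle is cleaned. That matches the scenario the reporter flags for streaming and long-running applications, where such leftovers accumulate until the application ends.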

    People

      • Assignee: Unassigned
      • Reporter: ZiyueGuan (guanziyue)
      • Votes: 0
      • Watchers: 2