Spark / SPARK-47910

Memory leak when interrupting shuffle write using zstd compression

Details

    Description

      When spark.sql.execution.interruptOnCancel=true and spark.io.compression.codec=zstd, a memory leak occurs when tasks are cancelled at specific times. Cancelling a task interrupts the shuffle write, which then calls org.apache.spark.storage.DiskBlockObjectWriter#closeResources. That method closes only the ManualCloseOutputStream; the ZstdInputStreamNoFinalizer that wraps it is not closed. Moreover, ZstdInputStreamNoFinalizer does not define a finalizer, so its resources are not reclaimed by GC automatically.
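
The leak pattern described above can be illustrated with a minimal, self-contained sketch. This is not Spark or zstd-jni code: `LeakSketch`, `CompressingStream`, and the `liveBuffers` counter are hypothetical stand-ins, and the counter only simulates a resource that is released by an explicit `close()`. It shows that closing the inner stream alone leaves the wrapping stream's resource allocated, while closing the wrapper releases it.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

public class LeakSketch {
    // Hypothetical counter of simulated buffers that are allocated but not yet released.
    static final AtomicInteger liveBuffers = new AtomicInteger();

    // Hypothetical stand-in for a compression stream without a finalizer:
    // its resource is released only by an explicit close().
    static class CompressingStream extends FilterOutputStream {
        private boolean closed = false;

        CompressingStream(OutputStream inner) {
            super(inner);
            liveBuffers.incrementAndGet(); // simulate allocating a buffer
        }

        @Override
        public void close() throws IOException {
            if (!closed) {
                closed = true;
                liveBuffers.decrementAndGet(); // simulate releasing the buffer
                super.close();                 // flushes and closes the inner stream
            }
        }
    }

    // Mirrors the reported behaviour: only the inner stream is closed.
    public static int closeInnerOnly() {
        try {
            liveBuffers.set(0);
            OutputStream inner = new ByteArrayOutputStream();
            OutputStream outer = new CompressingStream(inner);
            outer.write(1);
            inner.close(); // the wrapper is never closed
            return liveBuffers.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Closing the wrapper releases its resource and closes the inner stream too.
    public static int closeOuter() {
        try {
            liveBuffers.set(0);
            OutputStream inner = new ByteArrayOutputStream();
            OutputStream outer = new CompressingStream(inner);
            outer.write(1);
            outer.close();
            return liveBuffers.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("buffers left after closing inner only: " + closeInnerOnly());
        System.out.println("buffers left after closing wrapper: " + closeOuter());
    }
}
```

In this sketch the first call leaves one simulated buffer allocated and the second leaves none, which is the difference between closing only the innermost stream and closing the outermost wrapper.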
       

            People

              JacobZheng