Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-25103

CompletionIterator may delay GC of completed resources

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.0.1, 2.1.0, 2.2.0, 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:

      Description

      while working on SPARK-22713 , I fund (and partially fixed) a scenario in which an iterator is already exhausted but still holds a reference to some resources that can be GCed at this point.

      However, these resources can not be GCed because of this reference.

      the specific fix applied in SPARK-22713 was to wrap the iterator with a CompletionIterator that cleans it when exhausted, thing is that it's quite easy to get this wrong by closing over local variables or this reference in the cleanup function itself.

      I propose solving this by modifying CompletionIterator to discard references to the wrapped iterator and cleanup function once exhausted.

       

      • a dive into the code showed that most CompletionIterators are eventually used by 
        org.apache.spark.scheduler.ShuffleMapTask#runTask

        which does:

      writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])

      looking at 

      org.apache.spark.shuffle.ShuffleWriter#write

      implementations, it seems all of them first exhaust the iterator and then perform some kind of post-processing: i.e. merging spills, sorting, writing partitions files and then concatenating them into a single file... bottom line the Iterator may actually be 'sitting' for some time after being exhausted.

       

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              eyalfa Eyal Farago
            • Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: