Spark / SPARK-18487

Add task completion listener to HashAggregate to avoid memory leak


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Component: SQL

    Description

      Methods such as Dataset.show and Dataset.take use Limit (CollectLimitExec), which leverages SparkPlan.executeTake to efficiently collect the required number of elements back to the driver.

      However, under whole-stage codegen we usually release resources only after all elements are consumed (e.g., in HashAggregate). When execution stops early, as with Dataset.show, the resources are never released, causing a memory leak.

      We can add a task completion listener to HashAggregate to avoid the memory leak.
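      The proposed fix could be sketched with Spark's TaskContext API, which allows registering a callback that runs when the task finishes regardless of how far the output iterator was consumed. The snippet below is an illustrative assumption, not the actual patch; `hashMap.free()` is a hypothetical release call standing in for the aggregation map's real cleanup method:

      ```scala
      import org.apache.spark.TaskContext

      // Sketch only: ensure the aggregation hash map is released when the
      // task completes, even if the consumer (e.g. Dataset.show) stops
      // pulling rows before the iterator is exhausted.
      TaskContext.get().addTaskCompletionListener[Unit] { _ =>
        hashMap.free() // hypothetical: release off-heap aggregation buffers
      }
      ```

      Because the listener fires on both normal completion and early termination, the memory backing the hash map would no longer depend on the iterator being fully drained.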

            People

              Assignee: Unassigned
              Reporter: viirya (L. C. Hsieh)
              Votes: 0
              Watchers: 3
