SPARK-40261

DirectTaskResult meta should not be counted into result size


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Spark Core
    • Labels: None

    Description

      This issue has existed for a long time (since https://github.com/liuzqt/spark/commit/c33e55008239f417764d589c1366371d18331686).

      When calculating whether the result fetched by the driver exceeds the `spark.driver.maxResultSize` limit, the size of the whole serialized task result is taken into account, including the task metadata overhead (`accumUpdates`). However, the metadata should not be counted, because it is discarded by the driver immediately after being processed.

      This can lead to an exception when running jobs with a huge number of tasks that actually return small results: the per-task metadata overhead accumulates across all tasks and can push the counted total past the limit even though the result data itself is small.

      Therefore, only `valueBytes` should be counted when checking against the result size limit.
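
      A minimal sketch in Scala of the intended accounting (hypothetical names; this is not Spark's actual `DirectTaskResult` class): only the serialized value payload is compared against `spark.driver.maxResultSize`, while the `accumUpdates` metadata, which the driver discards after processing, is excluded.

```scala
import java.nio.ByteBuffer

// Hypothetical stand-in for Spark's DirectTaskResult, for illustration only.
case class TaskResultSketch(
    valueBytes: ByteBuffer,           // serialized task result value
    accumUpdates: Seq[Array[Byte]]) { // serialized accumulator updates (metadata)

  // Size that should count against spark.driver.maxResultSize.
  def valueSize: Long = valueBytes.remaining().toLong

  // Size that was being counted before the fix: value plus metadata overhead.
  def totalSerializedSize: Long =
    valueSize + accumUpdates.map(_.length.toLong).sum
}

object ResultSizeCheck {
  // Returns true if the fetched result should be rejected.
  // A limit of 0 is treated as "unlimited", mirroring the config's semantics.
  def exceedsLimit(result: TaskResultSketch, maxResultSize: Long): Boolean =
    maxResultSize > 0 && result.valueSize > maxResultSize
}
```

      With many tasks, `totalSerializedSize` grows with the accumulator metadata of every task, while `valueSize` stays proportional to the data actually returned, which is what the limit is meant to bound.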

      cc joshrosen 



    People

      Assignee: Ziqi Liu (liuzq12)
      Reporter: Ziqi Liu (liuzq12)
      Votes: 0
      Watchers: 3
