Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Version: 3.4.0
- Labels: None
Description
This issue has existed for a long time (since https://github.com/liuzqt/spark/commit/c33e55008239f417764d589c1366371d18331686).
When calculating whether the results fetched by the driver exceed the `spark.driver.maxResultSize` limit, the size of the whole serialized task result is taken into account, including the task metadata overhead (`accumUpdates`). However, the metadata should not be counted, because it is discarded by the driver immediately after being processed.
This can lead to an exception when running jobs with a huge number of tasks that actually return small results.
Therefore we should only count `valueBytes` when checking against the result size limit.
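Below is a minimal sketch of the proposed accounting, using hypothetical names (`SerializedTaskResult`, `canFetchResult`) rather than the actual Spark classes; it only illustrates the idea of checking `valueBytes` against `spark.driver.maxResultSize` instead of the full serialized result size.

```scala
// Minimal sketch with hypothetical names; not the actual Spark implementation.
object ResultSizeCheck {

  // Simplified stand-in for a serialized DirectTaskResult: a value payload
  // plus accumulator-update metadata that the driver discards after processing.
  final case class SerializedTaskResult(valueBytes: Array[Byte], accumUpdateBytes: Array[Byte])

  /** Returns true if fetching this result keeps the driver under maxResultSize. */
  def canFetchResult(totalSoFar: Long, result: SerializedTaskResult, maxResultSize: Long): Boolean = {
    // Count only the value payload; the accumulator metadata is dropped by the
    // driver right after it is processed, so it should not count toward the limit.
    totalSoFar + result.valueBytes.length <= maxResultSize
  }

  def main(args: Array[String]): Unit = {
    val result = SerializedTaskResult(
      valueBytes = Array.fill(8)(0.toByte),         // small actual result
      accumUpdateBytes = Array.fill(1024)(0.toByte) // large metadata overhead
    )
    // With a 512-byte limit, counting the full serialized size (8 + 1024 bytes)
    // would reject the result, while counting only valueBytes accepts it.
    println(canFetchResult(totalSoFar = 0L, result = result, maxResultSize = 512L)) // true
  }
}
```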
cc joshrosen