Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.1.0
- Fix Version/s: None
- Component/s: None
Description
Currently, Spark adaptive execution uses the MapOutputStatistics information to adjust the plan dynamically, but this map output size does not respect the compression factor. As a result, there are cases where the original SparkPlan is a `SortMergeJoin`, but the plan after adaptive adjustment is changed to a `BroadcastHashJoin`, and this `BroadcastHashJoin` may cause OOMs due to the inaccurate size estimation.
Also, if the shuffle implementation is local shuffle (as in Intel's Spark adaptive execution implementation), it can in some cases cause a `Too large frame` exception.
So I propose to respect the compression factor in adaptive execution, or to use the `dataSize` metric of `ShuffleExchangeExec` in adaptive execution.
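The idea above can be sketched as follows. This is a minimal, self-contained Scala sketch (not actual Spark internals): `compressionFactor`, `broadcastThreshold`, and the helper names are illustrative assumptions, standing in for the real shuffle statistics and `spark.sql.autoBroadcastJoinThreshold` machinery. It shows why comparing compressed map output bytes directly against a broadcast threshold can wrongly pick `BroadcastHashJoin`:

```scala
// Hypothetical sketch of the proposed fix: scale the compressed map output
// sizes by an assumed compression factor before deciding on a broadcast join.
object JoinStrategySketch {
  // Illustrative values, not real Spark configuration keys.
  val compressionFactor: Double = 3.0               // assumed avg. decompression ratio
  val broadcastThreshold: Long = 10L * 1024 * 1024  // e.g. a 10 MB broadcast limit

  // MapOutputStatistics-style input: compressed bytes per map output partition.
  // Estimate the uncompressed size before comparing against the threshold.
  def estimatedUncompressedSize(compressedBytesByPartition: Seq[Long]): Long =
    (compressedBytesByPartition.sum * compressionFactor).toLong

  def shouldBroadcast(compressedBytesByPartition: Seq[Long]): Boolean =
    estimatedUncompressedSize(compressedBytesByPartition) <= broadcastThreshold

  def main(args: Array[String]): Unit = {
    // 4 MB compressed looks broadcastable, but ~12 MB estimated uncompressed
    // exceeds the 10 MB threshold, so the sketch keeps the SortMergeJoin.
    val sizes = Seq(1L, 1L, 2L).map(_ * 1024 * 1024)
    println(shouldBroadcast(sizes))
  }
}
```

Using the `dataSize` metric of `ShuffleExchangeExec` instead would sidestep the estimation entirely, since that metric already reflects uncompressed row data.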
Issue Links
- is related to SPARK-24914: totalSize is not a good estimate for broadcast joins (In Progress)