[SPARK-19659] Fetch big blocks to disk when shuffle-read - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.1.0
Fix Version/s: 2.2.0
Component/s: Shuffle, Spark Core
Labels: None

Description

Currently, the whole block is fetched into memory (off-heap by default) during shuffle read. A block is identified by (shuffleId, mapId, reduceId), so it can be large when the data is skewed. If an OOM occurs during shuffle read, the job is killed and the user is told to "Consider boosting spark.yarn.executor.memoryOverhead". Adjusting that parameter and allocating more memory can resolve the OOM. However, this approach is not well suited to production environments, especially data warehouses.

When Spark SQL is used as the data engine in a warehouse, users want a unified setting (e.g. for memory) while wasting fewer resources (resources that are allocated but not used).

Skew is not always easy to predict. When it happens, it makes sense to fetch remote blocks to disk during shuffle read rather than kill the job because of an OOM. This approach was mentioned in the discussion on SPARK-3019 by sandyr and mridulm80.
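The idea can be illustrated with a minimal sketch. It is not Spark's implementation: `fetch_block`, `FetchedBlock`, and `max_in_memory_bytes` are hypothetical names, and the sketch assumes the reducer knows each block's size before fetching it. Blocks at or below the threshold are buffered in memory; larger ones are streamed to a temporary file in fixed-size chunks, so memory use stays bounded regardless of skew.

```python
import io
import os
import tempfile
from typing import BinaryIO, NamedTuple, Optional


class BlockId(NamedTuple):
    """Identifies one shuffle block: (shuffleId, mapId, reduceId)."""
    shuffle_id: int
    map_id: int
    reduce_id: int


class FetchedBlock(NamedTuple):
    """Hypothetical result type: exactly one of data/path is set."""
    block_id: BlockId
    data: Optional[bytes]  # set when the block was kept in memory
    path: Optional[str]    # set when the block was written to disk


def fetch_block(
    block_id: BlockId,
    stream: BinaryIO,
    size: int,
    max_in_memory_bytes: int,
    chunk_size: int = 64 * 1024,
    spill_dir: Optional[str] = None,
) -> FetchedBlock:
    """Fetch a remote block, spilling to disk if it exceeds the threshold.

    `stream` stands in for the remote connection; `size` is the block
    size reported ahead of time (an assumption of this sketch).
    """
    if size <= max_in_memory_bytes:
        # Small block: buffering it whole is acceptable.
        return FetchedBlock(block_id, stream.read(size), None)

    # Large block: copy chunk by chunk so at most `chunk_size` bytes
    # are held in memory at any time.
    fd, path = tempfile.mkstemp(prefix="shuffle-", dir=spill_dir)
    with os.fdopen(fd, "wb") as out:
        remaining = size
        while remaining > 0:
            chunk = stream.read(min(chunk_size, remaining))
            if not chunk:
                break  # stream ended early; caller should verify the size
            out.write(chunk)
            remaining -= len(chunk)
    return FetchedBlock(block_id, None, path)
```

A caller would pick the threshold from its memory budget and delete the temporary file once the block has been consumed. A single threshold keeps the common case (small blocks) on the fast in-memory path and only pays the disk cost for skewed blocks.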

Attachments

- SPARK-19659-design-v1.pdf (63 kB), uploaded 27/Feb/17 14:53 by Jin Xing
- SPARK-19659-design-v2.pdf (60 kB), uploaded 08/Mar/17 05:19 by Jin Xing

Issue Links

is blocked by

SPARK-19937 Collect metrics of block sizes when shuffle.

Resolved

is duplicated by

SPARK-6238 Support shuffle where individual blocks might be > 2G

Resolved

SPARK-13510 Shuffle may throw FetchFailedException: Direct buffer memory

Closed

is related to

SPARK-21253 Cannot fetch big blocks to disk

Resolved

SPARK-26590 make fetch-block-to-disk backward compatible

Resolved

links to

[Github] Pull Request #16989 (jinxing64)

[Github] Pull Request #18117 (cloud-fan)

[Github] Pull Request #18467 (zsxwing)

Sub-Tasks

There are no Sub-Tasks for this issue.

People

Assignee: Jin Xing

Reporter: Jin Xing

Votes: 0

Watchers: 22

Dates

Created: 19/Feb/17 10:04

Updated: 17/May/20 18:30

Resolved: 10/Jul/17 01:19