[SPARK-751] Consolidate shuffle files - ASF JIRA


Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.8.1
Component/s: None
Labels: None

Description

Right now, each map task writes one temporary file per reduce task, so a shuffle creates M * R files, where M = number of map tasks and R = number of reduce tasks.

This number gets very high when there are many mappers and reducers (e.g. 1,000 map tasks * 1,000 reduce tasks = 1 million files for a single shuffle). That many files can cripple the file system and significantly slow the system down.
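To make the arithmetic concrete, the sketch below enumerates one file per (map task, reduce task) pair. The `shuffle_<shuffleId>_<mapId>_<reduceId>` naming and the function names are illustrative assumptions, not taken from this issue.

```python
from itertools import product


def naive_shuffle_file_names(shuffle_id: int, num_maps: int, num_reduces: int) -> list[str]:
    """One file per (map task, reduce task) pair: M * R names in total."""
    # Illustrative naming only; not taken from this issue.
    return [
        f"shuffle_{shuffle_id}_{m}_{r}"
        for m, r in product(range(num_maps), range(num_reduces))
    ]


def naive_file_count(num_maps: int, num_reduces: int) -> int:
    """Files created by a single shuffle when every map task writes R files."""
    return num_maps * num_reduces


print(naive_shuffle_file_names(0, 2, 2))
# ['shuffle_0_0_0', 'shuffle_0_0_1', 'shuffle_0_1_0', 'shuffle_0_1_1']
print(naive_file_count(1000, 1000))  # 1000000
```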

We should cut this number from O(M*R) down to O(R).
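One way to get from O(M*R) to O(R) per machine is to let map tasks that run one after another on the same machine append to a shared group of R files instead of each opening its own. The in-memory sketch below models that idea under stated assumptions: files are bytearrays, a machine runs at most `slots` map tasks at once, and each group records an (offset, length) per map task so reducers can still fetch individual segments. `FileGroup`, `ShuffleFileGroupPool` and `run_map_tasks` are hypothetical names; the sketch is not taken from the attached PDF or from the code that shipped in 0.8.1.

```python
from dataclasses import dataclass, field


@dataclass
class FileGroup:
    """A hypothetical group of R append-only shuffle files, one per reduce task."""
    group_id: int
    files: list  # files[r] models the on-disk file for reducer r as a bytearray
    # (map_id, reduce_id) -> (offset, length) of that map task's segment in files[reduce_id]
    segments: dict = field(default_factory=dict)


class ShuffleFileGroupPool:
    """Hypothetical per-machine pool; a group is used by one running map task at a time."""

    def __init__(self, num_reduces: int):
        self.num_reduces = num_reduces
        self.groups: list[FileGroup] = []
        self.free: list[FileGroup] = []

    def acquire(self) -> FileGroup:
        # Reuse an idle group if one exists; otherwise open R new files.
        if self.free:
            return self.free.pop()
        group = FileGroup(len(self.groups), [bytearray() for _ in range(self.num_reduces)])
        self.groups.append(group)
        return group

    def release(self, group: FileGroup) -> None:
        self.free.append(group)

    def write(self, group: FileGroup, map_id: int, buckets: list[bytes]) -> None:
        # Append this map task's output for each reducer and remember where it landed.
        assert len(buckets) == self.num_reduces
        for r, data in enumerate(buckets):
            offset = len(group.files[r])
            group.files[r].extend(data)
            group.segments[(map_id, r)] = (offset, len(data))

    def read(self, group_id: int, map_id: int, reduce_id: int) -> bytes:
        # A reducer fetches one map task's segment by offset and length.
        group = self.groups[group_id]
        offset, length = group.segments[(map_id, reduce_id)]
        return bytes(group.files[reduce_id][offset:offset + length])

    def file_count(self) -> int:
        return len(self.groups) * self.num_reduces


def run_map_tasks(pool: ShuffleFileGroupPool, num_maps: int, slots: int) -> dict:
    """Run map tasks in waves of `slots` concurrent tasks; return map_id -> group_id."""
    locations = {}
    for start in range(0, num_maps, slots):
        wave = range(start, min(start + slots, num_maps))
        held = [pool.acquire() for _ in wave]  # tasks in one wave run at the same time
        for group, m in zip(held, wave):
            buckets = [f"m{m}r{r}".encode() for r in range(pool.num_reduces)]
            pool.write(group, m, buckets)
            locations[m] = group.group_id
        for group in held:
            pool.release(group)
    return locations


pool = ShuffleFileGroupPool(num_reduces=4)
locations = run_map_tasks(pool, num_maps=100, slots=3)
print(pool.file_count())  # 12, i.e. slots * R, versus M * R = 400 without consolidation
print(pool.read(locations[7], 7, 2))  # b'm7r2'
```

With 100 map tasks, 3 slots and 4 reducers, the pool opens 12 files instead of 400; the count scales with slots * R and no longer with M. The trade-off is that a reducer must now locate its segment by offset inside a shared file, so the per-map index has to be kept alongside the files.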

Attachments


Consolidating Shuffle Files in Spark.pdf (378 kB, uploaded by Jason Dai, 03/Jul/13 01:46)


People

Assignee: Jason Dai

Reporter: Reynold Xin

Votes: 2

Watchers: 8

Dates

Created: 21/May/13 00:31

Updated: 14/Nov/13 18:29

Resolved: 14/Nov/13 18:29