[SPARK-1912] Compression memory issue during reduce - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.9.2, 1.0.1, 1.1.0
Component/s: Spark Core
Labels: None
Target Version/s: 0.9.2, 1.0.1, 1.1.0

Description

To read a compressed block, we first create a compression stream instance (LZF or Snappy) and use it to wrap the block.
Suppose a reducer task needs to read 1000 local shuffle blocks. It first prepares to read all 1000 blocks, which means creating 1000 compression stream instances to wrap them. Each compression stream allocates some memory when it is initialized, so holding many instances at the same time becomes a problem.
The reducer actually reads the shuffle blocks one by one, so why create all the compression instances up front? Could we do this lazily, creating the compression instance for a block only when that block is first read?
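
The lazy approach proposed above could look roughly like the following. This is a minimal sketch in Python, not Spark's actual code: `zlib` stands in for LZF/Snappy, and the class `LazyBlockStream`, its `created` counter, and `read_all` are hypothetical names introduced only for illustration.

```python
import zlib


class LazyBlockStream:
    """Wraps one compressed block; allocates a decompressor only on first read."""

    # Hypothetical counter of decompressors actually allocated (illustration only).
    created = 0

    def __init__(self, compressed_block: bytes):
        self._block = compressed_block
        self._decompressor = None  # deferred: nothing is allocated when wrapping

    def read_all(self) -> bytes:
        if self._decompressor is None:
            # zlib is a stand-in for the LZF/Snappy stream in the description.
            self._decompressor = zlib.decompressobj()
            LazyBlockStream.created += 1
        data = self._decompressor.decompress(self._block) + self._decompressor.flush()
        self._decompressor = None  # drop the decompressor once the block is consumed
        return data


# Wrap 1000 blocks up front, as the reducer does, then read only the first one.
blocks = [zlib.compress(f"block-{i}".encode()) for i in range(1000)]
streams = [LazyBlockStream(b) for b in blocks]
created_after_wrapping = LazyBlockStream.created
first = streams[0].read_all()
```

With this shape, wrapping all the blocks costs only the wrapper objects themselves, and at most one decompressor is alive at a time as long as blocks are read one by one.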

Issue Links

links to

[Github] Pull Request #2179 (rxin)

People

Assignee: Wenchen Fan

Reporter: Wenchen Fan

Votes: 0

Watchers: 3

Dates

Created: 23/May/14 03:54

Updated: 28/Aug/14 08:07

Resolved: 03/Jun/14 20:19