Hadoop Map/Reduce
MAPREDUCE-2308

Sort buffer size (io.sort.mb) is limited to < 2 GB


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.20.1, 0.20.2, 0.21.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: Cloudera CDH3b3 (0.20.2+)

    Description

      I have MapReduce jobs that use a large amount of per-task memory, because the algorithm I'm using converges faster when more data is held together on a node. I have my JVM heap size set at 3200 MB, and if I use the popular rule of thumb that io.sort.mb should be ~70% of that, I get 2240 MB. I rounded this down to 2048 MB, but map tasks crash with:

      java.io.IOException: Invalid "io.sort.mb": 2048
              at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:790)
              ...
      

      MapTask.MapOutputBuffer implements its buffer as a byte[] whose size is io.sort.mb converted to bytes, and it sanity-checks that size before allocating the array. The problem is that Java arrays can't have more than 2^31 - 1 elements (even with a 64-bit JVM); this is a limitation of the Java language specification itself. As memory and data sizes grow, this would seem to be a crippling limitation of Java.
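
      A minimal sketch of the arithmetic (not the actual MapOutputBuffer code): io.sort.mb is converted to bytes before the byte[] is allocated, and 2048 * 2^20 is already larger than Integer.MAX_VALUE, so the largest value that can possibly work is 2047:

      // Sketch only: shows why io.sort.mb = 2048 fails the int-sized array limit.
      public class SortBufferLimit {
          public static void main(String[] args) {
              int sortMb = 2048;                    // io.sort.mb, in megabytes
              long sortBytes = (long) sortMb << 20; // 2048 * 2^20 = 2,147,483,648 bytes
              System.out.println("requested bytes:   " + sortBytes);
              System.out.println("Integer.MAX_VALUE: " + Integer.MAX_VALUE);
              // 2^31 > 2^31 - 1, so a byte[] of this size cannot be allocated;
              // 2047 * 2^20 = 2,146,435,072 still fits.
              System.out.println("fits in a byte[]?  " + (sortBytes <= Integer.MAX_VALUE));
          }
      }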

      It would be nice if this ceiling were documented and an error issued sooner, e.g. at JobTracker startup when reading the config. Going forward, we may need to implement some array-of-arrays hack for large buffers.
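
      As a rough illustration of the array-of-arrays idea (a hypothetical sketch, not an existing Hadoop class; the name and chunk size are made up), the buffer could be split into fixed-size chunks and addressed with a long offset:

      // Hypothetical chunked buffer addressed by a long offset; illustrative only.
      public class ChunkedByteBuffer {
          private static final int CHUNK_SHIFT = 30;             // 1 GB chunks
          private static final int CHUNK_SIZE = 1 << CHUNK_SHIFT;
          private static final long CHUNK_MASK = CHUNK_SIZE - 1;

          private final byte[][] chunks;

          public ChunkedByteBuffer(long capacityBytes) {
              int numChunks = (int) ((capacityBytes + CHUNK_SIZE - 1) >>> CHUNK_SHIFT);
              chunks = new byte[numChunks][];
              long remaining = capacityBytes;
              for (int i = 0; i < numChunks; i++) {
                  chunks[i] = new byte[(int) Math.min(remaining, CHUNK_SIZE)];
                  remaining -= chunks[i].length;
              }
          }

          public byte get(long pos) {
              return chunks[(int) (pos >>> CHUNK_SHIFT)][(int) (pos & CHUNK_MASK)];
          }

          public void put(long pos, byte b) {
              chunks[(int) (pos >>> CHUNK_SHIFT)][(int) (pos & CHUNK_MASK)] = b;
          }
      }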

    Attachments

    Issue Links

    Activity

    People

      Assignee: Unassigned
      Reporter: Jay Hacker (jayqhacker)
      Votes: 1
      Watchers: 23

    Dates

      Created:
      Updated: