Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Duplicate
Description
MapTask#MapOutputBuffer uses a plain-jane DataOutputBuffer, which defaults to a 32-byte buffer, and DataOutputBuffer#write doubles the underlying byte array whenever it needs more space.
However, for maps that output any decent amount of data (e.g. 128MB in examples/Sort.java), this means the buffer grows painfully slowly from 2^5 (32 bytes) up to 2^27 (128MB), and each doubling creates a new array followed by an array copy:
public void write(DataInput in, int len) throws IOException {
  int newcount = count + len;
  if (newcount > buf.length) {
    // grow to double the current size (or to newcount, if that is larger)
    byte newbuf[] = new byte[Math.max(buf.length << 1, newcount)];
    // copy the old contents into the freshly allocated array
    System.arraycopy(buf, 0, newbuf, 0, count);
    buf = newbuf;
  }
  in.readFully(buf, count, len);
  count = newcount;
}
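For a sense of scale, here is a minimal, purely illustrative sketch (GrowthCount is a hypothetical name, not part of Hadoop) that counts how many reallocations pure doubling from the 32-byte default needs before the buffer can hold a 128MB map output:

public class GrowthCount {
  public static void main(String[] args) {
    long target = 128L << 20; // 128 MB, as in examples/Sort.java
    long size = 32;           // DataOutputBuffer's default capacity
    int copies = 0;
    while (size < target) {
      size <<= 1;             // mirrors buf.length << 1 in write()
      copies++;
    }
    // prints: 22 reallocations to grow from 32 bytes to 134217728 bytes
    System.out.println(copies + " reallocations to grow from 32 bytes to " + size + " bytes");
  }
}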
I reckon we could do much better in the MapTask, specifically...
For example, we could start with a buffer of 1 or 4 KB and quadruple, rather than double, up to, say, 4/8/16MB, and then resume doubling (or grow even more slowly).
This way the buffer ramps up quickly at first, minimizing the number of System.arraycopy calls and the number of small, short-lived buffers left for the GC, and later switches to doubling so that we don't ramp up too quickly and waste memory to fragmentation.
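As a concrete but purely illustrative sketch of such a policy (the class name, constants, and thresholds below are assumptions, and the thresholds are exactly what the benchmarking would need to pin down):

public class GrowthPolicy {
  // Hypothetical values; not taken from Hadoop.
  static final int INITIAL_CAPACITY = 4 * 1024;         // start at a few KB instead of 32 bytes
  static final int QUADRUPLE_LIMIT  = 8 * 1024 * 1024;  // quadruple until ~8 MB

  /** Returns the next capacity large enough to hold at least minRequired bytes. */
  static int nextCapacity(int current, int minRequired) {
    int grown = (current < QUADRUPLE_LIMIT)
        ? current << 2   // quadruple while small: fewer arraycopy calls, fewer small garbage buffers
        : current << 1;  // double once large: limits memory wasted to over-allocation
    return Math.max(grown, minRequired);
  }
}

The write path sketched above would then call something like nextCapacity(buf.length, newcount) in place of Math.max(buf.length << 1, newcount).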
Of course, this issue is about benchmarking to figure out whether all this is worth it and, if so, what the right set of trade-offs is.
Thoughts?
Issue Links
- is blocked by: HADOOP-2053 OutOfMemoryError: Java heap space errors in hadoop 0.14 (Resolved)
- is part of: HADOOP-2919 Create fewer copies of buffer data during sort/spill (Closed)