[MAPREDUCE-4755] Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 3.0.0-alpha1
Fix Version/s: None
Component/s: None
Labels:
- optimization
- sort
Environment:

Ubuntu 12.10 x86_64 (Bulldozer 8-core)

Description

The MapOutputBuffer was written under a very severe constraint on the amount of memory it can consume. As a result, the code has to page data in and out (i.e., spill) as it passes through the map buffers.

With the java.nio package, there is a fast and portable memory-mapped (mmap) alternative to managing your own buffers. The mapped memory lives outside the Java GC-managed heap and yet provides reasonably fast access to all the data.

The suggestion is that mmap()-backed direct buffers can be faster when a spill is involved and, given enough address space, simpler than the current spill logic, because they rely on the OS buffer cache to deliver best-effort I/O.
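As a rough illustration of the idea (not the actual MapOutputBuffer code or any patch attached to this issue), the sketch below collects length-prefixed records into a memory-mapped scratch file through java.nio. The class name `MappedCollectSketch`, its helper methods, the record layout, and the 1 MiB capacity are all hypothetical choices made for the example; only `FileChannel.map`, `MappedByteBuffer`, and the related java.nio calls are real APIs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: collect records into an mmap-backed buffer instead of a heap array.
final class MappedCollectSketch {

    // Maps `capacity` bytes of `file` read/write. The mapping stays valid after
    // the channel is closed; the OS pages data in and out as needed.
    static MappedByteBuffer mapScratch(Path file, int capacity) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
        }
    }

    // Appends each record as [int length][UTF-8 bytes] and returns the bytes used so far.
    // Assumed layout for illustration only; there is no explicit spill step here.
    static int collect(MappedByteBuffer buf, String[] records) {
        for (String r : records) {
            byte[] bytes = r.getBytes(StandardCharsets.UTF_8);
            buf.putInt(bytes.length);
            buf.put(bytes);
        }
        return buf.position();
    }

    // Reads back the record that starts at `offset` without moving the writer's position.
    static String readRecord(MappedByteBuffer buf, int offset) {
        int len = buf.getInt(offset);
        byte[] bytes = new byte[len];
        ByteBuffer view = buf.duplicate(); // independent position, shared content
        view.position(offset + Integer.BYTES);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("map-output", ".buf");
        file.toFile().deleteOnExit();

        MappedByteBuffer buf = mapScratch(file, 1 << 20); // 1 MiB, arbitrary
        int used = collect(buf, new String[] {"apple\t1", "pear\t2"});
        System.out.println(used + " bytes collected; first record: " + readRecord(buf, 0));

        buf.force(); // optional: ask the OS to flush dirty pages to the file
    }
}
```

In this sketch the writer never decides when to spill: the kernel writes dirty pages back to the file under memory pressure, which is the "best-effort I/O via the buffer cache" behavior the description refers to. The trade-off is that the mapping consumes address space, so the approach assumes a 64-bit JVM or suitably small buffers.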

Issue Links

relates to:
- HAMA-559 Add a spilling message queue (Resolved)
- MAPREDUCE-3235 Improve CPU cache behavior in map side sort (Open)

People

Assignee: Gopal Vijayaraghavan
Reporter: Gopal Vijayaraghavan
Votes: 0
Watchers: 17

Dates

Created: 27/Oct/12 08:37
Updated: 12/May/16 18:22
Resolved: 06/Oct/14 19:40