Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: performance, task
    • Labels: None

      Description

      When running oprofile on a terasort workload, I noticed that a large amount of CPU usage was going to MapTask$MapOutputBuffer.compare. Upon disassembling this and looking at cycle counters, most of the cycles were going to memory loads dereferencing into the array of key-value data – implying expensive cache misses. This can be avoided as follows:

      • rather than simply swapping indexes into the kv array, swap the entire meta entries in the meta array. Swapping 16 bytes is only negligibly slower than swapping 4 bytes. This requires adding the value-length into the meta array, since we used to rely on the previous-in-the-array meta entry to determine this. So we replace INDEX with VALUELEN and avoid one layer of indirection.
      • introduce an interface which allows key types to provide a 4-byte comparison proxy. For string keys, this can simply be the first 4 bytes of the string. The idea is that, if stringCompare(key1.proxy(), key2.proxy()) != 0, then compare(key1, key2) should have the same result. If the proxies are equal, the normal comparison method is used. We then include the 4-byte proxy as part of the metadata entry, so that for many cases the indirection into the data buffer can be avoided (a short sketch follows this list).
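
      The following is a minimal, hedged sketch of that proxy comparison (class and method names are illustrative, not the committed API): the 4-byte proxies stored with the metadata are compared first, and the full raw comparison against the data buffer runs only when the proxies are equal.

          import org.apache.hadoop.io.RawComparator;

          // Sketch only: proxies are assumed to be non-negative, order-preserving ints
          // built from the leading key bytes, so that whenever they differ their int
          // ordering agrees in sign with the full key comparison.
          final class ProxyCompareSketch {
            static int compareWithProxy(int proxy1, int proxy2,
                                        RawComparator<?> comparator, byte[] kvbuffer,
                                        int kstart1, int klen1, int kstart2, int klen2) {
              if (proxy1 != proxy2) {
                // decided without dereferencing into the key/value data buffer
                return proxy1 < proxy2 ? -1 : 1;
              }
              // equal proxies: fall back to the normal raw key comparison
              return comparator.compare(kvbuffer, kstart1, klen1, kvbuffer, kstart2, klen2);
            }
          }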

      On a terasort benchmark, these optimizations plus an optimization to WritableComparator.compareBytes dropped the aggregate mapside CPU millis by 40%, and the compare() routine mostly dropped off the oprofile results.

      1. hashed-sort-MAPREDUCE-3235.patch
        10 kB
        Gopal V
      2. map_sort_perf.diff
        8 kB
        Hal Mo
      3. mr-3235-poc.txt
        12 kB
        Todd Lipcon

        Issue Links

          Activity

          Sandy Ryza added a comment -

          Creating subtasks for the two items so that we can work on them independently

          Todd Lipcon made changes -
          Link This issue is related to MAPREDUCE-4755 [ MAPREDUCE-4755 ]
          Gopal V made changes -
          Attachment hashed-sort-MAPREDUCE-3235.patch [ 12552509 ]
          Gopal V made changes -
          Attachment hashed-sort-MAPREDUCE-3235.patch [ 12552654 ]
          Gopal V added a comment -

          Update BinaryComparable.getPrefix() to always generate positive integers.
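
          For reference, one way such a prefix could be shaped (a sketch only, assuming the goal is a non-negative, order-preserving int that can be compared with plain int arithmetic; not necessarily what the attached patch does):

              static int getPrefix(byte[] bytes, int off, int len) {
                int prefix = 0;
                for (int i = 0; i < 4; i++) {
                  prefix <<= 8;
                  if (i < len) {
                    prefix |= (bytes[off + i] & 0xff);  // pack big-endian so byte order is preserved
                  }
                }
                // Shifting out the lowest bit keeps the ordering monotonic while
                // guaranteeing a non-negative result, at the cost of one bit of resolution.
                return prefix >>> 1;
              }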

          Gopal V added a comment -

          Yes, approximately 90M.

          The patch I've put up does not break backward compatibility yet, because none of the regular WritableComparators have been modified.

          This patch could be useful for a large common subset of cases (i.e. where the ordering is the natural byte order) but might be detrimental to a few corner cases. It is not a breaking change as long as I put it in a custom Comparator instead of building it in - the breakage only happens if Text.Comparator implemented the feature.

          BinaryComparable does export getPrefix(), but MapTask does not check for it; it only serves as a shortcut to the data in the custom Comparator class.

          I'm running the following as my primary test

          https://gist.github.com/636b0fb5f770b24b4512

          The 90M version failed to register any CPU numbers (running LocalJobRunner) - the prefix-sorted version did show less GC time spent overall.

          I ran with 5x the amount of data to see whether it showed any differences in the numbers:

                    trunk       +hashcomparator  %change
          time      204992      193958           -5.3%
          compares  1382152895  410352373        -70%

          But just comparing the output from the time command

          trunk = 207.70user 9.29system 3:26.76elapsed 104%CPU
          patched = 193.89user 9.37system 3:15.68elapsed 103%CPU

          The performance gain has more to do with avoiding the interface dispatch involved in calling the comparator. As best I can tell, interface dispatch in the JVM is a bit more expensive than a direct class call.

          Copying the kvmeta entries 16 bytes at a time is actually slower during the sort, but my data set is perhaps skewed towards long sorted subsequences (i.e. 500 of them already ordered), causing more swaps per compare than a more real-life workload would.
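
          For context, the 16-byte copy above refers to swapping whole metadata entries, as described in the first bullet of the issue description. A minimal sketch, assuming a flat int[] view of kvmeta with four int slots per record (the real MapOutputBuffer uses an IntBuffer overlay, so this is illustrative only):

              private static final int NMETA = 4;  // ints per metadata record (16 bytes)

              // Swap the two whole metadata records starting at int offsets i and j.
              static void swapMeta(int[] kvmeta, int i, int j) {
                for (int k = 0; k < NMETA; k++) {
                  int tmp = kvmeta[i + k];
                  kvmeta[i + k] = kvmeta[j + k];
                  kvmeta[j + k] = tmp;
                }
              }

          Moving more data per swap is the price paid for removing a level of indirection during compares.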

          Todd Lipcon added a comment -

          Hi Gopal. I agree the patch isn't suitable for commit right now as it could break stuff – I was just using it to try to understand the cache effects.

          How big was the dataset that you're sorting? Somewhere around 90MB it sounds like? Were you able to compare cpu seconds elapsed? Even if the wall clock doesn't improve much, saving cpu-seconds means that you can run more tasks in parallel, etc.

          Gopal V added a comment -

          For a sort set of 100 x 99171 unique strings (/usr/share/dict/words on Ubuntu):

                    trunk       +hashcomparator
          compares  275143706   74887595

          Comparisons per item went from 27.74 to 7.55, but this did not translate into any massive savings in execution time, as most of it went into the spill and the subsequent merge.

          Gopal V made changes -
          Attachment hashed-sort-MAPREDUCE-3235.patch [ 12552509 ]
          Gopal V added a comment -

          Update patch to trunk.

          Move the byte[] prefix into an int and place it in the HashComparator interface instead of in the Comparables.

          Gopal V added a comment -

          My descending sorter fails to sort words in reverse with this patch on.

          A custom comparator breaks the prefix sort order - the prefix generation has to be moved to the comparator instead of being inherent to the comparable.

          Even then, it is a breaking change for everyone who has done "extends Text.Comparator" and calls -1*(super.compare).

          But as long as we're breaking stuff, I will work on making this patch work with my reverse sorter.

          Hal Mo added a comment -

          I guess you're working with Krystal Mok? Cool stuff, I hope to see it make it into OpenJDK as well!

          Yes.
          Krystal has just left, and will join the Oracle JVM core team soon.
          He has done some work on binary array compare, but it is not finished; it seems there is more code to "copy" than expected.

          You mean if there are more rack-local (as opposed to data-local) maps, right? If everything is data-local (e.g. terasort on an empty cluster) then I would expect the CPU difference to be more noticeable.

          Yes, I think so.
          We are trying to test it on hive.

          Todd Lipcon added a comment -

          BTW, I know you are interested in a JVM intrinsic for binary array compare

          I guess you're working with Krystal Mok? Cool stuff, I hope to see it make it into OpenJDK as well!

          Almost the same; it depends on whether there are rack-local maps. The more rack-local maps, the slower.

          You mean if there are more rack-local (as opposed to data-local) maps, right? If everything is data-local (e.g. terasort on an empty cluster) then I would expect the CPU difference to be more noticeable.

          Hal Mo added a comment -

          @Todd

          How did the wall clock differ?

          Almost the same; it depends on whether there are rack-local maps. The more rack-local maps, the slower.

          BTW, I know you are interested in a JVM intrinsic for binary array compare. Our team might implement it if we see enough demand. We work on the JVM to support Hadoop. For example, our GCIH shares side data between maps (sharing Java objects directly between JVMs), saving 2000G of physical memory across 200 nodes and at the same time improving GC performance and job performance for one of our critical daily jobs.

          Todd Lipcon added a comment -

          Nice test. So, the 13% improvement is total CPU for the job – whereas I measured CPU improvement on the map phase only and saw ~40%. How did the wall clock differ?

          Hal Mo added a comment -

          > Todd Lipcon added a comment - 15/Jun/12 21:15
          >> Unit test passed, and got 10% improvement of cpu usage per node.
          > What workload? Terasort or something else?

          On a 9-slave cluster, testing Terasort with only one spill (teragen 126*512M).

          parameters:
          -Dmapred.child.java.opts=-Xmx1g
          -Dio.sort.mb=647
          -Ddfs.block.size=536870912
          -Dio.sort.record.percent=0.167
          -Dmapred.map.tasks=126
          -Dmapred.reduce.tasks=126
          -Dio.sort.factor=100
          -Dmapred.compress.map.output=true
          -Dmapred.map.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec
          -Dio.sort.spill.percent=0.95
          -Dmapred.map.tasks.speculative.execution=false
          -Dmapred.reduce.tasks.speculative.execution=false

          result:
          (Here is how I got the CPU usage: run

          cat /proc/stat | grep "cpu " | awk '{print $2+$3+$4+$7+$8+$9}'

          before and after the job on each node, ignoring $5 (idle) and $6 (iowait),
          and subtract the two numbers.)
          cpu use(original):
          run node1 node2 node3 node4 node5 node6 node7 node8 node9 cluster
          #1 95934 99531 96278 97429 87239 98085 96376 97491 96159 864522
          #2 93360 95457 94389 94158 83400 95927 98785 96480 97012 848968
          #3 97022 94855 101510 94534 83064 95922 96947 97071 96707 857632
          #4 94124 97135 95332 95126 86036 94352 97149 101701 95313 856268
          #5 92429 94422 94107 90331 83487 91857 94130 94866 92747 828376
          avg 851153

          cpu use(with patch):
          run node1 node2 node3 node4 node5 node6 node7 node8 node9 cluster
          #1 84241 86630 84010 84752 76644 85126 84457 85035 84504 755399
          #2 86727 82622 84231 84313 72410 87602 83546 87104 85624 754179
          #3 84626 84467 81459 85049 71584 83746 83843 88444 80776 743994
          #4 82072 83229 81670 84070 71931 86059 88080 84624 87980 749715
          #5 80029 84439 87655 87411 73583 85899 84129 85077 86522 754744
          avg 751606

          improvement: 851153 / 751606 * 100% - 100% = 13.2%

          Todd Lipcon added a comment -

          Unit test passed, and got 10% improvement of cpu usage per node.

          What workload? Terasort or something else?

          Hal Mo made changes -
          Attachment map_sort_perf.diff [ 12532169 ]
          Hal Mo added a comment -

          I tried to apply Todd's patch and got a unit test error on a user-defined key comparator.
          I modified it like this in our Hadoop.
          The unit tests passed, and I got a 10% improvement in CPU usage per node.

          Hal Mo added a comment -

          > Todd Lipcon added a comment - 21/Oct/11 15:24
          > I didn't have a chance to run several benchmarks with/without. But I did the compareBytes optimization first, and didn't see a huge speedup, and then did the sort optimization and saw the 40%. So, the compareBytes is probably only a few percent.

          I tried a dummy sort (terasort with sorted input) and found only a 10% CPU time improvement per machine.

          Chris Douglas added a comment -

          One optimization there would be to pack the PARTITION field more tightly - in most MR jobs, we have <256 reducers, so partition could be a single byte. Since we know the number of reducers up front, we could easily trade-off space between the partition ID and the comparison proxy.

          Right now, the overlay is all ints. The JVM could/should be smart enough to handle that. If we start mixing types in metadata records, we'll end up writing placement new and possibly preventing some optimizations. If (instead) you're talking about shifting the partition and writing the prefix into the remaining bytes, or even more generally, defining (partition,key) as the input to an order-preserving hash function... yikes.

          Though viewed askance, that's a decent API, since the default just returns the partition and follows the existing path.

          Currently I added a marker interface with a single method: getPrefix(byte[] dst, int off, int length). If the key type implements this interface, getPrefix() is called by collect() to copy the comparison proxy into the kvmeta buffer. I was thinking last night that it would be better to delegate to the Serializer implementation there, though.

          Agreed; attaching it to the keytype prevents a user from doing obvious things like reversing the order without subclassing the type. An optional component (a role comparable to the combiner, I guess) that may be used by the framework to generate a hash that effects the same ordering could be useful. Would it make sense for this to come from the comparator, rather than the keytype? Mapping the keytype to a fixed-length record, even forcing an integer, will probably cover most cases.

          I imagine most of the "stock writables" we have in Hadoop could easily implement this

          If this came from the RawComparator attached to most Writables, one would have to work out how to manage subclasses, or incompatibly make all the Comparators for those types final (which is probably correct, anyway).

          Todd Lipcon added a comment -

          I didn't have a chance to run several benchmarks with/without. But I did the compareBytes optimization first, and didn't see a huge speedup, and then did the sort optimization and saw the 40%. So, the compareBytes is probably only a few percent.

          Binglin Chang added a comment -

          Very cool, again.

          On a terasort benchmark, these optimizations plus an optimization to WritableComparator.compareBytes dropped the aggregate mapside CPU millis by 40%

          Would you please give the test results of running the WritableComparator.compareBytes optimization or the key-index optimization alone? This would be useful. As I experienced in MAPREDUCE-2841, cache misses matter most.

          Todd Lipcon made changes -
          Component/s performance [ 12316500 ]
          Todd Lipcon made changes -
          Link This issue is related to MAPREDUCE-1639 [ MAPREDUCE-1639 ]
          Todd Lipcon added a comment -

          Todd, why do the proxies have to be strings

          I meant "string" in the C sense of the word – a byte[] like you said.

          Milind Bhandarkar added a comment -

          Todd, why do the proxies have to be strings? (I think you meant a 4-byte byte[], right? In that case, I do not have to construct a new string.) I think the accounting overhead is typically not a major factor in real applications (which have large values). I think this should not be a config knob at all; adding another interface is a good idea.

          I take my "-1" back. I am +1 on the proxy idea now.

          Todd Lipcon added a comment -

          oops, accidentally uploaded the wrong rev. you'll need this typo fix to compile:

          -  int kVS = kvmeta.get(kvi + VALSTART);
          +  int iVS = kvmeta.get(kvi + VALSTART);
          Todd Lipcon made changes -
          Field Original Value New Value
          Attachment mr-3235-poc.txt [ 12499917 ]
          Todd Lipcon added a comment -

          Here's my proof-of-concept patch. I don't have time to really focus on this (getting 23 out the door is higher priority) but if some other people have time to push it through or run some benchmarks on non-terasort workloads, that would be terrific!

          Todd Lipcon added a comment -

          -1 for the second. The key distribution of terasort, which results in compare != 0 most of the time, is an anomaly. In my experience, where skews etc are almost always a fact of life, more compares return 0 than non-zero.

          Definitely worth considering. Like Chris said, this comparison is practically free - we don't have to delegate to any "proxy objects" as you put it. The proxies have to be 4-byte strings. Since we already do a compare on the PARTITION part of the metadata, making that comparison an 8-byte compare instead of a 4-byte compare doesn't really cost anything. So, the only real cost is the extra accounting overhead in the buffer - 20 bytes per record instead of 16. One optimization there would be to pack the PARTITION field more tightly - in most MR jobs, we have <256 reducers, so partition could be a single byte. Since we know the number of reducers up front, we could easily trade-off space between the partition ID and the comparison proxy.

          What's the API look like?

          Currently I added a marker interface with a single method: getPrefix(byte[] dst, int off, int length). If the key type implements this interface, getPrefix() is called by collect() to copy the comparison proxy into the kvmeta buffer. I was thinking last night that it would be better to delegate to the Serializer implementation there, though. I just did the above for expediency last night while hacking this together. Alternatively, it might make sense to add another interface like how RawComparator is done.

          Both of these would be pretty esoteric config knobs

          I imagine most of the "stock writables" we have in Hadoop could easily implement this - eg Text, BytesWritable, LongWritable, etc. Frameworks like Pig/Hive could get it in as well. Application programmers implementing their own key types already tend to implement RawComparators, so allowing them to implement another simple method for a good CPU boost doesn't seem too bad.
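
          To make the shape of that marker interface concrete, a rough sketch (names are illustrative, not the exact API in the proof-of-concept patch):

              // Key types that can supply a fixed-size comparison proxy implement this;
              // collect() would call it to copy the proxy bytes into the kvmeta entry.
              public interface ComparisonProxyProvider {
                /** Write up to length proxy bytes for this key into dst, starting at off. */
                void getPrefix(byte[] dst, int off, int length);
              }

          A Text key, for example, could simply copy its first bytes (zero-padded if shorter) into dst, matching the "first 4 bytes of the string" idea from the description.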

          Chris Douglas added a comment -

          Very cool.

          The key distribution of terasort, which results in compare != 0 most of the time, is an anomaly. In my experience, where skews etc are almost always a fact of life, more compares return 0 than non-zero.

          If the keys are small and fixed-length, co-locating them with the metadata would be pretty sweet (and without requiring KEYLEN, one could just use the slot). Not sure about the proxy... though it will often come in on the same cache line, so its cost is only an optional, extra 4 byte overhead per record. What's the API look like? Both of these would be pretty esoteric config knobs, unless it came from the serializers.

          Milind Bhandarkar added a comment -

          +1 for the first improvement.

          -1 for the second. The key distribution of terasort, which results in compare != 0 most of the time, is an anomaly. In my experience, where skews etc are almost always a fact of life, more compares return 0 than non-zero. These will have the added overhead of providing new proxy objects, comparing the proxy objects, and then proceeding to compare the actual keys.

          Todd Lipcon created issue -

            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
             • Votes:
               0
             • Watchers:
               38

              Dates

              • Created:
                Updated:
