[IGNITE-4270] Hadoop: optionally stripe mapper output for every partition. - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.8
Fix Version/s: 2.0
Component/s: hadoop
Labels:
- performance

Description

Currently we have R maps for M mappers, where R is the number of reducers. Because of this, many mappers write to the same concurrent off-heap data structure and lose time on concurrency overhead.
Let's add an option to create R * M maps, so that every mapper has a dedicated map for every reducer. This will eliminate almost all concurrency overhead.

Design:
1) Every mapper works with its own set of "remote" output maps;
2) These maps are essentially not "maps" but IO messages, which we fill up to a certain threshold;
3) Once filled, a message is sent to the remote node;
4) The async shuffle thread is no longer needed in this architecture.
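The design above can be illustrated with a minimal Java sketch. It is not Ignite's implementation: every name here (`StripedMapperOutput`, `Message`, the `sender` callback, the threshold) is hypothetical. Each mapper owns one buffer per reducer, so writes need no locks; a buffer acts as an outgoing message that is handed to `sender` (a stand-in for the network send) from the mapper thread itself once it reaches the threshold, and records are not sorted before sending. Message size is approximated by character count, and the "combiner" and "external" execution cases are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/**
 * Hypothetical sketch of "striped" mapper output: one buffer per reducer,
 * owned by a single mapper thread, so writes need no synchronization.
 */
public class StripedMapperOutput {
    /** Hypothetical stand-in for an IO message destined for one reducer. */
    static final class Message {
        final int reducer;
        final List<String> records = new ArrayList<>();
        int size; // approximate payload size, measured in characters

        Message(int reducer) { this.reducer = reducer; }
    }

    private final Message[] buffers;                   // one message per reducer (R entries)
    private final int threshold;                       // flush threshold (assumed value)
    private final BiConsumer<Integer, Message> sender; // stands in for the network send

    StripedMapperOutput(int reducers, int threshold, BiConsumer<Integer, Message> sender) {
        this.buffers = new Message[reducers];
        for (int r = 0; r < reducers; r++) buffers[r] = new Message(r);
        this.threshold = threshold;
        this.sender = sender;
    }

    /** Reducer index for a key (simple hash partitioning, assumed). */
    static int partition(String key, int reducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % reducers;
    }

    /** Called only by the owning mapper thread: no locks, no CAS. */
    void write(String key, String value) {
        int r = partition(key, buffers.length);
        Message msg = buffers[r];
        String rec = key + "\t" + value;
        msg.records.add(rec); // appended as-is, no intermediate sorting
        msg.size += rec.length();
        if (msg.size >= threshold) flush(r);
    }

    /** Sends the current message for reducer r directly from the mapper thread. */
    private void flush(int r) {
        Message msg = buffers[r];
        if (msg.records.isEmpty()) return;
        sender.accept(r, msg);
        buffers[r] = new Message(r); // start a fresh message
    }

    /** Sends any partially filled messages when the mapper finishes. */
    void close() {
        for (int r = 0; r < buffers.length; r++) flush(r);
    }

    public static void main(String[] args) {
        List<Message> sent = new ArrayList<>();
        StripedMapperOutput out = new StripedMapperOutput(2, 16, (r, m) -> sent.add(m));
        for (int i = 0; i < 10; i++) out.write("k" + i, "v");
        out.close();
        int total = 0;
        for (Message m : sent) total += m.records.size();
        System.out.println("messages=" + sent.size() + " records=" + total);
        // prints "messages=4 records=10"
    }
}
```

With two reducers and a threshold of 16 characters, each reducer receives one full message of four records during the run and one partial message at `close()`. Because a buffer is never shared between mappers, the only coordination left is on the receiving side.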

As a result, we decrease contention, remove the slowdown caused by a single shuffle thread that is not able to send messages fast enough, and remove unnecessary intermediate sorting.

NB! Be careful with the "combiner" case and with "external" execution.

Issue Links

is duplicated by

IGNITE-4283 Hadoop: implement "striped" shuffle mode.

Closed

links to

GitHub Pull Request #1334

People

Assignee:: Vladimir Ozerov

Reporter:: Vladimir Ozerov

Votes:: 0

Watchers:: 2

Dates

Created:: 23/Nov/16 08:01

Updated:: 09/Dec/16 09:02

Resolved:: 09/Dec/16 09:02