Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
1.8
Description
Currently we have R maps for M mappers, where R is the number of reducers. As a result, many mappers write to the same concurrent off-heap data structure and lose time to contention.
Let's add an option to create R * M maps, so that every mapper has a dedicated map for every reducer. This will eliminate almost all concurrency overhead.
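A minimal sketch of the R * M layout, for illustration only. The class and method names are hypothetical and are not Ignite's actual shuffle classes; plain hash partitioning and in-heap lists are assumed in place of the real off-heap structures. The point is that each (mapper, reducer) pair gets its own output slot, so a mapper never writes to a structure another mapper touches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of dedicated per-mapper outputs (not Ignite code). */
final class DedicatedOutputs {
    private final int reducers;

    // One slot per (mapper, reducer) pair: M * R slots in total.
    private final List<List<String>> slots = new ArrayList<>();

    DedicatedOutputs(int mappers, int reducers) {
        this.reducers = reducers;
        for (int i = 0; i < mappers * reducers; i++)
            slots.add(new ArrayList<>());
    }

    /** Flat index of the slot owned by the given mapper for the given reducer. */
    int slotIndex(int mapper, int reducer) {
        return mapper * reducers + reducer;
    }

    /** Picks the target reducer for a key (simple hash partitioning, assumed). */
    int reducerFor(String key) {
        return Math.floorMod(key.hashCode(), reducers);
    }

    /** Called only by the thread running the given mapper, so no locking is needed. */
    void write(int mapper, String key) {
        slots.get(slotIndex(mapper, reducerFor(key))).add(key);
    }

    List<String> slot(int mapper, int reducer) {
        return slots.get(slotIndex(mapper, reducer));
    }

    int slotCount() {
        return slots.size();
    }
}
```

With the shared layout there would be only R slots and every mapper would contend on them; here two mappers emitting the same key land in different slots.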
Design:
1) Every mapper works with its own set of "remote" output maps;
2) These maps are essentially not "maps" but IO messages, which we fill up to a certain threshold;
3) Once filled, a message is sent to the remote node;
4) The async shuffle thread is no longer needed in this architecture.
As a result, we reduce contention, remove the slowdown from a single shuffle thread that is not able to send messages fast enough, and remove unnecessary intermediate sorting.
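The design steps above can be sketched roughly as follows. This is an assumption-laden illustration, not the actual implementation: `OutgoingMessage`, the byte-count threshold, and the `Consumer<byte[]>` sender are all hypothetical stand-ins for whatever message type and transport the real code uses. Each mapper would hold one such buffer per reducer and send it from its own thread once it is full, which is why no separate shuffle thread appears.

```java
import java.io.ByteArrayOutputStream;
import java.util.function.Consumer;

/** Hypothetical per-mapper, per-reducer output message (not Ignite code). */
final class OutgoingMessage {
    private final int threshold;            // size in bytes that triggers a send
    private final Consumer<byte[]> sender;  // stand-in for sending to the remote node
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

    OutgoingMessage(int threshold, Consumer<byte[]> sender) {
        this.threshold = threshold;
        this.sender = sender;
    }

    /** Appends a serialized record; sends the message once the threshold is reached. */
    void add(byte[] record) {
        buf.write(record, 0, record.length);

        if (buf.size() >= threshold)
            flush();
    }

    /** Sends whatever is buffered; must also be called when the mapper finishes. */
    void flush() {
        if (buf.size() == 0)
            return;

        sender.accept(buf.toByteArray());
        buf.reset();
    }
}
```

Because a buffer is owned by exactly one mapper, it needs no synchronization. Records are appended in arrival order, so this sketch does no intermediate sorting; a combiner, if present, would have to be handled separately.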
NB! Be careful with the "combiner" case and with "external" execution.
Issue Links
- is duplicated by: IGNITE-4283 Hadoop: implement "striped" shuffle mode. (Closed)