Hadoop Map/Reduce
MAPREDUCE-240

Improve the shuffle phase by using "Connection: keep-alive" and doing batch transfers of files

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      We should do transfers of map outputs at the granularity of total bytes transferred, rather than the current approach of transferring a single file and then closing the connection to the server. A single TaskTracker might have several map output files for a given reduce, and we should transfer multiple of them (up to a certain total size) in a single connection to the TaskTracker. Using HTTP/1.1's keep-alive connections would help, since a connection would stay open for more than one file transfer. We should limit the transfers to a certain size so that we don't hold up a Jetty thread indefinitely (and cause timeouts for other clients).
      Overall, this should give us improved performance.
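
      As a rough illustration of the batching idea (not the actual Hadoop code), the sketch below groups one host's pending map outputs into batches capped by a total byte budget; the MapOutput type, the planBatches helper, and the 64 MB cap are all assumptions made for the example.

      // Hypothetical sketch of the proposed batching policy. MapOutput,
      // planBatches and MAX_BYTES_PER_CONNECTION are illustrative names, not Hadoop APIs.
      import java.util.ArrayList;
      import java.util.List;

      class BatchPlanner {
        static final long MAX_BYTES_PER_CONNECTION = 64L * 1024 * 1024; // assumed cap

        record MapOutput(String mapId, long sizeBytes) {}

        /** Split one host's pending outputs into batches no larger than the cap. */
        static List<List<MapOutput>> planBatches(List<MapOutput> pending) {
          List<List<MapOutput>> batches = new ArrayList<>();
          List<MapOutput> current = new ArrayList<>();
          long currentBytes = 0;
          for (MapOutput out : pending) {
            if (!current.isEmpty() && currentBytes + out.sizeBytes() > MAX_BYTES_PER_CONNECTION) {
              batches.add(current);           // close this batch so the server thread is released
              current = new ArrayList<>();
              currentBytes = 0;
            }
            current.add(out);
            currentBytes += out.sizeBytes();
          }
          if (!current.isEmpty()) batches.add(current);
          return batches;
        }
      }

      Each batch would then be fetched over a single keep-alive connection before the connection is released.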

      Attachments

      1. hadoop-1338-v2.patch (65 kB) - Jothi Padmanabhan
      2. hadoop-1338-v1.patch (58 kB) - Jothi Padmanabhan

          Activity

          Doug Cutting added a comment -

          As a performance enhancement, any patch for this must demonstrate a significant performance improvement before it can be committed. I suspect that simply caching a fixed number of connections will not provide a significant performance enhancement. Rather, we might attempt, when contacting a node, to transfer all available output that resides on that node. So, instead of randomly shuffling the map output locations, a reduce task might first group locations by host, then randomly shuffle the hosts, fetching batches from each host. However, if the shuffle is already keeping up with the maps, even this may not improve things much, since each map node may tend to have only a single new output at a time.

          Devaraj Das added a comment -

          Agree with you on this, Doug. This issue is investigative in nature. This might make more sense for cases where we have two or more waves of reduces...

          Devaraj Das added a comment -

          "This might make more sense for cases where we have two or more waves of reduces" - should clarify this by saying that, although the first wave of shuffles is gated by the time the Map phase takes to complete, we should check whether the subsequent waves of shuffles would gain by this optimization. I think the first phase of shuffle might also gain to some extent.

          Doug Cutting added a comment -

          What's the motivation for more than one wave of reduces? Typically applications are better served by fewer output files, so that can't be it. Is it to decrease the granularity of each reduce, so that, in the case of reduce failures, job latency is reduced?

          Devaraj Das added a comment -

          Yes, reducing job latency in the presence of reduce failures is the reason for having multiple waves.

          Devaraj Das added a comment -

          Interesting points here: http://www.w3.org/Protocols/HTTP/Performance/

          Runping Qi added a comment -

          Batch transfer may not have been a big deal before 0.18, because there were a lot of inefficient places in the shuffling code anyway.
          However, it will be a significant improvement for 0.18.
          Based on my observations from running gridmix with 0.18, the fetching part becomes a bottleneck of the shuffle phase in 0.18.
          I think it is definitely worth revisiting this issue.
          Based on my understanding of the shuffling code, it is not hard to implement keep-alive connections in the fetcher.

          Runping Qi added a comment -

          Had an offline chat with Devaraj about this issue.
          Here is a proposal based on that chat.

          When a reducer needs to fetch map outputs from a TT, it sends the TT a request with a list of the segments it needs, not just one.
          Upon receiving the request, the TT builds a response consisting of a subset of the requested map output segments, up to a certain upper limit on the total size. The TT can do so with a single pass of sequential reads over the map output file, instead of multiple random reads.
          This will significantly lower the number of round trips between the reducer and the TT, and the number of random reads the TT needs to do, for jobs with a large number of mappers and a lot of small map output segments.
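
          A hedged sketch of the TaskTracker-side selection this proposal implies: given the list of segments a reducer asked for, serve a prefix of them whose total size stays under a cap, so the response can be built with one sequential pass. The class, method and record names are illustrative, not the real servlet code.

          import java.util.ArrayList;
          import java.util.List;

          class BatchedMapOutputSelector {
            record Segment(String mapId, long lengthBytes) {}

            /** Pick requested segments, in order, until the size budget is exhausted. */
            static List<Segment> selectForResponse(List<Segment> requested, long maxResponseBytes) {
              List<Segment> selected = new ArrayList<>();
              long total = 0;
              for (Segment s : requested) {
                if (!selected.isEmpty() && total + s.lengthBytes() > maxResponseBytes) {
                  break;  // remaining segments can be requested again on a later connection
                }
                selected.add(s);
                total += s.lengthBytes();
              }
              return selected;
            }
          }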

          Owen O'Malley added a comment -

          What is the advantage of sending multiple map ids at once rather than using keep-alive? It is much better to just use the standard HTTP keep-alive paradigm. This should be put on hold until we have upgraded to Jetty 6. I suspect that part of the problem may be the use of Java's URL getter. We should compare the performance of Java URL versus Apache HttpClient.
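
          For reference, a minimal sketch of what relying on the JDK's built-in HTTP keep-alive with java.net.URL could look like (the URL list, buffer size and helper class are assumptions; only the http.keepAlive and http.maxConnections system properties are standard JDK knobs):

          import java.io.InputStream;
          import java.net.HttpURLConnection;
          import java.net.URL;

          class KeepAliveFetch {
            static void fetchAll(java.util.List<URL> mapOutputUrls) throws Exception {
              System.setProperty("http.keepAlive", "true");      // JDK default, shown for clarity
              System.setProperty("http.maxConnections", "5");    // idle connections kept per destination
              byte[] buf = new byte[64 * 1024];
              for (URL url : mapOutputUrls) {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                try (InputStream in = conn.getInputStream()) {
                  while (in.read(buf) != -1) {
                    // consume the map output; fully draining and closing the stream lets the
                    // JDK return the underlying socket to its keep-alive cache
                  }
                }
                // do not call conn.disconnect(): that would close the cached socket
              }
            }
          }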

          Runping Qi added a comment -

          With multiple mapper ids, the Jetty server on the TT can fulfill a multiple-segment request by reading the map output file sequentially.
          If you just use keep-alive, you cannot perform that optimization.

          Runping Qi added a comment -

          Never mind about my comments above.
          Jetty has to open different map output files for different mappers.

          Matei Zaharia added a comment -

          I think pipelining requests should make sense either way because it avoids the connection latency and TCP ramp-up time for each one. Organizing the map output files so as to minimize scans is a different issue - potentially it will be done by some node-wide combiner, but we can solve it then.

          Devaraj Das added a comment -

          Owen, I am not too sure that HTTP's keep-alive will always be useful in our scenario. The reason is that we fetch from a random host each time. HTTP's keep-alive would keep the connection alive for only a certain time I believe. Also, even if the keep-alive timeout were configurable and we could set a higher timeout, for scalability reasons (imagine big clusters containing 1000s of nodes) we cannot have a server keep too many client connections alive at the same time...
          My gut is that we should see some benefits from pulling multiple map outputs per HTTP request. If we could use pipelining, we would have to be careful not to pull too many outputs in one go, since that might starve other clients. But the question is whether Java or Apache HttpClient supports pipelining. If not, we would need to build the protocol on top of the regular HTTP request protocol (something along the lines Runping suggested). But it seemed to me that that interferes with the in-memory shuffle in subtle ways, and that part needs to be thought about.
          But yes, overall this requires a good amount of benchmarking...

          Owen O'Malley added a comment -

          Devaraj, I agree that with the current random scheduling it wouldn't be useful. If, on the other hand, the shuffle had the events in a structure like:

          Map<hostname, List<Events>>

          it would work well. It would pick a random host and pull all of the outputs from that host.
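
          A small sketch of that structure, assuming a stand-in TaskCompletionEvent record rather than the real Hadoop class:

          import java.util.ArrayList;
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;
          import java.util.Random;

          class HostKeyedEvents {
            record TaskCompletionEvent(String host, String mapAttemptId) {}

            private final Map<String, List<TaskCompletionEvent>> byHost = new HashMap<>();
            private final Random rand = new Random();

            void add(TaskCompletionEvent e) {
              byHost.computeIfAbsent(e.host(), h -> new ArrayList<>()).add(e);
            }

            /** Pick a random host and remove all of its pending outputs for fetching. */
            List<TaskCompletionEvent> takeAllFromRandomHost() {
              if (byHost.isEmpty()) return List.of();
              List<String> hosts = new ArrayList<>(byHost.keySet());
              String host = hosts.get(rand.nextInt(hosts.size()));
              return byHost.remove(host);
            }
          }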

          Devaraj Das added a comment -

          Owen, that's how the structure is. However, I am slightly concerned about pulling all outputs in one go. Would it lead to lots of timeouts for other clients trying to pull from the same host, and hence to long tails due to the backoff strategy? But yeah, certainly worth giving it a shot.

          Matei Zaharia added a comment -

          I think you don't have to pull all of them. Just pull enough to fill up 100 MB or something, and then cut the connection and service the next client. It will still be far more efficient than everybody pulling off 1 MB at a time. It's like the map input chunk grouping (HADOOP-2560) - amortize the fixed cost over a few transfers without making it so many that your task takes a very long time.

          Raghu Angadi added a comment -

          I think that on a LAN the overhead of establishing a TCP connection usually gets overrated. Slow start is a real problem. Just like how this could be disabled in Hadoop, Jetty might have an option.

          I would think just disabling slow start would give more benefit than keep-alive without disabling slow start.

          In any case, already existing benchmarks might be able to show whatever benefit this gives. Given the possible side effects of keep-alive that Devaraj mentioned, the improvement might need to be reasonably noticeable.

          Devaraj Das added a comment -

          Matei, the problem with pulling a large number of segments (amounting to a large total size) is that it would interfere with the in-memory shuffle. Note that we want to use the memory buffer for the shuffle as much as possible to avoid disk IO. We probably need to base the maximum size we pull (in the case where we are trying to pull multiple segments) on the buffer available for the shuffle...

          Raghu, that's an interesting suggestion. Worth trying out.

          Matei Zaharia added a comment -

          Ah, that makes sense. Perhaps the max data fetched at a time should be smaller (1-10 MB), but I think there are important jobs with less than 1 KB of output per map, such as counting jobs or filtering jobs. Definitely for jobs with a lot of data, it doesn't make sense to degrade the performance of the shuffle.

          Jothi Padmanabhan added a comment -

          "HTTP's keep-alive would keep the connection alive for only a certain time I believe"

          I do not think there is a timeout for the duration for which the connection is kept alive; keep-alive packets are sent out after a 'keepalive' time (normally 2 hours) of idleness, and if the other side responds, the connection is kept alive.
          However, as pointed out, it might not be a good idea to just hold on to connections, for scalability reasons.

          Jothi Padmanabhan added a comment -

          "Just like how this could be disabled in Hadoop"

          Can we disable the slow-start algorithm programmatically? I thought it needs to be done at the kernel level. How do we disable it in Hadoop?

          Also, the 2 hours in my previous comment referred to TCP keepalive and not HTTP keep-alive. HTTP keep-alive timeouts are considerably smaller.

          Jothi Padmanabhan added a comment -

          To see whether pulling more data in one shot is actually beneficial, we tried the following experiment. We ran the loadgen example with 50K maps, 2 MB per map, with the number of reduces set to 1, 2 and 4.

          The idea here is that with an increasing number of reducers, each reducer would have a smaller amount of intermediate data to fetch from a given map.

          We found the shuffle times to be:
          Num Reducers = 1: 5800s
          Num Reducers = 2: 3120s
          Num Reducers = 4: 1660s

          In essence, when the amount fetched per reducer is halved, the shuffle time did not fall by half but only to slightly more than half. Do these results actually imply that batching of maps could be beneficial? Thoughts?

          Raghu Angadi added a comment -

          > Can we disable the slow-start algorithm programmatically?

          It can be set per socket. Socket.setTcpNoDelay().

          Jothi Padmanabhan added a comment -

          That disables Nagle's algorithm. Though slow start and Nagle's algorithm are related, they are not the same.
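
          To make the distinction concrete, a minimal sketch (the class name is made up): setTcpNoDelay only disables Nagle's algorithm for that socket, while slow start is part of the kernel's congestion control and is not exposed through java.net.Socket.

          import java.net.Socket;

          class NagleExample {
            static Socket connect(String host, int port) throws Exception {
              Socket s = new Socket(host, port);
              s.setTcpNoDelay(true);   // disables Nagle's algorithm for this socket only
              // there is no Socket API to disable slow start; that would need kernel/OS tuning
              return s;
            }
          }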

          Jothi Padmanabhan added a comment -

          I did another test to see if pulling in more data per connection is beneficial. I just ran a simple client/server application on sockets. In the following table, for each transfer, a connection was created and destroyed.

          Here are the results

          A. When nodes are in the same rack

          Transfer Size (MB)   Num Transfers   Total Time (seconds)
          50                   200             123
          100                  100             109
          200                  50              101
          400                  25              97

          B. When the nodes were on a different rack

          Transfer Size (MB)   Num Transfers   Total Time (seconds)
          50                   200             158
          100                  100             142
          200                  50              134
          400                  25              130

          From these results it does appear that there could be a small but definite advantage in bunching the outputs, especially when each output is small.

          Thoughts?
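
          For context, a hedged reconstruction of the kind of micro-benchmark client described above: for each transfer a fresh connection is opened, a fixed number of bytes is streamed, and the socket is closed. The host, port, transfer sizes and the assumed sink server are illustrative; this is not the code that produced the numbers.

          import java.io.OutputStream;
          import java.net.Socket;

          class TransferBenchmarkClient {
            static void run(String host, int port, int numTransfers, long bytesPerTransfer)
                throws Exception {
              byte[] chunk = new byte[64 * 1024];
              long start = System.currentTimeMillis();
              for (int i = 0; i < numTransfers; i++) {
                try (Socket s = new Socket(host, port);         // new connection per transfer
                     OutputStream out = s.getOutputStream()) {
                  long sent = 0;
                  while (sent < bytesPerTransfer) {
                    int n = (int) Math.min(chunk.length, bytesPerTransfer - sent);
                    out.write(chunk, 0, n);
                    sent += n;
                  }
                }
              }
              System.out.println("Total time: " + (System.currentTimeMillis() - start) / 1000 + " s");
            }

            public static void main(String[] args) throws Exception {
              // e.g. 100 transfers of 100 MB each to a sink server listening on port 5000
              run("remote-host", 5000, 100, 100L * 1024 * 1024);
            }
          }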

          Jothi Padmanabhan added a comment -

          I also tried doing the above experiment with Nagle's algorithm disabled. Got pretty similar times.

          Owen O'Malley added a comment -

          We should copy all of the map outputs from a single host until we have nothing more to get from them. There is no advantage to dropping the connection. It just costs time. We need to shuffle all of the map outputs, so there is no need to be "fair". We just want to maximize the throughput. The best way to do that is to pull everything off of a node as fast as we can, keeping the connection alive while we do it.

          Runping Qi added a comment -

          Agree with Owen.

          Devaraj Das added a comment -

          I would like to add one point to this (which is already what we do today) - i.e., stall the copier (drop the connection) when we find that a map output fits the bill for an in-memory shuffle but we don't presently have enough space in the in-memory shuffle area. In our earlier benchmarks we found that we gain a lot by stalling the copier when such a situation happens (mainly due to the cost of the disk I/O).
          Also, in such a case, it might be good for the tasktracker to send the available outputs in ascending order of their sizes. That way, the reducer gets a better chance of fitting more outputs in memory before dropping the connection, if at all.

          Owen, are you suggesting that we don't do the copier stalling at all?

          Thoughts?

          Jothi Padmanabhan added a comment -

          Had an offline discussion with Owen, and some of the points that came out are:

          • Pull several maps from a single host before moving on to the next. When we reach a stage where we need to stall, we stall (similar to the existing logic).
          • If the map output is so big that we would shuffle it to disk anyway, we continue without stalling.
          • We need to come up with the correct logic for deciding timeouts, as it is possible that other reducers might time out connecting to a TT that is serving a particular reducer with huge map outputs (since we are pulling several in one shot now). We will have to ensure that we do not decide erroneously that maps from a busy TT are faulty.
          • It is not possible to have the map completion events carry map output length information, as that would mean having one long per reducer per map. An alternative idea could be to encode the size using, say, 2 bits per reducer. Each value could indicate a range - the ranges could be, for example, 0, 1-999, 1000-9999, >10000. We need to evaluate whether this encoding would be useful and, if yes, what the correct ranges should be (a rough sketch of such an encoding follows below).
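
          A rough sketch of what such a 2-bit encoding could look like, using the example ranges above; the class and method names are made up for illustration.

          class SizeRangeCodec {
            /** Encode a size in bytes into a 2-bit bucket: 0, 1-999, 1000-9999, >=10000. */
            static int encode(long sizeBytes) {
              if (sizeBytes == 0) return 0;
              if (sizeBytes < 1000) return 1;
              if (sizeBytes < 10000) return 2;
              return 3;
            }

            /** Pack one 2-bit bucket per reducer into a byte array (4 reducers per byte). */
            static byte[] pack(long[] sizesPerReducer) {
              byte[] packed = new byte[(sizesPerReducer.length + 3) / 4];
              for (int r = 0; r < sizesPerReducer.length; r++) {
                int bucket = encode(sizesPerReducer[r]);
                packed[r / 4] |= (byte) (bucket << ((r % 4) * 2));
              }
              return packed;
            }
          }
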
          Jothi Padmanabhan added a comment -

          Just for information, we confirmed that Nagle's algorithm is turned off by default in the Jetty server.

          Jothi Padmanabhan added a comment -

          I tested a patch where

          • Each copier thread is assigned one host
          • Each copier thread would pull 'n' map outputs from a given host (until a specific size threshold has been pulled), before moving on to the next thread
          • Each fetch would be one map request/response (as it exists in the trunk)

          With the above patch, I did not observe any improvement at all (for a variety of map sizes with the loadgen program). The underlying presumption with this patch was that, since each thread is holding on to the host, keep-alive would kick in (via the JVM?) and make a few of the connections no-ops, as these are connections made to the same host/port. However, it looks like keep-alive is not kicking in, and I see similar shuffle times with and without this patch.

          We did another test where the code was hacked so that the copier fetches a configurable number of maps at a time and the TT replies to this request by clubbing the map outputs together. The received map outputs were just discarded at the reducer (neither written to disk nor to memory), so that we measured only the network performance. The following are the results:

          Number of Maps Per Fetch   Average Shuffle Time   Worst Case Shuffle Time
          1                          1:27                   4:20
          2                          1:11                   2:11
          4                          47s                    1:11
          8                          29s                    41s

          From this it does appear that we would benefit from modifying the fetch protocol to fetch several maps at one shot, using the same connection. Thoughts?

          Matei Zaharia added a comment -

          I think explicitly fetching multiple maps per connection is the best way to go. We can control exactly what happens instead of relying on connection keep-alive to work in the HTTP library and running into problems if this changes. In terms of how many outputs to fetch, I believe we should take the same approach as Dhruba's multiple-splits-per-map patch - don't say a fixed number of outputs, but rather a fixed number of bytes (say up to 256 MB). If a job has small map outputs, you should fetch more, and in the extreme, for something like webdatascan, you might as well fetch all the outputs on a host in one go.

          Jothi Padmanabhan added a comment -

          Yes, ideally the number of maps fetched should be implicitly determined by the total size of the map outputs fetched, and should not be a fixed number. However, on the reducer side we do not know the size of the map outputs beforehand, and the reducer needs to request specific map ids; it cannot just specify a size, as the TT will not know which maps a given reducer has already fetched and which it has not. So we might need a compromise: the reducer requests, say, 10 maps by id and also specifies the total size that it is willing to accept, and the TT then sends as many of those map outputs as fit within that size. We can of course tune this approach later. Thoughts?

          Runping Qi added a comment -

          +1. That is basically the same as what I proposed on Nov. 18.

          Sameer Paranjpye added a comment -

          A reducer could also fetch the first 1% or so of the map outputs individually and the rest in batches of fixed size. It could use the average size of the outputs fetched so far to estimate how many maps to request in each batch.

          Does fetching 8 at a time have any impact on gridmix runs?

          Jothi Padmanabhan added a comment -

          No, there is no impact from batching 10 outputs at a time on gridmix runs. An appreciable difference in shuffle time was observed with the loadgen run (100 bytes, 100000 maps) on a 100-node cluster with 1 reducer (trunk: 1m24s, patch: 52s). However, for these kinds of applications the shuffle time is fairly negligible compared to the map run time (50 mins or so).

          Jothi Padmanabhan added a comment -

          Attached a patch that batches fetching of map outputs.
          The maximum size to be pulled from a tasktracker per connection is configurable.
          The number of maps requested is adaptive. For the first 1%, one map is requested per connection. Subsequently, once the average map output size is known, the number of maps to request is derived from the configured maximum size. The number of maps is still bounded by the limit on the size of the HTTP request that the servlet can handle.
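
          A hedged sketch of the adaptive sizing described above; the class, field and method names, and the sampling-threshold arithmetic, are illustrative rather than the actual patch code.

          class AdaptiveBatchSizer {
            private final int totalMaps;
            private final long maxBytesPerConnection;   // assumed configurable cap
            private final int maxMapsPerHttpRequest;    // bound from the servlet's request-size limit
            private long bytesFetched = 0;
            private int mapsFetched = 0;

            AdaptiveBatchSizer(int totalMaps, long maxBytesPerConnection, int maxMapsPerHttpRequest) {
              this.totalMaps = totalMaps;
              this.maxBytesPerConnection = maxBytesPerConnection;
              this.maxMapsPerHttpRequest = maxMapsPerHttpRequest;
            }

            void recordFetch(long mapOutputBytes) {
              bytesFetched += mapOutputBytes;
              mapsFetched++;
            }

            /** How many map outputs to ask for on the next connection. */
            int nextBatchSize() {
              if (mapsFetched == 0 || mapsFetched < totalMaps / 100) {
                return 1;                               // sampling phase: one map per connection
              }
              long avgSize = Math.max(1, bytesFetched / mapsFetched);
              long byCap = maxBytesPerConnection / avgSize;
              return (int) Math.max(1, Math.min(byCap, maxMapsPerHttpRequest));
            }
          }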

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12399349/hadoop-1338-v1.patch
          against trunk revision 740237.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3791/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3791/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3791/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3791/console

          This message is automatically generated.

          Matei Zaharia added a comment -

          Rather than fetching 1% of maps, can we fetch a fixed number (e.g. 10)? My concern is that if you have 10,000 maps or something, then fetching 1% will take a while.

          Another option to consider is having the JobTracker compute the average map output size and getting it to the reducers through some other mechanism (e.g. an RPC like getMapOutputLocations). The JT already has this info. This would let each reducer work without having to sample and might be simpler. The size could also be included with each map output location, in which case the system would work even if maps have wildly different output sizes (not sure how often this happens).

          Devaraj Das added a comment -

          1) In TaskTracker.MapOutputServlet:

          1. The change where the call to getConf() is moved up can be removed.
          2. The "continue" statement in the for loop is redundant. Instead, put a comment.
          3. The code to close the mapOutputFile in the finally block in the current codebase needs to remain.

          2) In ReduceTask:

          4. The URL created for requesting map outputs can be compressed to avoid repetitive occurrences of the "attempt_<jobid>_m_" strings. Instead, only the real attempt ID should be sent, and the tasktracker should recreate the full attempt ID string. That will ensure we can fetch more without hitting the HTTP_ENTITY_TOO_LARGE error (a rough sketch of this follows below).
          5. No need to pass the maxFetchSizePerHost as a request parameter.
          6. Does it make sense to remove numInFlight and instead base the checks only on uniqueHosts.size()? I believe even in the current code the updates to numInFlight and uniqueHosts go hand in hand.

          I am still going through ReduceTask.java, so I might have more comments.
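
          As a rough illustration of item 4 (the parameter format, helper names and ID layout are made up for the example, not the patch's actual scheme):

          import java.util.ArrayList;
          import java.util.List;

          class MapIdCompression {
            record TaskAttempt(int taskId, int attemptId) {}

            /** Reducer side: e.g. "5_0,17_1,42_0" instead of a list of full attempt ID strings. */
            static String compress(List<TaskAttempt> requested) {
              StringBuilder sb = new StringBuilder();
              for (TaskAttempt ta : requested) {
                if (sb.length() > 0) sb.append(',');
                sb.append(ta.taskId()).append('_').append(ta.attemptId());
              }
              return sb.toString();
            }

            /** TaskTracker side: rebuild the full attempt ID strings from the compact form. */
            static List<String> expand(String jobId, String compact) {
              List<String> ids = new ArrayList<>();
              for (String part : compact.split(",")) {
                String[] p = part.split("_");
                ids.add(String.format("attempt_%s_m_%06d_%s", jobId, Integer.parseInt(p[0]), p[1]));
              }
              return ids;
            }
          }
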
          Jothi Padmanabhan added a comment -

          Cancelling patch to incorporate review comments

          Devaraj Das added a comment -

          Continuing on ReduceTask.java:
          1) Change the notifyAll to notify (as it was earlier).
          2) I think the retryFetches can be removed and, on an error, the knownOutputs left unchanged.
          3) I think knownOutputs can be used for all the purposes for which new maps are currently defined (just that knownOutputs should not be updated on an error).
          4) The CopyResult object need not take a MapOutputLocation. Instead it can just take a {mapID, host, size} combination. That will simplify the code to do with (3) above.

          Jothi Padmanabhan added a comment -

          Patch incorporating review comments

          Jothi Padmanabhan added a comment -

          MAPREDUCE-318 incorporated this


            People

            • Assignee: Jothi Padmanabhan
            • Reporter: Devaraj Das
            • Votes: 0
            • Watchers: 13
