[MAPREDUCE-6303] Read timeout when retrying a fetch error can be fatal to a reducer - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 2.7.0, 2.6.1, 3.0.0-alpha1
Component/s: None
Labels:
- 2.6.1-candidate

Target Version/s: 2.7.0
Hadoop Flags: Reviewed

Description

If a reducer encounters an error while fetching from a node and then hits a read timeout while trying to re-establish the connection, the reducer can fail. The read timeout exception can leak to the top of the Fetcher thread, which causes the reduce task to tear down. Because the same error can recur on every reducer attempt, a single bad node can cause the whole job to fail.
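The failure mode above can be illustrated with a small, self-contained Java sketch. It is not the code from MAPREDUCE-6303.001.patch and does not use Hadoop's Fetcher API: `FetchRetrySketch`, `HostConnector`, `fetchWithRetry`, and `failedHosts` are hypothetical names chosen for illustration. The point it shows is that the reconnection attempt must catch the read timeout (`java.net.SocketTimeoutException`, a subclass of `IOException`) and report the host as failed, rather than letting the exception propagate out of the fetcher thread.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Hadoop's Fetcher code: a retry path that turns a
 * read timeout during reconnection into a per-host fetch failure instead of
 * letting it escape and kill the fetcher thread.
 */
public class FetchRetrySketch {

    /** Hypothetical stand-in for opening a shuffle connection to a node. */
    interface HostConnector {
        void connect(String host) throws IOException;
    }

    /** Hosts reported as failed; a real reducer would penalize them instead. */
    static final List<String> failedHosts = new ArrayList<>();

    /**
     * Tries to fetch from a host, retrying the connection once after an
     * initial error. Returns true on success, false if the host was reported
     * as failed. No IOException (including a read timeout) escapes.
     */
    static boolean fetchWithRetry(String host, HostConnector connector) {
        try {
            connector.connect(host);
            return true;
        } catch (IOException firstError) {
            try {
                // Re-establish the connection after the first fetch error.
                connector.connect(host);
                return true;
            } catch (SocketTimeoutException readTimeout) {
                // Without this catch, the timeout would leak to the top of the
                // fetcher thread and tear down the reduce task.
                failedHosts.add(host);
                return false;
            } catch (IOException otherError) {
                failedHosts.add(host);
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated bad node: the first attempt errors, the retry times out.
        int[] attempts = {0};
        HostConnector badNode = host -> {
            attempts[0]++;
            if (attempts[0] == 1) {
                throw new IOException("connection reset");
            }
            throw new SocketTimeoutException("Read timed out");
        };
        boolean ok = fetchWithRetry("node-7", badNode);
        System.out.println(ok + " " + failedHosts); // prints "false [node-7]"
    }
}
```

In this sketch the bad node costs only a recorded host failure; the calling thread keeps running and could move on to other hosts.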

Attachments

MAPREDUCE-6303.001.patch (7 kB), Jason Darrell Lowe, 02/Apr/15 13:47

Issue Links

breaks: MAPREDUCE-6957 shuffle hangs after a node manager connection timeout (Resolved)

is broken by: MAPREDUCE-5891 Improved shuffle error handling across NM restarts (Closed)

People

Assignee: Jason Darrell Lowe
Reporter: Jason Darrell Lowe
Votes: 0
Watchers: 12

Dates

Created: 01/Apr/15 21:32
Updated: 25/Oct/19 20:25
Resolved: 02/Apr/15 19:00