[MAPREDUCE-4797] LocalContainerAllocator can loop forever trying to contact the RM - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.23.3, 2.0.1-alpha
Fix Version/s: 2.0.3-alpha, 0.23.5
Component/s: applicationmaster
Labels: None
Target Version/s: 2.0.3-alpha, 0.23.5

Description

If LocalContainerAllocator has trouble communicating with the RM, it can end up retrying forever when the error is anything other than a YarnException.

This can be particularly bad if the connection went down because the cluster was reset, such that the RM and NM have lost track of the process and nothing else will ever kill it. In this scenario, the looping AM keeps pelting the RM with connection requests every second using a stale token, and the RM logs the resulting SASL exceptions over and over.
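
One way to bound such a loop is to track when the RM was last reached and give up once a retry window has passed without a success, whatever the exception type. The following is a minimal sketch of that idea only, not the attached patch: the Heartbeat interface, the BoundedHeartbeatRetry class, and the retryWindowMs parameter are all hypothetical names invented for illustration, and no Hadoop APIs are used.

```java
/** Hypothetical stand-in for the AM-to-RM heartbeat call; not a Hadoop API. */
interface Heartbeat {
    void send() throws Exception;
}

/** Gives up once no heartbeat has succeeded within a fixed retry window. */
class BoundedHeartbeatRetry {
    private final long retryWindowMs; // assumed configuration value
    private long lastSuccessMs;       // time of the last successful contact

    BoundedHeartbeatRetry(long retryWindowMs, long nowMs) {
        this.retryWindowMs = retryWindowMs;
        this.lastSuccessMs = nowMs;
    }

    /**
     * Attempts one heartbeat at time nowMs.
     * Returns true on success and false if the caller should retry later.
     * Throws IllegalStateException once the retry window has elapsed
     * without any successful contact.
     */
    boolean attempt(Heartbeat heartbeat, long nowMs) {
        try {
            heartbeat.send();
            lastSuccessMs = nowMs;
            return true;
        } catch (Exception e) {
            // Every exception type counts against the window, so an
            // unexpected error (for example a SASL failure) cannot
            // cause an unbounded retry loop.
            if (nowMs - lastSuccessMs >= retryWindowMs) {
                throw new IllegalStateException(
                    "Could not contact RM after " + retryWindowMs + " ms", e);
            }
            return false;
        }
    }
}
```

The caller passes the current time in explicitly, which keeps the sketch easy to exercise without sleeping; a real allocator thread would read the clock itself and would need to decide how to shut the AM down once the window expires.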

Attachments

MAPREDUCE-4797.patch (6 kB), 14/Nov/12 21:35, Jason Darrell Lowe
MAPREDUCE-4797.patch (6 kB), 14/Nov/12 21:59, Jason Darrell Lowe

People

Assignee: Jason Darrell Lowe
Reporter: Jason Darrell Lowe
Votes: 0
Watchers: 4

Dates

Created: 14/Nov/12 02:15
Updated: 03/Sep/14 23:17
Resolved: 14/Nov/12 23:09