[YARN-3238] Connection timeouts to nodemanagers are retried at multiple levels - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 2.7.0, 2.6.1, 3.0.0-alpha1
Component/s: None
Labels:
- 2.6.1-candidate

Target Version/s: 2.7.0

Description

The IPC layer retries connection timeouts automatically (see Client.java), but we also retry them through the YARN RetryPolicy installed when the NM proxy is created. The result is a two-level retry mechanism. Every time the YARN RetryPolicy retries an error, the IPC layer has already retried the connection many times (45 by default). Because the two retry counts multiply, NM clients can wait a very long time before the connection finally fails.
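To illustrate why the two levels compound, here is a minimal, hypothetical Java simulation. It is not Hadoop code, and all class and method names are made up. The inner loop stands in for the IPC client's own timeout retries. The outer loop stands in for a proxy-level retry policy, and its count of 10 retries is an arbitrary assumption. Each outer try pays for a full round of inner retries, so the attempt counts multiply rather than add.

```java
import java.net.SocketTimeoutException;

/**
 * Hypothetical simulation (not Hadoop code) of two nested retry layers
 * that both retry the same connect timeout. Names and counts are illustrative.
 */
public class NestedRetrySketch {

    /** A node that never answers; it only counts connection attempts. */
    static final class UnreachableNode {
        int attempts = 0;

        void connect() throws SocketTimeoutException {
            attempts++;
            throw new SocketTimeoutException("connect timed out");
        }
    }

    /** Inner layer: models the IPC client retrying connect timeouts by itself. */
    static void ipcConnect(UnreachableNode node, int ipcMaxRetries) throws SocketTimeoutException {
        for (int retry = 0; ; retry++) {
            try {
                node.connect();
                return;
            } catch (SocketTimeoutException e) {
                if (retry >= ipcMaxRetries) {
                    throw e; // inner retries exhausted: surface the timeout
                }
            }
        }
    }

    /**
     * Outer layer: models a proxy-level retry policy. When retryTimeouts is
     * false, the outer layer leaves connect timeouts to the inner layer.
     * Returns the total number of connection attempts made.
     */
    static int connectWithPolicy(int outerMaxRetries, int ipcMaxRetries, boolean retryTimeouts) {
        UnreachableNode node = new UnreachableNode();
        for (int retry = 0; ; retry++) {
            try {
                ipcConnect(node, ipcMaxRetries);
                return node.attempts;
            } catch (SocketTimeoutException e) {
                if (!retryTimeouts || retry >= outerMaxRetries) {
                    return node.attempts; // the connection finally fails
                }
            }
        }
    }

    public static void main(String[] args) {
        int ipcRetries = 45;   // the IPC default cited in the description
        int outerRetries = 10; // hypothetical outer-layer retry count
        // Both layers retry: 11 outer tries x 46 inner attempts = 506
        System.out.println("both layers retry: " + connectWithPolicy(outerRetries, ipcRetries, true));
        // Only the inner layer retries: 1 initial attempt + 45 retries = 46
        System.out.println("IPC layer only:    " + connectWithPolicy(outerRetries, ipcRetries, false));
    }
}
```

Under these assumptions the nested case makes 506 connection attempts, while letting only one layer handle timeouts makes 46. Real wait times also depend on the per-attempt connect timeout, which this sketch does not model.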

Attachments

- YARN-3238.001.patch (1.0 kB), uploaded 20/Feb/15 23:34 by Jason Darrell Lowe

Issue Links

is related to:
- YARN-3944 Connection refused to nodemanagers are retried at multiple levels (Resolved)
- YARN-4414 Nodemanager connection errors are retried at multiple levels (Closed)

relates to:
- YARN-3554 Default value for maximum nodemanager connect wait time is too high (Closed)

People

Assignee: Jason Darrell Lowe
Reporter: Jason Darrell Lowe
Votes: 0
Watchers: 16

Dates

Created: 20/Feb/15 23:22
Updated: 07/Feb/20 13:26
Resolved: 22/Feb/15 00:08