[YARN-5718] TimelineClient (and other places in YARN) shouldn't over-write HDFS client retry settings which could cause unexpected behavior - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0-alpha2
Component/s: resourcemanager, timelineclient
Labels: None
Target Version/s: 3.0.0-alpha2
Hadoop Flags: Incompatible change

Description

In an HA cluster, after a NameNode (NN) failover, we noticed that jobs were failing because TimelineClient could not retry its connection against the correct NN. The cause is that YARN overwrites the HDFS client settings with hard-coded values that enable the retry policy, which conflicts with the NN failover case: there, the HDFS client should fail fast so that it can retry on the other NN.
YARN should not assume any particular retry policy for the HDFS client anywhere. It should stay consistent with the HDFS settings, which use different retry policies for different deployment cases. We should therefore clean up these hard-coded settings in YARN, including those in FileSystemTimelineWriter, FileSystemRMStateStore and FileSystemNodeLabelsStore.
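
The difference between the two approaches can be sketched as follows. This is only an illustration, not the code from the attached patches: a plain Map stands in for the Hadoop Configuration object, the class and method names are hypothetical, and the two dfs.client.retry.policy.* keys are assumed to be the settings in question (the description does not name them). The spec value "2000, 500" is likewise a made-up example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for org.apache.hadoop.conf.Configuration.
public class RetrySettingsSketch {
    // Assumed to be the HDFS client keys involved; not named in the issue text.
    static final String RETRY_ENABLED = "dfs.client.retry.policy.enabled";
    static final String RETRY_SPEC = "dfs.client.retry.policy.spec";

    // Problematic pattern: the YARN component forces retries on,
    // regardless of what the cluster's HDFS configuration says.
    static Map<String, String> overwriteRetrySettings(
            Map<String, String> clusterConf, String spec) {
        Map<String, String> fsConf = new HashMap<>(clusterConf);
        fsConf.put(RETRY_ENABLED, "true");
        fsConf.put(RETRY_SPEC, spec);
        return fsConf;
    }

    // Pattern argued for in the description: copy the cluster settings
    // untouched, so an HA deployment keeps its fail-fast behaviour.
    static Map<String, String> inheritRetrySettings(
            Map<String, String> clusterConf) {
        return new HashMap<>(clusterConf);
    }

    public static void main(String[] args) {
        // An HA cluster that wants the client to fail fast and fail over.
        Map<String, String> haConf = new HashMap<>();
        haConf.put(RETRY_ENABLED, "false");

        Map<String, String> forced = overwriteRetrySettings(haConf, "2000, 500");
        Map<String, String> inherited = inheritRetrySettings(haConf);

        System.out.println("overwritten: " + forced.get(RETRY_ENABLED));   // prints "overwritten: true"
        System.out.println("inherited:   " + inherited.get(RETRY_ENABLED)); // prints "inherited:   false"
    }
}
```

With the first method the HA cluster's "false" is silently replaced, which is the behavior the description reports; with the second, whatever the HDFS deployment configured is what the client sees.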

Attachments

YARN-5718-v2.patch (12/Oct/16 11:15, 9 kB, Junping Du)
YARN-5718-v2.1.patch (12/Oct/16 18:49, 9 kB, Junping Du)
YARN-5718.patch (10/Oct/16 15:14, 4 kB, Junping Du)

Issue Links

is related to: YARN-5748 Backport YARN-5718 to branch-2 (Patch Available)

People

Assignee: Junping Du
Reporter: Junping Du
Votes: 0
Watchers: 8

Dates

Created: 10/Oct/16 15:07
Updated: 25/Oct/19 20:25
Resolved: 18/Oct/16 18:07