[HDFS-15419] RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: configuration, hdfs-client, rbf
Labels:
- pull-request-available

Description

When the cluster is unavailable, Router-to-NameNode communication is retried only once, with no interval between attempts. This is not reasonable.

For example, my company runs several HDFS clusters with more than 1,000 nodes, and we have encountered this problem. In some cases a cluster becomes briefly unavailable for about 10 to 30 seconds. During that window almost all RPC requests to the Router fail, because the Router retries only once and without any interval.

It would be better to enhance the Router retry strategy so that it retries communication with the NN using a configurable time interval and a configurable maximum number of retries.
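A minimal Java sketch of the proposed behavior, assuming a simple fixed-interval policy. The class `FixedIntervalRetry`, the method `callWithRetry`, and the parameters `maxRetries` and `retryIntervalMs` are hypothetical illustrations. They are not taken from PR 2082 or from Hadoop's RBF code. The sketch treats `IOException` as retriable (for example, an unreachable NameNode) and sleeps a fixed interval before each retry.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch, not Hadoop's RBF code: retries a NameNode call
 * with a fixed sleep between attempts and a bounded number of retries.
 */
public class FixedIntervalRetry {

    /**
     * Runs op once, then retries it up to maxRetries more times on
     * IOException, sleeping retryIntervalMs before each retry.
     * Non-IO exceptions are not retried and propagate immediately.
     */
    public static <T> T callWithRetry(Callable<T> op, int maxRetries, long retryIntervalMs)
            throws Exception {
        if (maxRetries < 0 || retryIntervalMs < 0) {
            throw new IllegalArgumentException("maxRetries and retryIntervalMs must be >= 0");
        }
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e; // assumption: I/O failures (e.g. NN unreachable) are retriable
                if (attempt < maxRetries) {
                    Thread.sleep(retryIntervalMs); // fixed interval, no backoff
                }
            }
        }
        throw last; // every attempt failed: surface the last error to the caller
    }

    public static void main(String[] args) throws Exception {
        // Simulated NameNode call that fails twice, then succeeds.
        int[] calls = {0};
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("NameNode unavailable");
            }
            return "ok";
        }, 5, 100L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With a fixed interval, the outage the Router can ride out is roughly `maxRetries × retryIntervalMs`. For example, 10 retries at 3,000 ms cover about 30 seconds, the upper end of the outages described above. Hadoop's `org.apache.hadoop.io.retry.RetryPolicies` includes a `retryUpToMaximumCountWithFixedSleep` policy that a real patch might reuse instead of a hand-written loop.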

Issue Links

links to

GitHub Pull Request #2082

People

Assignee: Unassigned

Reporter: bhji123

Votes: 0

Watchers: 6

Dates

Created: 18/Jun/20 04:50

Updated: 16/Sep/20 20:19

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 20m