[HDFS-7704] DN heartbeat to Active NN may be blocked and expire if connection to Standby NN continues to time out. - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.5.0
Fix Version/s: 2.7.0
Component/s: datanode, namenode
Labels: None
Target Version/s: 2.7.0
Hadoop Flags: Reviewed

Description

There are a couple of synchronous calls in BPOfferService (i.e. reportBadBlocks and trySendErrorReport) that wait for both actor threads to process them.
These calls are made with the write lock held.
When reportBadBlocks() is blocked at the RPC layer because a NN is unreachable, subsequent heartbeat response processing has to wait for the write lock. It eventually gets through, but it takes too long and blocks the next heartbeat.
In our HA cluster setup, the standby namenode was taking a long time to process the request.
Requesting an improvement in the datanode to make the above calls asynchronous. These reports don't have any specific deadlines, so an extra few seconds of delay should be acceptable.
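
For illustration only, here is a minimal sketch of the kind of change requested above: the caller enqueues a report and returns immediately, and each actor drains its own queue on its own thread. The names ReportActor, ReportAction, enqueue, pendingCount and stop are hypothetical and are not taken from the attached patches or from the real BPServiceActor. Error handling is deliberately simplistic.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a per-namenode actor thread (not the real BPServiceActor). */
class ReportActor implements Runnable {

    /** A queued report. Assumption: one object per pending RPC. */
    interface ReportAction {
        void send() throws Exception; // performs the possibly slow RPC
    }

    private final BlockingQueue<ReportAction> pending = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    /** Safe to call while holding a lock: only enqueues, never does network I/O. */
    void enqueue(ReportAction action) {
        pending.add(action);
    }

    /** Number of reports not yet picked up by the actor thread. */
    int pendingCount() {
        return pending.size();
    }

    /** Asks the actor loop to exit after its current iteration. */
    void stop() {
        running = false;
    }

    /** Drains the queue on the actor's own thread, so a slow NN delays only this actor. */
    @Override
    public void run() {
        while (running) {
            try {
                // Poll with a timeout so that stop() is noticed promptly.
                ReportAction action = pending.poll(100, TimeUnit.MILLISECONDS);
                if (action != null) {
                    action.send();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } catch (Exception e) {
                // This sketch drops a failed report. A real implementation would
                // need an explicit, bounded retry policy.
            }
        }
    }
}
```

With one such queue per actor, a call like reportBadBlocks() would only need to hold the write lock long enough to enqueue the action for each actor, so heartbeat response processing for the active NN would not wait on an RPC to the standby NN.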

Attachments

HDFS-7704.patch      29/Jan/15 15:43   14 kB   Rushabh Shah
HDFS-7704-v2.patch   02/Feb/15 16:41   18 kB   Rushabh Shah
HDFS-7704-v3.patch   03/Feb/15 20:04   18 kB   Rushabh Shah
HDFS-7704-v4.patch   03/Feb/15 23:47   18 kB   Rushabh Shah
HDFS-7704-v5.patch   09/Feb/15 16:47   24 kB   Rushabh Shah

Issue Links

breaks: HDFS-7916 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop (Closed)

People

Assignee: Rushabh Shah

Reporter: Rushabh Shah

Votes: 0

Watchers: 13

Dates

Created: 29/Jan/15 15:30

Updated: 11/Aug/15 23:37

Resolved: 12/Feb/15 15:18