[HDFS-915] Hung DN stalls write pipeline for far longer than its timeout - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.20.1
Fix Version/s: None
Component/s: hdfs-client
Labels: None

Target Version/s:

Description

After running kill -STOP on the datanode in the middle of a write pipeline, the client takes far longer to recover than it should. The ResponseProcessor times out after the correct interval, but it does not interrupt the DataStreamer, which does not appear to be subject to the same timeout. The client recovers only once the OS actually declares the TCP stream dead, which can take a very long time.

I've experienced this on 0.20.1 but haven't yet tried it on trunk or 0.21.
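
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the HDFS client code: the class PipelineStallSketch, the method streamerRecovers, and both queues are hypothetical stand-ins. A bounded queue plays the role of a full send buffer to a stopped peer, one thread plays the DataStreamer, and another plays the ResponseProcessor waiting for an ack with a timeout. The only difference between the two runs is whether the timeout also unblocks the writer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation; none of these names come from the HDFS code base.
class PipelineStallSketch {

    /**
     * Simulates a pipeline whose downstream peer never drains data or sends acks.
     * Returns true if the writer thread was unblocked within waitMillis after
     * the ack timeout fired.
     */
    static boolean streamerRecovers(boolean interruptOnTimeout,
                                    long ackTimeoutMillis,
                                    long waitMillis) {
        // Capacity-1 queue stands in for a full send buffer to a stopped peer.
        BlockingQueue<byte[]> sendBuffer = new ArrayBlockingQueue<>(1);
        // No ack is ever offered, because the simulated peer is hung.
        BlockingQueue<Long> acks = new ArrayBlockingQueue<>(1);
        CountDownLatch streamerExited = new CountDownLatch(1);

        Thread streamer = new Thread(() -> {
            try {
                while (true) {
                    sendBuffer.put(new byte[64]); // the second put blocks indefinitely
                }
            } catch (InterruptedException e) {
                // Unblocked by the responder; a real client would start recovery here.
            } finally {
                streamerExited.countDown();
            }
        }, "streamer");
        streamer.setDaemon(true);
        streamer.start();

        Thread responder = new Thread(() -> {
            try {
                Long ack = acks.poll(ackTimeoutMillis, TimeUnit.MILLISECONDS);
                if (ack == null && interruptOnTimeout) {
                    streamer.interrupt(); // propagate the timeout to the blocked writer
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "responder");
        responder.setDaemon(true);
        responder.start();

        try {
            responder.join();
            return streamerExited.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            streamer.interrupt(); // cleanup so the simulated writer does not linger
        }
    }

    public static void main(String[] args) {
        System.out.println("timeout only:        recovered=" + streamerRecovers(false, 100, 300));
        System.out.println("timeout + interrupt: recovered=" + streamerRecovers(true, 100, 300));
    }
}
```

In this simulation the writer stays blocked when the responder merely notices the timeout, and exits promptly when the responder also interrupts it. One assumption to keep in mind: the queue responds to Thread.interrupt(), whereas a write on a classic blocking java.io socket stream does not, so in a real client the unblocking step would more likely be closing the socket from the thread that detected the timeout.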

Attachments

- local-dn.log, 23/Jan/10 00:48, 12 kB, Todd Lipcon
- hdfs-915-0.20.txt, 14/Mar/12 20:11, 2 kB, Todd Lipcon

Issue Links

is related to: HDFS-917 Write pipeline heartbeat interval should be determined by client timeout, not DN (Open)

People

Assignee: Todd Lipcon
Reporter: Todd Lipcon
Votes: 0
Watchers: 8

Dates

Created: 23/Jan/10 00:00
Updated: 16/May/12 19:17