Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.20.1
Fix Version/s: None
Component/s: None
Description
After running kill -STOP on the datanode in the middle of a write pipeline, the client takes far longer to recover than it should. The ResponseProcessor times out within the correct interval, but it doesn't interrupt the DataStreamer, which appears not to be subject to the same timeout. The client recovers only once the OS actually declares the TCP stream dead, which can take a very long time.
I've experienced this on 0.20.1; I haven't tried it on trunk or 0.21 yet.
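One plausible reading of the behaviour above, assuming the streamer writes through an ordinary blocking java.net.Socket: Socket.setSoTimeout (SO_TIMEOUT) bounds only read() calls, so a timeout seen by the reading thread does nothing for a thread blocked in write() until something closes the socket. The generic Java sketch below illustrates that; it is not DFSClient code. The class StalledPeerDemo, the "responder" and "writer" roles (loose stand-ins for ResponseProcessor and DataStreamer), the 64 KB packet size, and the choice to close the socket on timeout are all hypothetical. A loopback peer that accepts the connection but never reads approximates the stopped datanode.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class StalledPeerDemo {

    /** How one run ended. */
    public static final class Result {
        public final boolean readerTimedOut; // the responder's read() hit SO_TIMEOUT
        public final boolean writerFailed;   // the writer's write() was released with an IOException
        public final long writerMillis;      // time from start until the writer gave up

        Result(boolean readerTimedOut, boolean writerFailed, long writerMillis) {
            this.readerTimedOut = readerTimedOut;
            this.writerFailed = writerFailed;
            this.writerMillis = writerMillis;
        }
    }

    /** Runs the demo, wrapping checked exceptions so callers need not handle them. */
    public static Result run(int readTimeoutMs) {
        try {
            return runChecked(readTimeoutMs);
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("demo failed", e);
        }
    }

    private static Result runChecked(int readTimeoutMs) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             // Stand-in for the stopped datanode: the connection is accepted,
             // but nothing ever reads from it or writes to it.
             Socket stalledPeer = server.accept()) {

            // SO_TIMEOUT bounds read() on this socket; it does not bound write().
            client.setSoTimeout(readTimeoutMs);
            boolean[] timedOut = {false};

            // Hypothetical responder: wait for an ack and, when none arrives
            // in time, close the shared socket.
            Thread responder = new Thread(() -> {
                try {
                    InputStream in = client.getInputStream();
                    in.read();
                } catch (SocketTimeoutException e) {
                    timedOut[0] = true;
                } catch (IOException e) {
                    // Socket already closed or reset; nothing more to do.
                } finally {
                    try {
                        client.close(); // releases a thread blocked in write()
                    } catch (IOException ignored) {
                    }
                }
            });

            long start = System.nanoTime();
            responder.start();
            boolean writerFailed = false;
            // Hypothetical writer: keep sending packets. Once the kernel buffers
            // fill, write() blocks; in this sketch only the close above ends it.
            try {
                OutputStream out = client.getOutputStream();
                byte[] packet = new byte[64 * 1024];
                while (true) {
                    out.write(packet);
                }
            } catch (IOException e) {
                writerFailed = true;
            }
            long writerMillis = (System.nanoTime() - start) / 1_000_000;
            responder.join(); // join() makes timedOut[0] visible to this thread
            return new Result(timedOut[0], writerFailed, writerMillis);
        }
    }

    public static void main(String[] args) {
        Result r = run(500);
        System.out.println("reader timed out: " + r.readerTimedOut
                + ", writer released after ~" + r.writerMillis + " ms");
    }
}
```

Running main should report that the reader timed out and that the writer was released shortly after the 500 ms read timeout. Without the close in the responder's finally block, the writer in this sketch would stay blocked. Closing the socket from the timing-out thread is only one way to release a blocked writer; the sketch does not show how DFSClient itself handles this.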
Issue Links
- is related to: HDFS-917 Write pipeline heartbeat interval should be determined by client timeout, not DN (Open)