[HADOOP-3124] DFS data node should not use hard coded 10 minutes as write timeout. - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.17.0
Fix Version/s: 0.17.0
Component/s: None
Labels:
None

Hadoop Flags:

Reviewed
Release Note:
Makes DataNode socket write timeout configurable. User impact : none.

Description

This problem happens in 0.17 trunk

I saw reducers waited 10 minutes for writing data to dfs and got timeout.
The client retries again and timeouted after another 19 minutes.

After looking into the code, it seems that the dfs data node uses 10 minutes as timeout for wtiting data into the data node pipeline.
I thing we have three issues:

1. The 10 minutes timeout value is too big for writing a chunk of data (64K) through the data node pipeline.
2. The timeout value should not be hard coded.
3. Different datanodes in a pipeline should use different timeout values for writing to the downstream.
A reasonable one maybe (20 secs * numOfDataNodesInTheDownStreamPipe).
For example, if the replication factor is 3, the client uses 60 secs, the first data node use 40 secs, the second datanode use 20secs.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HADOOP-3124.patch
14/Apr/08 22:25
9 kB
Raghu Angadi
HADOOP-3124.patch
08/Apr/08 03:21
7 kB
Raghu Angadi

Issue Links

incorporates

HADOOP-3051 DataXceiver: java.io.IOException: Too many open files

Closed

relates to

HADOOP-3132 DFS writes stuck occationally

Closed

Activity

People

Assignee:: Raghu Angadi

Reporter:: Runping Qi

Votes:: 0 Vote for this issue

Watchers:: 0 Start watching this issue

Dates

Created:: 28/Mar/08 20:27

Updated:: 08/Jul/09 16:43

Resolved:: 16/Apr/08 22:03