[HADOOP-6889] Make RPC to have an option to timeout - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.22.0
Fix Version/s: 0.20.205.0, 0.22.0, 0.23.0
Component/s: ipc
Labels: None
Target Version/s: 0.20.205.0, 0.22.0, 0.23.0
Hadoop Flags: Reviewed

Description

Currently, Hadoop RPC does not time out while the RPC server is alive. Whenever a socket timeout occurs, the RPC client sends a ping to the server. If the server is still alive, the client continues to wait instead of throwing a SocketTimeoutException. This prevents a client from retrying when a server is busy, which would make the server even busier. This works well when the RPC server is the NameNode.

But Hadoop RPC is also used for some client-to-DataNode communication, for example, to get a replica's length. When a client comes across a problematic DataNode, it gets stuck and cannot switch to a different DataNode. In this case, it would be better for the client to receive a timeout exception.

I plan to add a new configuration property, ipc.client.max.pings, that specifies the maximum number of pings a client may send. If no response is received after the specified maximum number of pings, a SocketTimeoutException is thrown. If this configuration property is not set, the client keeps the current semantics and waits forever.
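
The proposed semantics can be sketched roughly as follows. This is only an illustration of the description above, not the code in any of the attached patches and not Hadoop's ipc.Client: the class PingLimitSketch, the method waitForResponse, and the use of a BooleanSupplier to stand in for a socket read are all hypothetical. A non-positive limit is assumed here to mean "property not set".

```java
import java.net.SocketTimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the proposed ping limit; not Hadoop's actual client code. */
public class PingLimitSketch {

    /** Assumption: a value <= 0 models "ipc.client.max.pings is not set". */
    static final int UNLIMITED = -1;

    /**
     * Waits for a response. Each call to readTimedOut models one read attempt:
     * true means the socket read timed out, false means a response arrived.
     * Returns the number of pings sent before the response arrived.
     */
    static int waitForResponse(BooleanSupplier readTimedOut, int maxPings)
            throws SocketTimeoutException {
        int pingsSent = 0;
        while (readTimedOut.getAsBoolean()) {
            // With a limit set, give up once that many pings went unanswered.
            if (maxPings > 0 && pingsSent >= maxPings) {
                throw new SocketTimeoutException(
                        "No response after " + pingsSent + " pings");
            }
            pingsSent++; // a real client would write a ping to the server here
        }
        return pingsSent;
    }

    public static void main(String[] args) {
        try {
            // Busy but healthy server: two read timeouts, then a response.
            int[] reads = {0};
            int pings = waitForResponse(() -> ++reads[0] < 3, 5);
            System.out.println("pings sent: " + pings);

            // Problematic server: never answers, so the limit is reached.
            waitForResponse(() -> true, 3);
        } catch (SocketTimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

With UNLIMITED as the limit, the loop never throws and keeps pinging for as long as reads time out, which matches the current wait-forever behavior.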

Attachments

ipcTimeout.patch, 03/Aug/10 22:30, 22 kB, Hairong Kuang
ipcTimeout1.patch, 04/Aug/10 00:30, 23 kB, Hairong Kuang
ipcTimeout2.patch, 05/Aug/10 16:43, 23 kB, Hairong Kuang
HADOOP-6889.patch, 21/Jul/11 15:58, 24 kB, John George
HADOOP-6889-for20.patch, 21/Jul/11 15:59, 24 kB, John George
HADOOP-6889-for20.2.patch, 29/Jul/11 21:56, 28 kB, Ravi Prakash
HADOOP-6889-for20.3.patch, 02/Aug/11 11:03, 28 kB, Matthew Foley
HADOOP-6889-for-20security.patch, 17/Aug/11 15:00, 12 kB, John George
HADOOP-6889-fortrunk.patch, 17/Aug/11 15:00, 11 kB, John George
HADOOP-6889-fortrunk-2.patch, 26/Aug/11 22:46, 11 kB, John George

Issue Links

blocks: HDFS-1330 Make RPCs to DataNodes timeout (Closed)
depends upon: HADOOP-6907 Rpc client doesn't use the per-connection conf to figure out server's Kerberos principal (Closed)
duplicates: HADOOP-7488 When Namenode network is unplugged, DFSClient operations waits for ever (Resolved)
relates to: YARN-2578 NM does not failover timely if RM node network connection fails (Resolved)

People

Assignee: John George
Reporter: Hairong Kuang
Votes: 0
Watchers: 14

Dates

Created: 29/Jul/10 20:13
Updated: 09/Jun/15 12:17
Resolved: 04/Oct/11 14:43