Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
NameNodeProxies.createNNProxyWithClientProtocol creates its proxy like this:
    ClientNamenodeProtocolPB proxy = RPC.getProtocolProxy(
        ClientNamenodeProtocolPB.class, version, address, ugi, conf,
        NetUtils.getDefaultSocketFactory(conf),
        org.apache.hadoop.ipc.Client.getTimeout(conf), defaultPolicy,
        fallbackToSimpleAuth).getProxy();
It calls Client.getTimeout(conf) to get the timeout value.
Client.getTimeout(conf) does not currently consider IPC_CLIENT_RPC_TIMEOUT_KEY. As a result, the configured RPC timeout has no effect on RPC calls made through this proxy, and they can hang indefinitely.
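The difference can be sketched without Hadoop on the classpath. The class below is a hypothetical stand-in, not the Hadoop source: a plain Map replaces Configuration, the key strings and the 60000 ms ping interval default are assumed to match Hadoop's constants, and both method names are invented for illustration. It assumes the current lookup only looks at the ping settings, and shows one way the RPC timeout key could be honored first.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in for the timeout lookup; not the Hadoop implementation. */
class RpcTimeoutSketch {
    // Key strings and default are assumptions about Hadoop's configuration constants.
    static final String IPC_CLIENT_RPC_TIMEOUT_KEY = "ipc.client.rpc-timeout.ms";
    static final String IPC_CLIENT_PING_KEY = "ipc.client.ping";
    static final String IPC_PING_INTERVAL_KEY = "ipc.ping.interval";
    static final int IPC_PING_INTERVAL_DEFAULT = 60000; // milliseconds

    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    static boolean pingEnabled(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault(IPC_CLIENT_PING_KEY, "true"));
    }

    /** Behavior described in this issue: the RPC timeout key is never read. */
    static int timeoutIgnoringRpcKey(Map<String, String> conf) {
        if (!pingEnabled(conf)) {
            return getInt(conf, IPC_PING_INTERVAL_KEY, IPC_PING_INTERVAL_DEFAULT);
        }
        return -1; // in this sketch, -1 stands for "no RPC timeout"
    }

    /** One possible fix: prefer the configured RPC timeout when it is positive. */
    static int timeoutHonoringRpcKey(Map<String, String> conf) {
        int rpcTimeout = getInt(conf, IPC_CLIENT_RPC_TIMEOUT_KEY, 0);
        if (rpcTimeout > 0) {
            return rpcTimeout;
        }
        return timeoutIgnoringRpcKey(conf);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(IPC_CLIENT_RPC_TIMEOUT_KEY, "30000");
        System.out.println(timeoutIgnoringRpcKey(conf)); // prints -1
        System.out.println(timeoutHonoringRpcKey(conf)); // prints 30000
    }
}
```

With ping left enabled, the first lookup returns -1 no matter what the RPC timeout key is set to, which matches the symptom reported here.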
For example, receiveRpcResponse blocked forever at:
Thread 16127: (state = BLOCKED)
 - sun.nio.ch.SocketChannelImpl.readerCleanup() @bci=6, line=279 (Compiled frame)
 - sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer) @bci=205, line=390 (Compiled frame)
 - org.apache.hadoop.net.SocketInputStream$Reader.performIO(java.nio.ByteBuffer) @bci=5, line=57 (Compiled frame)
 - org.apache.hadoop.net.SocketIOWithTimeout.doIO(java.nio.ByteBuffer, int) @bci=35, line=142 (Compiled frame)
 - org.apache.hadoop.net.SocketInputStream.read(java.nio.ByteBuffer) @bci=6, line=161 (Compiled frame)
 - org.apache.hadoop.net.SocketInputStream.read(byte[], int, int) @bci=7, line=131 (Compiled frame)
 - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Compiled frame)
 - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(byte[], int, int) @bci=4, line=521 (Compiled frame)
 - java.io.BufferedInputStream.fill() @bci=214, line=246 (Compiled frame)
 - java.io.BufferedInputStream.read() @bci=12, line=265 (Compiled frame)
 - java.io.DataInputStream.readInt() @bci=4, line=387 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse() @bci=19, line=1081 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.run() @bci=62, line=976 (Compiled frame)
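One plausible reading of this trace is that the read under PingInputStream keeps being retried after every socket timeout because no RPC timeout is in effect. The sketch below is a simplified model of such a retry loop, not the Hadoop code: the class, method, and parameter names are hypothetical, elapsed time is simulated by adding the ping interval per attempt instead of sleeping, and maxAttempts is a guard added only so the demonstration terminates.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.SocketTimeoutException;

/** Simplified model of a read loop that retries on socket timeouts. */
class PingLoopSketch {
    /** Stands in for a socket stream whose peer never answers. */
    static class SilentStream extends InputStream {
        @Override
        public int read() throws IOException {
            throw new SocketTimeoutException("no data within the socket timeout");
        }
    }

    /**
     * Returns the attempt on which the RPC timeout would be reported,
     * 0 if data arrived, or -1 if the loop was still waiting after maxAttempts.
     */
    static int attemptsUntilTimeout(InputStream in, int pingIntervalMs,
                                    int rpcTimeoutMs, int maxAttempts) {
        int waitedMs = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                in.read();
                return 0; // data arrived, no timeout
            } catch (SocketTimeoutException e) {
                waitedMs += pingIntervalMs; // simulated time spent in this read
                if (rpcTimeoutMs > 0 && waitedMs >= rpcTimeoutMs) {
                    return attempt; // a real client would fail the call here
                }
                // Otherwise a ping would be sent and the read retried.
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return -1; // never timed out within the guard
    }

    public static void main(String[] args) {
        InputStream in = new SilentStream();
        System.out.println(attemptsUntilTimeout(in, 60000, 120000, 1000)); // prints 2
        System.out.println(attemptsUntilTimeout(in, 60000, 0, 1000));      // prints -1
    }
}
```

With a positive RPC timeout the loop gives up after a bounded number of reads. With a timeout of 0 or less it only stops because of the artificial guard, which corresponds to the thread above never returning.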
Filing this jira to fix it.
Issue Links
- duplicates: HADOOP-12672 RPC timeout should not override IPC ping interval (Resolved)
- is duplicated by: HADOOP-14198 Should have a way to let PingInputStream to abort (Resolved)