[HBASE-15018] Inconsistent way of handling TimeoutException in the rpc client implementations - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.1.0, 1.2.0, 2.0.0
Fix Version/s: 1.2.0, 1.3.0, 1.1.3, 2.0.0
Component/s: Client, IPC/RPC
Labels:
None

Hadoop Flags:

Incompatible change, Reviewed
Release Note:

Hide
When using the new AsyncRpcClient introduced in HBase 1.1.0 (~~HBASE-12684~~), time outs now result in an IOException wrapped around a CallTimeoutException instead of a bare CallTimeoutException. This change makes the AsyncRpcClient behave the same as the default HBase 1.y RPC client implementation.

Show
When using the new AsyncRpcClient introduced in HBase 1.1.0 ( HBASE-12684 ), time outs now result in an IOException wrapped around a CallTimeoutException instead of a bare CallTimeoutException. This change makes the AsyncRpcClient behave the same as the default HBase 1.y RPC client implementation.

Description

If there is any rpc timeout when using RpcClientImpl then we wrap the exception in IOE and throw it,

2015-11-16 04:05:24,935 WARN [main-EventThread.replicationSource,peer2] regionserver.HBaseInterClusterReplicationEndpoint: Can't replicate because of a local or network error:
java.io.IOException: Call to host-XX:16040 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=510, waitTime=180001, operationTimeout=180000 expired.
at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1271)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1239)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.replicateWALEntry(AdminProtos.java:25690)
at org.apache.hadoop.hbase.protobuf.ReplicationProtbufUtil.replicateWALEntry(ReplicationProtbufUtil.java:77)
at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:322)
at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:308)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=510, waitTime=180001, operationTimeout=180000 expired.
at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1213)
... 10 more

But that isn't case with AsyncRpcClient, we don't wrap and throw CallTimeoutException as it is.

2015-12-21 14:27:33,093 WARN  [RS_OPEN_REGION-host-XX:16201-0.replicationSource.host-XX%2C16201%2C1450687255593,1] regionserver.HBaseInterClusterReplicationEndpoint: Can't replicate because of a local or network error: 
org.apache.hadoop.hbase.ipc.CallTimeoutException: callId=2, method=ReplicateWALEntry, rpcTimeout=600000, param {TODO: class org.apache.hadoop.hbase.protobuf.generated.AdminProtos$ReplicateWALEntryRequest}
	at org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(AsyncRpcClient.java:257)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:217)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:295)
	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.replicateWALEntry(AdminProtos.java:23707)
	at org.apache.hadoop.hbase.protobuf.ReplicationProtbufUtil.replicateWALEntry(ReplicationProtbufUtil.java:73)
	at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:387)
	at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:370)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

I think we should have same behavior across both the implementations.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HBASE-15018.patch
23/Dec/15 23:34
7 kB
Michael Stack
HBASE-15018.patch
21/Dec/15 12:52
7 kB
Ashish Singhi
HBASE-15018-v1.patch
24/Dec/15 07:28
7 kB
Ashish Singhi
HBASE-15018-v1(1).patch
24/Dec/15 10:51
7 kB
Ashish Singhi
HBASE-15018-v2.patch
24/Dec/15 17:22
10 kB
Ashish Singhi
HBASE-15018-v3.patch
26/Dec/15 16:52
12 kB
Ashish Singhi
HBASE-15018-v3-branch-1.patch
30/Dec/15 07:06
12 kB
Ashish Singhi

Issue Links

blocks

HBASE-14937 Make rpc call timeout for replication adaptive

Closed

Inconsistent way of handling TimeoutException in the rpc client implementations

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates