Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-15018

Inconsistent way of handling TimeoutException in the rpc client implementations

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.1.0, 1.2.0, 2.0.0
    • 1.2.0, 1.3.0, 1.1.3, 2.0.0
    • Client, IPC/RPC
    • None
    • Incompatible change, Reviewed
    • Hide
      When using the new AsyncRpcClient introduced in HBase 1.1.0 (HBASE-12684), time outs now result in an IOException wrapped around a CallTimeoutException instead of a bare CallTimeoutException. This change makes the AsyncRpcClient behave the same as the default HBase 1.y RPC client implementation.
      Show
      When using the new AsyncRpcClient introduced in HBase 1.1.0 ( HBASE-12684 ), time outs now result in an IOException wrapped around a CallTimeoutException instead of a bare CallTimeoutException. This change makes the AsyncRpcClient behave the same as the default HBase 1.y RPC client implementation.

    Description

      If there is any rpc timeout when using RpcClientImpl then we wrap the exception in IOE and throw it,

      2015-11-16 04:05:24,935 WARN [main-EventThread.replicationSource,peer2] regionserver.HBaseInterClusterReplicationEndpoint: Can't replicate because of a local or network error:
      java.io.IOException: Call to host-XX:16040 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=510, waitTime=180001, operationTimeout=180000 expired.
      at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1271)
      at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1239)
      at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
      at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
      at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.replicateWALEntry(AdminProtos.java:25690)
      at org.apache.hadoop.hbase.protobuf.ReplicationProtbufUtil.replicateWALEntry(ReplicationProtbufUtil.java:77)
      at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:322)
      at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:308)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=510, waitTime=180001, operationTimeout=180000 expired.
      at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
      at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1213)
      ... 10 more
      

      But that isn't case with AsyncRpcClient, we don't wrap and throw CallTimeoutException as it is.

      2015-12-21 14:27:33,093 WARN  [RS_OPEN_REGION-host-XX:16201-0.replicationSource.host-XX%2C16201%2C1450687255593,1] regionserver.HBaseInterClusterReplicationEndpoint: Can't replicate because of a local or network error: 
      org.apache.hadoop.hbase.ipc.CallTimeoutException: callId=2, method=ReplicateWALEntry, rpcTimeout=600000, param {TODO: class org.apache.hadoop.hbase.protobuf.generated.AdminProtos$ReplicateWALEntryRequest}
      	at org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(AsyncRpcClient.java:257)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:217)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:295)
      	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.replicateWALEntry(AdminProtos.java:23707)
      	at org.apache.hadoop.hbase.protobuf.ReplicationProtbufUtil.replicateWALEntry(ReplicationProtbufUtil.java:73)
      	at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:387)
      	at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:370)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      

      I think we should have same behavior across both the implementations.

      Attachments

        1. HBASE-15018.patch
          7 kB
          Michael Stack
        2. HBASE-15018.patch
          7 kB
          Ashish Singhi
        3. HBASE-15018-v1.patch
          7 kB
          Ashish Singhi
        4. HBASE-15018-v1(1).patch
          7 kB
          Ashish Singhi
        5. HBASE-15018-v2.patch
          10 kB
          Ashish Singhi
        6. HBASE-15018-v3.patch
          12 kB
          Ashish Singhi
        7. HBASE-15018-v3-branch-1.patch
          12 kB
          Ashish Singhi

        Issue Links

          Activity

            People

              ashish singhi Ashish Singhi
              ashish singhi Ashish Singhi
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: