Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-15018

Inconsistent way of handling TimeoutException in the rpc client implementations

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.1.0, 1.2.0, 2.0.0
    • 1.2.0, 1.3.0, 1.1.3, 2.0.0
    • Client, IPC/RPC
    • None
    • Incompatible change, Reviewed
    • Hide
      When using the new AsyncRpcClient introduced in HBase 1.1.0 (HBASE-12684), time outs now result in an IOException wrapped around a CallTimeoutException instead of a bare CallTimeoutException. This change makes the AsyncRpcClient behave the same as the default HBase 1.y RPC client implementation.
      Show
      When using the new AsyncRpcClient introduced in HBase 1.1.0 ( HBASE-12684 ), time outs now result in an IOException wrapped around a CallTimeoutException instead of a bare CallTimeoutException. This change makes the AsyncRpcClient behave the same as the default HBase 1.y RPC client implementation.

    Description

      If there is any rpc timeout when using RpcClientImpl then we wrap the exception in IOE and throw it,

      2015-11-16 04:05:24,935 WARN [main-EventThread.replicationSource,peer2] regionserver.HBaseInterClusterReplicationEndpoint: Can't replicate because of a local or network error:
      java.io.IOException: Call to host-XX:16040 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=510, waitTime=180001, operationTimeout=180000 expired.
      at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1271)
      at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1239)
      at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
      at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
      at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.replicateWALEntry(AdminProtos.java:25690)
      at org.apache.hadoop.hbase.protobuf.ReplicationProtbufUtil.replicateWALEntry(ReplicationProtbufUtil.java:77)
      at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:322)
      at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:308)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=510, waitTime=180001, operationTimeout=180000 expired.
      at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
      at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1213)
      ... 10 more
      

      But that isn't case with AsyncRpcClient, we don't wrap and throw CallTimeoutException as it is.

      2015-12-21 14:27:33,093 WARN  [RS_OPEN_REGION-host-XX:16201-0.replicationSource.host-XX%2C16201%2C1450687255593,1] regionserver.HBaseInterClusterReplicationEndpoint: Can't replicate because of a local or network error: 
      org.apache.hadoop.hbase.ipc.CallTimeoutException: callId=2, method=ReplicateWALEntry, rpcTimeout=600000, param {TODO: class org.apache.hadoop.hbase.protobuf.generated.AdminProtos$ReplicateWALEntryRequest}
      	at org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(AsyncRpcClient.java:257)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:217)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:295)
      	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.replicateWALEntry(AdminProtos.java:23707)
      	at org.apache.hadoop.hbase.protobuf.ReplicationProtbufUtil.replicateWALEntry(ReplicationProtbufUtil.java:73)
      	at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:387)
      	at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:370)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      

      I think we should have same behavior across both the implementations.

      Attachments

        1. HBASE-15018.patch
          7 kB
          Ashish Singhi
        2. HBASE-15018.patch
          7 kB
          Michael Stack
        3. HBASE-15018-v1.patch
          7 kB
          Ashish Singhi
        4. HBASE-15018-v1(1).patch
          7 kB
          Ashish Singhi
        5. HBASE-15018-v2.patch
          10 kB
          Ashish Singhi
        6. HBASE-15018-v3.patch
          12 kB
          Ashish Singhi
        7. HBASE-15018-v3-branch-1.patch
          12 kB
          Ashish Singhi

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            ashish singhi Ashish Singhi
            ashish singhi Ashish Singhi
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment