HBASE-10506

Fail-fast if client connection is lost before the real call be executed in RPC layer

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.94.3
    • Fix Version/s: 0.98.0, 0.96.2, 0.99.0, 0.94.17
    • Component/s: IPC/RPC
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In the current HBase RPC implementation, the connection is not re-checked just before the "call" is invoked. Consider a GC pause, an OS scheduling delay, or a nearly full call queue (e.g. the server side is slow or hung due to some issue): if the client side has a small RPC timeout value, the client connection may already be lost by the time the request is taken from the call queue. We should add some fail-fast code before the real "call" is invoked; otherwise the call just wastes server-side resources.
      Here is a log excerpt with a stack trace from our production env:
      2014-02-11,18:16:19,525 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call get([B@3eae6c77, {"timeRange":[0,9223372036854775807],"totalColumns":1,"cacheBlocks":true,"families":{"X":["T"]},"maxVersions":1,"row":"074103000000001-m8997060"}), rpc version=1, client version=29, methodsFingerPrint=-241105381 from 10.101.10.181:43252: output error
      2014-02-11,18:16:19,526 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 151 on 12600 caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
      2014-02-11,18:16:19,797 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
      org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call get([B@3f10ffd2, {"timeRange":[0,9223372036854775807],"totalColumns":1,"cacheBlocks":true,"families":{"X":["T"]},"maxVersions":1,"row":"4245978-m7281526"}), rpc version=1, client version=29, methodsFingerPrint=-241105381 from 10.101.10.181:43259 after 0 ms, since caller disconnected
      at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:450)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3633)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3590)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3615)
      at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4414)
      at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4387)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2075)
      at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:460)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1457)
      2014-02-11,18:16:19,802 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call get([B@3f10ffd2, {"timeRange":[0,9223372036854775807],"totalColumns":1,"cacheBlocks":true,"families":{"X":["T"]},"maxVersions":1,"row":"4245978-m7281526"}), rpc version=1, client version=29, methodsFingerPrint=-241105381 from 10.101.10.181:43259: output error
      2014-02-11,18:16:19,802 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 46 on 12600 caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null

      With this fix, we can at least reduce the probability of hitting this. Upstream Hadoop already has this check, see: https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java#L2034-L2036
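      The idea described above can be sketched as follows. This is a minimal illustration, not the attached patch: the class FailFastSketch, the QueuedCall type, the InMemoryChannel helper and the runCall method are all hypothetical names. It only assumes that the handler can ask whether the client's channel is still open before doing the real work.

```java
import java.nio.channels.Channel;
import java.util.concurrent.atomic.AtomicInteger;

public class FailFastSketch {

    /** Hypothetical in-memory channel so the sketch runs without a real socket. */
    static final class InMemoryChannel implements Channel {
        private volatile boolean open = true;

        @Override
        public boolean isOpen() {
            return open;
        }

        @Override
        public void close() {
            open = false;
        }
    }

    /** Hypothetical stand-in for a queued RPC call: the client's channel plus the work to do. */
    static final class QueuedCall {
        final Channel clientChannel;
        final Runnable work;

        QueuedCall(Channel clientChannel, Runnable work) {
            this.clientChannel = clientChannel;
            this.work = work;
        }
    }

    /**
     * Runs a call taken from the queue, unless the client has already gone away.
     * Returns true if the work was executed, false if it was skipped.
     */
    static boolean runCall(QueuedCall call) {
        // Fail fast: the client may have timed out while the call sat in the queue.
        if (!call.clientChannel.isOpen()) {
            return false;
        }
        call.work.run();
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger executed = new AtomicInteger();

        InMemoryChannel live = new InMemoryChannel();
        InMemoryChannel gone = new InMemoryChannel();
        gone.close(); // simulate a client that disconnected while queued

        runCall(new QueuedCall(live, executed::incrementAndGet));
        runCall(new QueuedCall(gone, executed::incrementAndGet));

        System.out.println("executed=" + executed.get()); // prints executed=1
    }
}
```

      The check is cheap and only narrows the window: the client can still disconnect after the check passes, which is why the description speaks of reducing the probability rather than eliminating it.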

      1. HBASE-10506-trunk.txt
        0.7 kB
        Liang Xie
      2. HBASE-10506-0.94.txt
        0.8 kB
        Liang Xie

        Activity

        Liang Xie created issue -
        Liang Xie made changes -
        Field Original Value New Value
        Summary Fail-fast if client connection is lost before the real call be execused in RPC layer Fail-fast if client connection is lost before the real call be executed in RPC layer
        Liang Xie added a comment -

        Attached a patch against 0.94 branch, will make a trunk patch shortly

        Liang Xie made changes -
        Attachment HBASE-10506-0.94.txt [ 12628425 ]
        Liang Xie made changes -
        Attachment HBASE-10506-trunk.txt [ 12628426 ]
        Liang Xie made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Ted Yu added a comment -

        +1

        Lars Hofhansl added a comment -

        Looks good to me. +1

        Lars Hofhansl made changes -
        Fix Version/s 0.98.0 [ 12323143 ]
        Fix Version/s 0.96.2 [ 12325658 ]
        Fix Version/s 0.99.0 [ 12325675 ]
        Fix Version/s 0.94.17 [ 12325845 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12628426/HBASE-10506-trunk.txt
        against trunk revision .
        ATTACHMENT ID: 12628426

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 site. The patch appears to cause mvn site goal to fail.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8667//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8667//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8667//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8667//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8667//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8667//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8667//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8667//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8667//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8667//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8667//console

        This message is automatically generated.

        Nicolas Liochon added a comment -

        I think it would be better to log at the debug level, with a nice if LOG.isDebugEnabled, and to say something like "client is disconnected, skipping " + call.
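        The guarded debug logging suggested here could look roughly like the sketch below. It uses java.util.logging from the JDK as a stand-in for the LOG.isDebugEnabled check named in the comment; the class DebugLogSketch and the method logSkipped are hypothetical, and Level.FINE is assumed to play the role of the debug level.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class DebugLogSketch {

    /**
     * Logs a skipped call at debug (FINE) level.
     * The message string is only built when that level is enabled.
     * Returns true if a message was logged, false otherwise.
     */
    static boolean logSkipped(Logger log, Object call) {
        if (log.isLoggable(Level.FINE)) {
            log.fine("client is disconnected, skipping " + call);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger(DebugLogSketch.class.getName());
        log.setLevel(Level.FINE);
        // "get(row)" is a placeholder for the call's string form.
        logSkipped(log, "get(row)");
    }
}
```

        The guard matters because building the message calls toString on the call, which in the logs above includes the full request description.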

        Liang Xie added a comment -

        Will commit to trunk, 0.98, 0.96, 0.94 shortly, incorporating Nicolas's last comment. Thanks all for the reviews!

        Liang Xie made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.94-security #411 (See https://builds.apache.org/job/HBase-0.94-security/411/)
        HBASE-10506 Fail-fast if client connection is lost before the real call be executed in RPC layer (liangxie: rev 1567845)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94-JDK7 #50 (See https://builds.apache.org/job/HBase-0.94-JDK7/50/)
        HBASE-10506 Fail-fast if client connection is lost before the real call be executed in RPC layer (liangxie: rev 1567845)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.98 #154 (See https://builds.apache.org/job/HBase-0.98/154/)
        HBASE-10506 Fail-fast if client connection is lost before the real call be executed in RPC layer (liangxie: rev 1567842)

        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94-on-Hadoop-2 #22 (See https://builds.apache.org/job/HBase-0.94-on-Hadoop-2/22/)
        HBASE-10506 Fail-fast if client connection is lost before the real call be executed in RPC layer (liangxie: rev 1567845)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.94 #1288 (See https://builds.apache.org/job/HBase-0.94/1288/)
        HBASE-10506 Fail-fast if client connection is lost before the real call be executed in RPC layer (liangxie: rev 1567845)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #142 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/142/)
        HBASE-10506 Fail-fast if client connection is lost before the real call be executed in RPC layer (liangxie: rev 1567842)

        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
        Hudson added a comment -

        SUCCESS: Integrated in hbase-0.96 #294 (See https://builds.apache.org/job/hbase-0.96/294/)
        HBASE-10506 Fail-fast if client connection is lost before the real call be executed in RPC layer (liangxie: rev 1567844)

        • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-TRUNK #4916 (See https://builds.apache.org/job/HBase-TRUNK/4916/)
        HBASE-10506 Fail-fast if client connection is lost before the real call be executed in RPC layer (liangxie: rev 1567841)

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
        Hudson added a comment -

        SUCCESS: Integrated in hbase-0.96-hadoop2 #203 (See https://builds.apache.org/job/hbase-0.96-hadoop2/203/)
        HBASE-10506 Fail-fast if client connection is lost before the real call be executed in RPC layer (liangxie: rev 1567844)

        • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #89 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/89/)
        HBASE-10506 Fail-fast if client connection is lost before the real call be executed in RPC layer (liangxie: rev 1567841)

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
        Lars Hofhansl made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        hongyu bi added a comment -

        The fail-fast path doesn't decrease callQueueSize, which may lead to "Call queue is full".
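        One way to picture this problem and a possible remedy is sketched below. It is only an illustration under assumptions: the class QueueSizeSketch, its callQueueSize counter, and the enqueue and runCall methods are hypothetical, and the sketch assumes the counter is increased when a call is queued and must be decreased once the call leaves the handler, whether it ran or was skipped.

```java
import java.nio.channels.Channel;
import java.util.concurrent.atomic.AtomicLong;

public class QueueSizeSketch {

    /** Hypothetical counter of the total size of calls currently queued. */
    static final AtomicLong callQueueSize = new AtomicLong();

    /** Hypothetical in-memory channel so the sketch runs without a real socket. */
    static final class InMemoryChannel implements Channel {
        private volatile boolean open = true;

        @Override
        public boolean isOpen() {
            return open;
        }

        @Override
        public void close() {
            open = false;
        }
    }

    /** Accounts for a call when it is put on the queue. */
    static void enqueue(long callSize) {
        callQueueSize.addAndGet(callSize);
    }

    /**
     * Runs or skips a call. The finally block releases the call's size on
     * every path, including the early fail-fast return.
     */
    static boolean runCall(Channel clientChannel, long callSize, Runnable work) {
        try {
            if (!clientChannel.isOpen()) {
                return false; // fail fast; the size is still released below
            }
            work.run();
            return true;
        } finally {
            callQueueSize.addAndGet(-callSize);
        }
    }

    public static void main(String[] args) {
        InMemoryChannel gone = new InMemoryChannel();
        gone.close(); // client disconnected while the call was queued

        enqueue(100);
        runCall(gone, 100, () -> { });

        System.out.println("callQueueSize=" + callQueueSize.get()); // prints callQueueSize=0
    }
}
```

        If the early return sat outside the try block, each skipped call would leave its size in the counter, and the counter would eventually reach the queue limit even though the queue itself was empty.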

        stack added a comment -

        hongyu bi Want to open a new JIRA?

        hongyu bi added a comment -

        Sorry, just found HBASE-11705. I will open a new JIRA, HBASE-12649, to backport it to 0.94.

        Transition                    Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Patch Available       25m 3s                  1                 Liang Xie       12/Feb/14 04:11
        Patch Available -> Resolved   23h 36m                 1                 Liang Xie       13/Feb/14 03:47
        Resolved -> Closed            13d 57m                 1                 Lars Hofhansl   26/Feb/14 04:45

          People

          • Assignee:
            Liang Xie
            Reporter:
            Liang Xie
          • Votes:
            0
            Watchers:
            10
