[HBASE-10506] Fail-fast if client connection is lost before the real call be executed in RPC layer - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.94.3
Fix Version/s: 0.98.0, 0.96.2, 0.99.0, 0.94.17
Component/s: IPC/RPC
Labels:
None

Hadoop Flags:

Reviewed

Description

In current HBase rpc impletement, there is no any connection double-checking just before the "call" be invoked, considing there's a gc or other OS scheduling or the call queue is full enough(e.g. the server side is slow/hang due to some issues), and if the client side has a small rpc timeout value, it could be possible when this request be taken from call queue, the client connection is lost in that moment. we'd better has some fail-fast code before the reall "call" be invoked, it just waste the server side resource.
Here is a strace trace from our production env:
2014-02-11,18:16:19,525 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call get([B@3eae6c77, {"timeRange":[0,9223372036854775807],"totalColumns":1,"cacheBlocks":true,"families":

{"X":["T"]}

,"maxVersions":1,"row":"074103000000001-m8997060"}), rpc version=1, client version=29, methodsFingerPrint=-241105381 from 10.101.10.181:43252: output error
2014-02-11,18:16:19,526 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 151 on 12600 caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
2014-02-11,18:16:19,797 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call get([B@3f10ffd2, {"timeRange":[0,9223372036854775807],"totalColumns":1,"cacheBlocks":true,"families":

{"X":["T"]}

,"maxVersions":1,"row":"4245978-m7281526"}), rpc version=1, client version=29, methodsFingerPrint=-241105381 from 10.101.10.181:43259 after 0 ms, since caller disconnected
at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:450)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3633)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3590)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3615)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4414)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4387)
at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2075)
at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:460)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1457)
2014-02-11,18:16:19,802 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call get([B@3f10ffd2, {"timeRange":[0,9223372036854775807],"totalColumns":1,"cacheBlocks":true,"families":

{"X":["T"]}

,"maxVersions":1,"row":"4245978-m7281526"}), rpc version=1, client version=29, methodsFingerPrint=-241105381 from 10.101.10.181:43259: output error
2014-02-11,18:16:19,802 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 46 on 12600 caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null

With this fix, we can reduce this hit probability at least the upstream hadoop has this checking already, see: https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java#L2034-L2036

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HBASE-10506-trunk.txt
12/Feb/14 04:10
0.7 kB
Liang Xie
HBASE-10506-0.94.txt
12/Feb/14 03:51
0.8 kB
Liang Xie

Activity

People

Assignee:: Liang Xie

Reporter:: Liang Xie

Votes:: 0 Vote for this issue

Watchers:: 9 Start watching this issue

Dates

Created:: 12/Feb/14 03:46

Updated:: 06/Dec/14 11:10

Resolved:: 13/Feb/14 03:47