Details
Improvement
Status: Closed
Major
Resolution: Incomplete
0.99.0, 0.94.20
None
None
Description
We introduced a fail-fast mechanism in the RPC layer; see HBASE-10506.
Even so, latency in the underlying HDFS layer can slow down every HBase handler thread, and handlers sometimes hang for several seconds. It is pointless to keep processing those read/write requests in the valuable RPC handler threads, especially requests that need a costly physical read or network activity (the write pipeline). A better solution would resemble the statement-timeout feature in Twitter's MySQL branch. I haven't taken the time to figure out whether that would break compatibility in master. We are on a 0.94 branch, and as far as I can tell adding an operation timeout field to every client RPC request would break compatibility there.
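For discussion, here is a rough sketch of what the statement-timeout style approach could look like on the server side, assuming the client sent a timeout with each request. Everything in it (OperationDeadlineSketch, RequestContext, checkDeadline, OperationTimedOutException) is a hypothetical name made up for illustration; none of it is existing HBase API or part of any patch.

```java
import java.io.IOException;

public class OperationDeadlineSketch {

    /** Hypothetical exception: the request outlived its client-supplied timeout. */
    static class OperationTimedOutException extends IOException {
        OperationTimedOutException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical per-request context carrying the timeout the client would send. */
    static class RequestContext {
        private final long deadlineNanos;

        RequestContext(long startNanos, long timeoutMillis) {
            this.deadlineNanos = startNanos + timeoutMillis * 1_000_000L;
        }

        /** Throws if the deadline has been reached at the given time. */
        void checkDeadline(long nowNanos, String step) throws OperationTimedOutException {
            if (nowNanos - deadlineNanos >= 0) {
                throw new OperationTimedOutException("timed out before " + step);
            }
        }
    }

    /**
     * Runs the costly step only if the request is still within its deadline.
     * Returns true if the step ran, false if it was skipped.
     */
    static boolean runIfStillWanted(RequestContext ctx, long nowNanos, Runnable costlyStep) {
        try {
            ctx.checkDeadline(nowNanos, "costly step");
        } catch (OperationTimedOutException e) {
            return false; // deadline already passed, do not burn the handler on it
        }
        costlyStep.run();
        return true;
    }
}
```

In a real server the current time would come from System.nanoTime() rather than a parameter; it is passed in here only to keep the sketch deterministic.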
So I added a simpler patch that uses the existing "rpcCall.throwExceptionIfCallerDisconnected()", calling it just before HLog sync and DFSInputStream read/pread (@readAtOffset).
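To illustrate the idea (this is not the patch itself), here is a minimal self-contained sketch of checking for a disconnected caller right before a costly I/O step. The Call class, the CallerDisconnectedException defined here, the CostlyIo interface, and the guarded helper are stand-ins written for this example; only the method name throwExceptionIfCallerDisconnected() is taken from the description above, and its real signature may differ.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

public class DisconnectCheckSketch {

    /** Stand-in exception for this sketch, thrown when the client has gone away. */
    static class CallerDisconnectedException extends IOException {
        CallerDisconnectedException(String msg) {
            super(msg);
        }
    }

    /** Stand-in for the server-side RPC call object (hypothetical, simplified). */
    static class Call {
        private final AtomicBoolean connected = new AtomicBoolean(true);

        /** In a real server the connection layer would flip this when the socket closes. */
        void markDisconnected() {
            connected.set(false);
        }

        void throwExceptionIfCallerDisconnected() throws CallerDisconnectedException {
            if (!connected.get()) {
                throw new CallerDisconnectedException("caller disconnected; aborting request");
            }
        }
    }

    /** A costly step such as a log sync or a positional read (hypothetical interface). */
    interface CostlyIo<T> {
        T run() throws IOException;
    }

    /** Checks the caller first, then performs the costly step. */
    static <T> T guarded(Call call, CostlyIo<T> io) throws IOException {
        call.throwExceptionIfCallerDisconnected();
        return io.run();
    }
}
```

The check only helps before the costly step starts. A handler already blocked inside the sync or read is not interrupted by it.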