Details
Improvement
Status: Resolved
Major
Resolution: Duplicate
2.4.0, 3.0.0-alpha1
None
None
Description
Currently, if a write or a remote read is issued against a sick disk, DFSClient.hdfsTimeout gives the caller a bounded time in which the call returns. It has no effect on local reads, though. Take an HBase scan for example:
DFSInputStream.read -> readWithStrategy -> readBuffer -> BlockReaderLocal.read -> dataIn.read -> FileChannelImpl.read
If it hits a bad disk, the low-level read I/O can take tens of seconds, and what's worse, DFSInputStream.read holds a lock the whole time.
As far as I know, there is no good mechanism to cancel a running read I/O (please correct me if I'm wrong), so my suggestion is to wrap the read request in a future and set a timeout on it. If the threshold is reached, we could probably add the local node to the dead nodes...
Any thoughts?
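
To make the proposal concrete, here is a minimal sketch of the future-with-timeout idea, written against plain java.util.concurrent rather than the actual DFSClient code. The class name TimedLocalRead, the method readWithTimeout, and the dedicated thread pool are all hypothetical. The Callable stands in for the blocking local read (for example the BlockReaderLocal.read call in the chain above).

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of HDFS: bounds the time a caller waits on a blocking read.
public class TimedLocalRead {
    // Assumed dedicated pool; daemon threads so a stuck read cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-local-read");
        t.setDaemon(true);
        return t;
    });

    /** Runs the read on the pool and waits at most timeoutMs for its result. */
    public static int readWithTimeout(Callable<Integer> read, long timeoutMs) throws IOException {
        Future<Integer> future = POOL.submit(read);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker thread. The caller is released,
            // but a thread stuck in the kernel on a bad disk may not return promptly.
            future.cancel(true);
            throw new IOException("local read timed out after " + timeoutMs + " ms", e);
        } catch (ExecutionException e) {
            // Surface the read's own failure to the caller.
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException(cause);
        } catch (InterruptedException e) {
            // Preserve the caller's interrupt status and give up on the read.
            Thread.currentThread().interrupt();
            future.cancel(true);
            throw new IOException("interrupted while waiting for local read", e);
        }
    }
}
```

A caller that catches the timeout IOException could then apply the policy suggested above, such as adding the local node to the dead nodes and retrying on a remote replica. Note that interrupting a thread blocked on a FileChannel closes that channel, so the underlying stream would likely need to be reopened afterwards. The sketch also does not address the lock held by DFSInputStream.read; the timed wait would have to happen in a place where releasing or avoiding that lock is possible.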