[HDFS-6574] Make sure addToDeadNodes() is only called when a connection issue occurs - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.5.0, 3.0.0-alpha1
Fix Version/s: None
Component/s: hdfs-client
Labels: None

Description

My colleague cuijianwei found that, in an HBase testing scenario, once a disk went bad, local reads were skipped and a large number of remote reads were issued for a long time (tens of minutes). We then had to trigger a compaction to help recover locality and read latency.
This turned out to be related to addToDeadNodes(). Imagine that one disk on the local node has a problem: the current implementation adds the entire local node to the dead node list, so the other, healthy disks on the local node no longer receive any read requests.
The better options, it seems to me, are:
1) determine whether the IOException is really a connection-related exception, and only then call addToDeadNodes(); or
2) determine whether the IOException is related to a bad block/disk, and if so do not call addToDeadNodes(); otherwise call addToDeadNodes().
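
A minimal sketch of option 1, to make the idea concrete. This is not the HDFS client code and not a proposed patch: the class DeadNodePolicy, its methods, and the use of plain node-id strings are all hypothetical, and the set of exception types treated as "connection related" is an assumption that would need review (for example, a read timeout could also be caused by a slow disk).

```java
import java.io.IOException;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper for illustration; not part of the HDFS client API. */
final class DeadNodePolicy {
    // Stand-in for the client's dead node list, keyed by a node id string.
    private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();

    /** Walks the cause chain looking for a connection-level failure. */
    static boolean isConnectionFailure(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            // SocketException also covers ConnectException and
            // NoRouteToHostException, which are its subclasses.
            if (cur instanceof SocketException
                    || cur instanceof SocketTimeoutException
                    || cur instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Marks the node dead only for connection-level failures.
     * Returns true if the node was added to the dead node set.
     */
    boolean onReadFailure(String nodeId, IOException e) {
        if (isConnectionFailure(e)) {
            deadNodes.add(nodeId);
            return true;
        }
        // Assumed to be a block/disk problem: the caller would skip this
        // replica but keep the node eligible for other blocks.
        return false;
    }

    boolean isDead(String nodeId) {
        return deadNodes.contains(nodeId);
    }
}
```

With this shape, a read that fails with a plain IOException (say, a checksum or disk error reported by the node) leaves the node usable for blocks stored on its other disks, while a refused or unreachable connection still marks the whole node dead.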

Another thing to consider: if we get a disk exception from a node, should we refresh the locatedBlocks info from the NN to clear all stale cached locations for the bad disk on that node? That could be heavy for a very large file...

We plan to make a patch soon for our internal Hadoop branch, because a sick disk severely degrades HBase read performance. We would also like to contribute it to the community if you think this is not too crazy... stack

People

Assignee: Liang Xie

Reporter: Liang Xie

Votes: 0

Watchers: 3

Dates

Created: 20/Jun/14 08:41

Updated: 12/May/16 18:14