Details
- New Feature
- Status: Resolved
- Major
- Resolution: Fixed
- 2.4.0, 2.6.0, 3.0.2
- None
- Reviewed
When a dead node blocks a DFSInputStream, dead node detection can find it and share this information with the other DFSInputStreams in the same DFSClient. These DFSInputStreams then do not read from the dead node, so they are not blocked by it.
Description
Currently, the information about dead datanodes in DFSInputStream is stored locally, so it cannot be shared among the input streams of the same DFSClient. In our production environment, some datanodes die every day for various reasons. When this happens, the first input stream that is blocked detects the dead node, but it cannot share this information with the other input streams in the same DFSClient. The other input streams are therefore still blocked by the dead node for some time, which can cause bad service latency.
To eliminate this impact of dead datanodes, we designed a dead datanode detector, which detects dead nodes in advance and shares this information among all the input streams in the same client. This improvement has been online for some months and works fine, so we have decided to port it to 3.0 (the versions used in our production environment are 2.4 and 2.6).
I will do the porting work and upload the code later.
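The sharing idea described above could be sketched roughly as follows. This is only an illustration and not the code attached to this issue: the class names DeadNodeRegistry and SketchInputStream and their methods are hypothetical, datanodes are represented as plain strings, and the background probing that the real detector performs is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-client registry of datanodes currently believed to be dead. */
class DeadNodeRegistry {
    // Concurrent set, because many input streams of one client may update it.
    private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();

    void markDead(String datanode) { deadNodes.add(datanode); }

    // A node that answers a later probe would be removed again.
    void markAlive(String datanode) { deadNodes.remove(datanode); }

    boolean isDead(String datanode) { return deadNodes.contains(datanode); }
}

/** Hypothetical stand-in for an input stream; all streams of one client share one registry. */
class SketchInputStream {
    private final DeadNodeRegistry registry;

    SketchInputStream(DeadNodeRegistry registry) { this.registry = registry; }

    /** Picks the first replica location that is not marked dead in the shared registry. */
    Optional<String> chooseDatanode(List<String> replicaLocations) {
        return replicaLocations.stream()
                .filter(dn -> !registry.isDead(dn))
                .findFirst();
    }

    /** Called when a read from the given datanode fails or times out. */
    void reportFailure(String datanode) { registry.markDead(datanode); }
}

class SharedDeadNodeDemo {
    public static void main(String[] args) {
        DeadNodeRegistry registry = new DeadNodeRegistry(); // one per client
        SketchInputStream a = new SketchInputStream(registry);
        SketchInputStream b = new SketchInputStream(registry);
        List<String> replicas = Arrays.asList("dn1", "dn2");

        a.reportFailure("dn1"); // stream a was blocked by dn1
        // Stream b skips dn1 without having to be blocked by it first.
        System.out.println(b.chooseDatanode(replicas).orElse("none")); // prints "dn2"
    }
}
```

In this sketch the registry is scoped to a single client object, so streams belonging to different clients do not see each other's entries.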
Attachments
Issue Links
- is related to
  - HDFS-15661 The DeadNodeDetector shouldn't be shared by different DFSClients. (Resolved)
  - HDFS-15809 DeadNodeDetector doesn't remove live nodes from dead node set. (Resolved)
  - HDFS-15264 Backport Datanode detection to branch-2.10 (Resolved)
- relates to
  - HDFS-15149 TestDeadNodeDetection test cases time-out (Resolved)
1. Implement DeadNodeDetector basic model | Resolved | Lisheng Sun
2. DeadNodeDetector checks dead node periodically | Resolved | Lisheng Sun
3. Add suspect probe for DeadNodeDetector | Resolved | Lisheng Sun
4. DeadNodeDetector redetects Suspicious Node | Resolved | Lisheng Sun
5. Refactor the unit test of TestDeadNodeDetection | Resolved | Lisheng Sun
6. Remove unnecessary deadNodeDetectInterval in DeadNodeDetector#checkDeadNodes() | Resolved | Lisheng Sun
7. Tiny Improve for DeadNode detector | Resolved | imbajin
8. Let DeadNode Detector also work for EC cases | Open | imbajin