[HDFS-3091] Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.23.0, 2.0.0-alpha
Fix Version/s: 2.0.0-alpha
Component/s: datanode, hdfs-client, namenode
Labels: None
Target Version/s: 0.23.3
Hadoop Flags: Reviewed

Description

While verifying the HDFS-1606 feature, I observed a couple of issues.

Currently the ReplaceDatanodeOnFailure policy condition can be satisfied even when the cluster does not have enough datanodes to supply a replacement, which results in a write failure.

12/03/13 14:27:12 WARN hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[xx.xx.xx.xx:50010], original=[xx.xx.xx.xx1:50010]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416)
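
The exception message above suggests a sanity check of roughly the following shape: after asking for one extra datanode, the new node list must be exactly one longer than the original. This is only a minimal sketch inferred from the logged message. The class and method names (`PipelineSanityCheckSketch`, `checkOneNodeAdded`) are hypothetical, and the actual code in `DFSOutputStream` may differ.

```java
import java.io.IOException;
import java.util.Arrays;

public class PipelineSanityCheckSketch {

    /**
     * Hypothetical stand-in for the check behind the logged message:
     * exactly one datanode must have been added to the pipeline.
     */
    public static void checkOneNodeAdded(String[] nodes, String[] original)
            throws IOException {
        if (nodes.length != original.length + 1) {
            throw new IOException(
                "Failed to add a datanode: nodes.length != original.length + 1, nodes="
                    + Arrays.asList(nodes) + ", original=" + Arrays.asList(original));
        }
    }

    public static void main(String[] args) {
        // Placeholder addresses; no replacement was found, so the lengths are equal.
        String[] original = {"dn1:50010"};
        String[] nodes = {"dn1:50010"};
        try {
            checkOneNodeAdded(nodes, original);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With no spare datanode in the cluster, the returned list has the same length as the original, so the check fails and the write is aborted.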

Let's take some cases:
1) Replication factor is 3, cluster size is also 3, and unfortunately the pipeline drops to 1.

ReplaceDatanodeOnFailure will be satisfied because existings(1) <= replication/2 (3/2==1).

But when the client tries to find a new node for replacement, it obviously cannot find one, and the sanity check fails.

This results in a write failure.

2) Replication factor is 10 (the user accidentally sets the replication factor higher than the cluster size), and the cluster has only 5 datanodes.

Here the write fails for the same reason even if only one node fails. The pipeline has at most 5 nodes, so after one datanode is killed, existings will be 4.

existings(4) <= replication/2 (10/2==5) will be satisfied, and obviously the failed node cannot be replaced because there are no extra nodes in the cluster. This results in a write failure.

3) Sync-related operations also fail in these situations (I will post the clear scenarios).
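
The arithmetic in cases 1 and 2 can be sketched as follows. This models only the condition quoted in this report (`existings <= replication/2`, with integer division). The real policy in Hadoop may include further conditions not shown here. The class name, the method names, and the `spareDatanodes` parameter (the number of usable datanodes not already in the pipeline) are assumptions made for illustration.

```java
public class ReplacePolicySketch {

    /** Condition as quoted in this report; integer division, so 3/2 == 1. */
    public static boolean policySatisfied(int replication, int existings) {
        return existings <= replication / 2;
    }

    /**
     * Hypothetical model of the failure: the policy asks for a replacement,
     * but no datanode outside the pipeline is available to provide one.
     */
    public static boolean writeWouldFail(int replication, int existings, int spareDatanodes) {
        return policySatisfied(replication, existings) && spareDatanodes == 0;
    }

    public static void main(String[] args) {
        // Case 1: replication 3, pipeline dropped to 1, no spare datanode.
        System.out.println(writeWouldFail(3, 1, 0));   // prints true
        // Case 2: replication 10, 5 datanodes, one killed, so 4 remain in the pipeline.
        System.out.println(writeWouldFail(10, 4, 0));  // prints true
        // Contrast: same as case 1, but one spare datanode exists.
        System.out.println(writeWouldFail(3, 1, 1));   // prints false
    }
}
```

In both reported cases the policy requests a replacement that the cluster cannot supply, which is why small clusters are affected.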

Attachments

h3091_20120319.patch
19/Mar/12 18:49
0.8 kB
Tsz-wo Sze

Issue Links

is related to

HDFS-4600 HDFS file append failing in multinode cluster

Resolved

People

Assignee: Tsz-wo Sze

Reporter: Uma Maheswara Rao G

Votes: 0

Watchers: 7

Dates

Created: 14/Mar/12 05:36

Updated: 28/Sep/15 20:58

Resolved: 19/Mar/12 20:00