[HDFS-1849] Respect failed.volumes.tolerated on startup - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: 0.23.0
Component/s: datanode
Labels: None

Description

The current failed.volumes.tolerated behavior is not user-friendly. Datanodes can be configured to tolerate N volume failures and still offer service, but if the cluster is restarted, every datanode with failed volumes will refuse to start until the failed volumes have been removed from the HDFS configuration files on that host.

The failed.volumes.tolerated configuration option should be respected on startup. The datanode should refuse to start only if more than failed.volumes.tolerated (~~HDFS-1161~~) volumes have failed, or if a configured critical volume (HDFS-1848) has failed (which is probably not an issue in practice, since datanode startup probably fails anyway, e.g. if the root volume has gone read-only).
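
The proposed startup rule can be sketched roughly as follows. This is an illustration only, not the datanode's actual code: the class `VolumeStartupCheck` and its methods are hypothetical names, the health check (a readable, writable directory) is a simplifying assumption, and refusing to start when no healthy volume remains is an extra assumption not stated in this issue. Critical volumes (HDFS-1848) are not modeled.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a startup check that honors failed.volumes.tolerated.
final class VolumeStartupCheck {

    // Assumption: a volume is healthy if it is a readable, writable directory.
    static boolean isHealthy(File dir) {
        return dir.isDirectory() && dir.canRead() && dir.canWrite();
    }

    // Returns the configured volumes that fail the health check.
    static List<File> failedVolumes(List<File> configured) {
        List<File> failed = new ArrayList<>();
        for (File dir : configured) {
            if (!isHealthy(dir)) {
                failed.add(dir);
            }
        }
        return failed;
    }

    // Start only if the number of failed volumes does not exceed the tolerated
    // count. Assumption: also refuse to start if every volume has failed.
    static boolean shouldStart(int configuredCount, int failedCount, int tolerated) {
        return failedCount <= tolerated && failedCount < configuredCount;
    }
}
```

With four configured volumes and a tolerance of 1, `shouldStart(4, 1, 1)` allows startup while `shouldStart(4, 2, 1)` refuses it. Under this rule the failed volumes would not need to be removed from the configuration before a restart.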

Issue Links

duplicates

HDFS-1592 Datanode startup doesn't honor volumes.tolerated

Closed

is part of

HDFS-2137 Datanode Disk Fail Inplace

Resolved

is related to

HDFS-1158 HDFS-457 increases the chances of losing blocks

Resolved

HDFS-1847 Datanodes should decomission themselves on volume failure

Open

relates to

HDFS-1848 Datanodes should shutdown when a critical volume fails

Open

People

Assignee: Unassigned

Reporter: Eli Collins

Votes: 0

Watchers: 3

Dates

Created: 20/Apr/11 00:05

Updated: 15/Nov/11 00:53

Resolved: 21/Apr/11 16:35