[HDFS-3075] Backport HADOOP-4885 to branch-1 - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.0
Fix Version/s: 1.0.2
Component/s: namenode
Labels: None
Target Version/s: 1.0.2
Hadoop Flags: Reviewed

Description

When a storage directory becomes inaccessible, the namenode moves it from the list of valid storage directories to the removedStorageDirs list. These directories are not restored when they become healthy again.

The proposed solution is to restore previously failed directories at the beginning of checkpointing (for example, in rollEdits) by copying the necessary metadata files from a healthy directory to the unhealthy ones. This way, once an administrator has recovered a failed storage directory, he or she can immediately force a checkpoint to restore it.
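
The restore step described above can be sketched roughly as follows. This is only an illustrative model in Python, not the Hadoop code or the actual patch: the `StorageDirs` class, its method names, and the "copy every regular file" rule are assumptions made for the example. The only details taken from this issue are the two lists (valid and removed directories) and the idea of attempting the restore at the start of the edit-log roll.

```python
import os
import shutil


class StorageDirs:
    """Toy model of the valid and removed storage directory lists (hypothetical)."""

    def __init__(self, dirs):
        self.active = list(dirs)   # directories currently in use
        self.removed = []          # directories dropped after a failure

    def mark_failed(self, d):
        # Move an inaccessible directory from the valid list to the removed list.
        if d in self.active:
            self.active.remove(d)
            self.removed.append(d)

    def restore_removed(self):
        """Try to bring each removed directory back, copying from a healthy one."""
        if not self.active:
            return []              # nothing healthy to copy from
        healthy = self.active[0]
        restored = []
        for d in list(self.removed):
            # Assumption: "healthy again" means the directory exists and is writable.
            if not (os.path.isdir(d) and os.access(d, os.W_OK)):
                continue
            try:
                # Assumption: every regular file in the healthy directory is
                # metadata that the restored directory needs.
                for name in os.listdir(healthy):
                    src = os.path.join(healthy, name)
                    if os.path.isfile(src):
                        shutil.copy2(src, os.path.join(d, name))
            except OSError:
                # A failed restore is skipped rather than aborting the checkpoint
                # (the concern raised in the related issue HDFS-3127).
                continue
            self.removed.remove(d)
            self.active.append(d)
            restored.append(d)
        return restored

    def roll_edits(self):
        # Restore first, so the new edit log is written to every healthy directory.
        restored = self.restore_removed()
        # ... the actual edit-log roll would happen here ...
        return restored
```

In this sketch, a directory that is still missing or unwritable simply stays on the removed list and is retried at the next roll, so an administrator who has repaired a directory only needs to trigger one checkpoint.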

Issue Links

is related to: HDFS-3127 failure in recovering removed storage directories should not stop checkpoint process (Closed)

relates to: HADOOP-4885 Try to restore failed replicas of Name Node storage (at checkpoint time) (Closed)

People

Assignee: Brandon Li

Reporter: Brandon Li

Votes: 0

Watchers: 4

Dates

Created: 11/Mar/12 22:17

Updated: 05/Apr/12 18:42

Resolved: 14/Mar/12 18:48