Description
When a storage directory becomes inaccessible, the NameNode moves it from the list of valid storage directories to a removedStorageDirs list. These storage directories are not restored when they become healthy again.
The proposed solution is to restore previously failed directories at the beginning of a checkpoint (for example, in rollEdits) by copying the necessary metadata files from a healthy directory to the previously failed ones. This way, once an administrator has recovered a failed storage directory, they can immediately force a checkpoint to restore it.
See also HADOOP-4885.
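The restore step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the NameNode's actual code: the class StorageRestoreSketch and the methods restoreFailedDirs and copyMetadata are hypothetical names, storage directories are modeled as plain java.nio paths, and "necessary metadata files" is simplified to every regular file at the top level of the healthy directory. A directory that cannot be restored stays in the removed list and does not stop the loop, in the spirit of HDFS-3127.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; none of these names come from the Hadoop code base.
final class StorageRestoreSketch {

    // Tries to bring every directory in removedDirs back into activeDirs.
    // Intended to be called at the start of a checkpoint.
    // Returns the directories that were restored.
    static List<Path> restoreFailedDirs(Path healthyDir,
                                        List<Path> activeDirs,
                                        List<Path> removedDirs) {
        List<Path> restored = new ArrayList<>();
        Iterator<Path> it = removedDirs.iterator();
        while (it.hasNext()) {
            Path dir = it.next();
            // Skip directories that are still inaccessible.
            if (!Files.isDirectory(dir) || !Files.isWritable(dir)) {
                continue;
            }
            try {
                copyMetadata(healthyDir, dir);
                it.remove();          // no longer considered failed
                activeDirs.add(dir);  // back in the valid list
                restored.add(dir);
            } catch (IOException e) {
                // Assumption: a failed restore leaves the directory in
                // removedDirs and must not abort the checkpoint.
            }
        }
        return restored;
    }

    // Copies every regular file at the top level of 'from' into 'to',
    // overwriting stale copies.
    static void copyMetadata(Path from, Path to) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(from)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    Files.copy(f, to.resolve(f.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

In this sketch the caller passes mutable lists, so a directory that is repaired between two checkpoints is picked up automatically on the next call.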
Issue Links
- is related to: HDFS-3127 failure in recovering removed storage directories should not stop checkpoint process (Closed)
- relates to: HADOOP-4885 Try to restore failed replicas of Name Node storage (at checkpoint time) (Closed)