Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Edit log branch (HDFS-1073)
None
Reviewed
Description
This JIRA is to address the following scenario/bug:
- The NN is configured with an edits-only storage dir in /edits and an image-only storage dir in /image.
- The image dir fails while the NN is running. Because the edits dir is still valid, the NN does not shut itself down immediately. The 2NN keeps trying to checkpoint, but each attempt fails because there is no valid image dir to upload the new image to.
- The operator fixes the disk backing /image and instructs the NN to restore removed storage.
- The 2NN should now be able to download and upload a checkpoint successfully.
Currently this does not work, because the NN clears the storage dir when it restores it. With the HDFS-1073 design, out-of-date files in a restored dir are not a problem; in fact, they can be used to restore the namespace.