Details
-
New Feature
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
Reviewed
-
This is a new feature. It is documented in hdfs_user_guide.xml.
Description
When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime.
Recovery mode is initialized by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down.
Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts.
I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is.
Attachments
Attachments
Issue Links
- is related to
-
HDFS-2079 1073: create an escape hatch to ignore startup consistency problems
- Resolved
-
HDFS-3055 Implement recovery mode for branch-1
- Closed
-
HDFS-3134 Harden edit log loader against malformed or malicious input
- Closed
- relates to
-
HDFS-3049 During the normal loading NN startup process, fall back on a different EditLog if we see one that is corrupt
- Closed