Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Edit log branch (HDFS-1073)
- None
- Reviewed
Description
In fault-testing the HDFS-1073 branch, I saw the following situation:
- NN has two storage directories, but one is in failed state
- NN starts to roll the edit log to edits_inprogress_5160285
- NN then crashes
- On restart, it detects the truncated log, but since the log contains 0 txns, it finalizes it under the nonsensical name edits_5160285-5160284 (the end txid is lower than the start txid).
- It then starts logging again at edits_inprogress_5160285.
- After this point, no checkpoints or future NN startups succeed, since there are two logs starting with the same txid (5160285)
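
The sequence above can be illustrated with a small standalone sketch. It is not the actual HDFS code or the committed fix: the class and method names (EditLogRecoverySketch, naiveFinalizedName, recover, hasDuplicateStartTxId) are hypothetical, and only the file-name patterns are taken from the description. It shows how finalizing a segment with zero transactions yields an end txid below the start txid, and one possible guard, which is to discard the empty in-progress segment instead of finalizing it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch, not the real NameNode recovery code.
public class EditLogRecoverySketch {

    // Naive finalization: the last txid is derived from the transaction count.
    // With txnCount == 0 this produces an end txid of firstTxId - 1.
    static String naiveFinalizedName(long firstTxId, long txnCount) {
        long lastTxId = firstTxId + txnCount - 1;
        return "edits_" + firstTxId + "-" + lastTxId;
    }

    // Guarded recovery (an assumption about one reasonable fix): an empty
    // in-progress segment is discarded rather than finalized, so no
    // finalized log with an inverted txid range is ever created.
    static Optional<String> recover(long firstTxId, long txnCount) {
        if (txnCount <= 0) {
            return Optional.empty(); // caller would delete or set aside the file
        }
        return Optional.of(naiveFinalizedName(firstTxId, txnCount));
    }

    // Extracts the start txid from "edits_inprogress_N" or "edits_N-M".
    static long startTxId(String name) {
        String inProgressPrefix = "edits_inprogress_";
        if (name.startsWith(inProgressPrefix)) {
            return Long.parseLong(name.substring(inProgressPrefix.length()));
        }
        String rest = name.substring("edits_".length());
        return Long.parseLong(rest.substring(0, rest.indexOf('-')));
    }

    // True if two logs in the directory begin at the same txid, which is
    // the condition that breaks checkpoints and later startups.
    static boolean hasDuplicateStartTxId(List<String> names) {
        Set<Long> seen = new HashSet<>();
        for (String name : names) {
            if (!seen.add(startTxId(name))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Reproduces the name from the description: edits_5160285-5160284
        System.out.println(naiveFinalizedName(5160285L, 0));
        // The guarded version refuses to finalize the empty segment.
        System.out.println(recover(5160285L, 0).isPresent());
        // The resulting directory state has two logs starting at 5160285.
        System.out.println(hasDuplicateStartTxId(
                List.of("edits_5160285-5160284", "edits_inprogress_5160285")));
    }
}
```

Whether an empty segment should be deleted, renamed aside, or reused on restart is a design choice this sketch does not settle; it only shows that the zero-transaction case needs to be handled separately from normal finalization.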