Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.7.0, 2.6.1, 3.0.0-alpha1
Component/s: namenode
Hadoop Flags: Reviewed

We have seen a standby namenode crash because of edit log corruption. It reported that OP_CLOSE could not be applied because the file was not under construction.
When a client tried to append to the file, the remaining space quota was very small. This caused prepareFileForWrite() to fail, but only after the inode had already been converted to under-construction and a lease had been added. Because these changes were not undone when the quota violation was detected, the file was left under construction with an active lease, and no OP_ADD was written to the edit log.
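The ordering problem can be illustrated with a heavily simplified model. This is a sketch under stated assumptions: `Namesystem`, `Inode`, `appendBuggy()`, `appendFixed()` and the single-number quota are hypothetical names invented for illustration, not the HDFS `FSNamesystem` API, and the "fixed" variant shows one general way to avoid partial state (run every check that can fail before mutating anything). It does not claim to reproduce the actual patch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the append path. Inode, Namesystem, appendBuggy and
// appendFixed are illustrative names, NOT the HDFS API.
public class AppendQuotaSketch {

    static class QuotaExceededException extends Exception {
        QuotaExceededException(String msg) { super(msg); }
    }

    static class Inode {
        final String path;
        boolean underConstruction = false;
        Inode(String path) { this.path = path; }
    }

    static class Namesystem {
        final long remainingQuota;                      // space still allowed (simplified to one number)
        final Set<String> leases = new HashSet<>();     // paths with an active lease
        final List<String> editLog = new ArrayList<>(); // ops a standby would replay

        Namesystem(long remainingQuota) { this.remainingQuota = remainingQuota; }

        void verifyQuota(long needed) throws QuotaExceededException {
            if (needed > remainingQuota) {
                throw new QuotaExceededException("need " + needed + ", remaining " + remainingQuota);
            }
        }

        // Faulty ordering: in-memory state changes first, the quota check comes
        // afterwards, and nothing is undone if the check throws.
        void appendBuggy(Inode file, long needed) throws QuotaExceededException {
            file.underConstruction = true;
            leases.add(file.path);
            verifyQuota(needed);                 // throws -> OP_ADD is never logged
            editLog.add("OP_ADD " + file.path);
        }

        // Safe ordering: the check that can fail runs before any mutation,
        // so a quota failure leaves the inode, leases and edit log untouched.
        void appendFixed(Inode file, long needed) throws QuotaExceededException {
            verifyQuota(needed);
            file.underConstruction = true;
            leases.add(file.path);
            editLog.add("OP_ADD " + file.path);
        }
    }

    public static void main(String[] args) {
        Namesystem ns = new Namesystem(10);
        Inode file = new Inode("/a");
        try {
            ns.appendBuggy(file, 128);
        } catch (QuotaExceededException e) {
            System.out.println("append failed: " + e.getMessage());
        }
        // The failed append left the file under construction with a lease,
        // but the edit log has no OP_ADD for it.
        System.out.println(file.underConstruction + " " + ns.leases + " " + ns.editLog);
    }
}
```

An alternative with the same effect would be to keep the original order and roll back the inode conversion and lease in a `catch`/`finally` block; checking first avoids having to keep the undo logic in sync with every mutation.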
A subsequent append() eventually triggered lease recovery once the lease soft limit had expired. Recovery ended in commitBlockSynchronization(), which closed the file and logged OP_CLOSE. Because there was no corresponding OP_ADD, the standby could not apply this OP_CLOSE when replaying the edit log.
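Why the orphaned OP_CLOSE is fatal on replay can be shown with a toy edit-log loader. This is a hypothetical sketch: the string-encoded ops and `replay()` are invented for illustration and are not `FSEditLogLoader`. It assumes only the invariant stated in the report, namely that OP_CLOSE can be applied only to a file that is currently under construction.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical replay loop in the spirit of a standby applying edits.
// The op strings and replay() are illustrative, NOT FSEditLogLoader.
public class ReplaySketch {

    // Returns path -> "is under construction" after applying all ops.
    static Map<String, Boolean> replay(List<String> ops) {
        Map<String, Boolean> underConstruction = new HashMap<>();
        for (String op : ops) {
            String[] parts = op.split(" ");
            String kind = parts[0];
            String path = parts[1];
            switch (kind) {
                case "OP_ADD":
                    underConstruction.put(path, true);
                    break;
                case "OP_CLOSE":
                    // Closing is only legal for a file that is under construction.
                    if (!underConstruction.getOrDefault(path, false)) {
                        throw new IllegalStateException(
                            "cannot apply OP_CLOSE: " + path + " is not under construction");
                    }
                    underConstruction.put(path, false);
                    break;
                default:
                    throw new IllegalArgumentException("unknown op: " + kind);
            }
        }
        return underConstruction;
    }

    public static void main(String[] args) {
        // Create + close, then the failed append (which logged nothing), then
        // the OP_CLOSE written by commitBlockSynchronization() after recovery.
        List<String> corrupted = Arrays.asList("OP_ADD /a", "OP_CLOSE /a", "OP_CLOSE /a");
        try {
            replay(corrupted);
        } catch (IllegalStateException e) {
            System.out.println("replay failed: " + e.getMessage());
        }
    }
}
```

The active namenode never hits this check because it applies the close to its in-memory state, where the file really is under construction; only the log is missing the OP_ADD, so the divergence surfaces on the standby.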