Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
None
Reviewed
Description
We have seen a standby namenode crash due to edit log corruption. It complained that OP_CLOSE could not be applied because the file was not under construction.
When a client tried to append to the file, the remaining space quota was very small. This caused a failure in prepareFileForWrite(), but only after the inode had already been converted for writing and a lease had been added. Since these changes were not undone when the quota violation was detected, the file was left in the under-construction state with an active lease, and no OP_ADD was written to the edit log.
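The ordering problem described above can be sketched with a toy model. This is not HDFS code: the class, fields, and method names below are hypothetical, and the "checked" variant is only one possible repair (verifying the quota before mutating any state), not necessarily the approach taken in the actual patch.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the failure mode, not HDFS code. All names are hypothetical. */
class AppendQuotaSketch {
    static class QuotaExceededException extends Exception {
        QuotaExceededException(String msg) { super(msg); }
    }

    final Set<String> underConstruction = new HashSet<>();
    final Set<String> leases = new HashSet<>();
    long remainingQuota;

    AppendQuotaSketch(long remainingQuota) {
        this.remainingQuota = remainingQuota;
    }

    /** Buggy ordering: state is mutated before the quota check can fail. */
    void appendBuggy(String path, long needed) throws QuotaExceededException {
        underConstruction.add(path);
        leases.add(path);
        if (needed > remainingQuota) {
            // Nothing is undone here, so the file stays under construction
            // with a lease, and no OP_ADD would ever be logged for it.
            throw new QuotaExceededException(path);
        }
        remainingQuota -= needed;
    }

    /** One possible repair: verify the quota before touching any state. */
    void appendChecked(String path, long needed) throws QuotaExceededException {
        if (needed > remainingQuota) {
            throw new QuotaExceededException(path);
        }
        underConstruction.add(path);
        leases.add(path);
        remainingQuota -= needed;
    }
}
```

In this model, a failed appendBuggy() leaves the path in both sets, while a failed appendChecked() leaves them empty. Undoing the conversion and lease in the failure path would be an alternative with the same effect.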
A subsequent append() eventually triggered lease recovery after the soft limit period expired. This resulted in a call to commitBlockSynchronization(), which closed the file and logged OP_CLOSE. Since there was no corresponding OP_ADD, edit log replay could not apply the OP_CLOSE.
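Why an unmatched OP_CLOSE is fatal on replay can be illustrated with a minimal replayer. This is a simplified sketch, not the namenode's edit log loader: the Edit class, the replay() method, and the exception type are hypothetical, and only the op names OP_ADD and OP_CLOSE come from the report.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy edit log replayer, not HDFS code. All names except the op codes are hypothetical. */
class EditReplaySketch {
    enum Op { OP_ADD, OP_CLOSE }

    static class Edit {
        final Op op;
        final String path;

        Edit(Op op, String path) {
            this.op = op;
            this.path = path;
        }
    }

    /** Replays edits in order and returns the files still under construction. */
    static Set<String> replay(List<Edit> edits) {
        Set<String> underConstruction = new HashSet<>();
        for (Edit e : edits) {
            switch (e.op) {
                case OP_ADD:
                    // Opening a file for write marks it under construction.
                    underConstruction.add(e.path);
                    break;
                case OP_CLOSE:
                    // Closing is only valid for a file that an earlier
                    // OP_ADD put under construction.
                    if (!underConstruction.remove(e.path)) {
                        throw new IllegalStateException(
                            "OP_CLOSE on " + e.path + ": file is not under construction");
                    }
                    break;
            }
        }
        return underConstruction;
    }
}
```

Replaying OP_ADD followed by OP_CLOSE for the same path succeeds in this model, whereas a log containing only the OP_CLOSE throws, which mirrors the standby namenode's complaint.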