Details
- Sub-task
- Status: Resolved
- Critical
- Resolution: Fixed
- HA branch (HDFS-1623)
- None
- Reviewed
Description
Currently, the NN logs its edits to each of its edits directories in sequence. This can produce the following bad sequence:
- NN accumulates 100 edits (tx 1-100) in the buffer. It writes and syncs them to a local drive, then crashes before writing them to the shared edits dir.
- Failover occurs. SBN takes over at txid=1, since txid 1 never got written to the shared edits dir.
- First NN restarts. It reads up to txid 100 from its local directories. It is now "ahead" of the active NN with inconsistent state.
The solution is to write to the shared edits dir, and sync that, before writing to any local drives.
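A minimal sketch of the proposed ordering, not the actual patch: `EditsDir` and `SharedFirstEditLog` are hypothetical stand-ins rather than Hadoop classes, and the `dirsBeforeCrash` parameter exists only to simulate a crash partway through a sync. It illustrates the invariant the fix is after: any txid that is durable in a local directory is already durable in the shared one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for one edits directory; not a Hadoop class. */
class EditsDir {
    final String name;
    final boolean shared;
    /** Txids that have been written and synced to this directory. */
    final List<Long> durableTxIds = new ArrayList<>();

    EditsDir(String name, boolean shared) {
        this.name = name;
        this.shared = shared;
    }

    /** Simulates writing a batch of edits and syncing it to disk. */
    void writeAndSync(List<Long> batch) {
        durableTxIds.addAll(batch);
    }
}

/** Hypothetical edit log that always syncs shared directories before local ones. */
class SharedFirstEditLog {
    private final List<EditsDir> dirs;

    SharedFirstEditLog(List<EditsDir> configuredDirs) {
        this.dirs = new ArrayList<>(configuredDirs);
        // Stable sort: shared dirs (key false) come before local dirs (key true).
        this.dirs.sort(Comparator.comparing((EditsDir d) -> !d.shared));
    }

    /**
     * Writes the batch to each directory in order. The process is assumed to
     * crash after dirsBeforeCrash directories have been synced; pass
     * Integer.MAX_VALUE for a sync that completes normally.
     */
    void logSync(List<Long> batch, int dirsBeforeCrash) {
        int synced = 0;
        for (EditsDir dir : dirs) {
            if (synced == dirsBeforeCrash) {
                return; // simulated crash: remaining dirs never see the batch
            }
            dir.writeAndSync(batch);
            synced++;
        }
    }
}
```

With this ordering, a crash after the first sync leaves the edits in the shared dir only, so the restarted NN's local directories can be behind the SBN but never ahead of it.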
Attachments
Issue Links
- relates to: HDFS-2769 HA: When HA is enabled with a shared edits dir, that dir should be marked required (Resolved)