Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
QuorumJournalManager (HDFS-3077)
None
None
Reviewed
Description
This is a potential optimization for the JournalNode. When one node lags behind the others (e.g., because its local disk is slower or there was a network blip), it receives edits after they have already been committed to a majority. The node can detect this because the committed txid included in the request info is higher than the highest txid in the batch to be written. In that case, the batch is known to have been fsynced on a quorum of nodes already, so the lagging node can skip its own fsync(), which helps it catch back up.
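
A minimal sketch of the check described above, under stated assumptions: the class name `LaggingJournalSketch`, the method `shouldFsync`, and both parameter names are hypothetical and chosen only for illustration. This is not the actual JournalNode code, and a real patch may handle the boundary case (committed txid equal to the batch's last txid) differently.

```java
/**
 * Hypothetical illustration of the "skip fsync when lagging" decision.
 * All names here are assumptions, not Hadoop APIs.
 */
public class LaggingJournalSketch {

    /**
     * Decide whether a batch of edits must be fsynced before it is acknowledged.
     *
     * @param committedTxId   committed txid carried in the request info
     * @param lastTxIdInBatch highest txid contained in the batch being written
     * @return true if the node should fsync, false if it may skip the fsync
     */
    public static boolean shouldFsync(long committedTxId, long lastTxIdInBatch) {
        // If the writer already reports a committed txid beyond this batch,
        // a quorum of other nodes has made the batch durable, so this node is lagging.
        boolean lagging = committedTxId > lastTxIdInBatch;
        return !lagging;
    }

    public static void main(String[] args) {
        // Batch ends at txid 105, only 100 is committed: the node is not lagging.
        System.out.println(shouldFsync(100L, 105L)); // prints "true"
        // Batch ends at txid 105, but 120 is already committed: safe to skip.
        System.out.println(shouldFsync(120L, 105L)); // prints "false"
    }
}
```

The sketch follows the description literally and skips the fsync only when the committed txid is strictly higher than the last txid in the batch; when the two are equal it still fsyncs, which is the conservative choice.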