Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
M4
None
None
Description
Per extensive discussion on IRC this week, we determined the following:
- currently we are waiting on the local peer to durably log a transaction's COMMIT record before we release locks and respond to the client
- however, from the consensus point of view, this is unnecessary, by the intuition that any action that occurs only on a minority of nodes cannot be considered persistent
Right now, we rely on that wait for a separate reason: if we were to release the locks as soon as the mutations are applied in memory, without waiting for the COMMIT to be logged, then we could get the following interleaving:
1. log REPLICATE for a write
2. apply the write to memory
3. flush the memory to disk
4. log the COMMIT to disk
If we were to crash between steps 3 and 4, the recovery code would find the REPLICATE with no matching COMMIT and try to replay the edit, not realizing that the edit was already made durable by virtue of the flush in step 3.
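To make the hazard concrete, here is a toy model of that interleaving in Python. It is only a sketch: the tuple-based log format, the `recover` function, and the rule "replay every REPLICATE that has no COMMIT" are assumptions made for illustration, not the actual recovery code.

```python
def recover(disk_rows, wal):
    """Hypothetical recovery: replay each REPLICATE lacking a COMMIT record."""
    committed = {txn for kind, txn, _ in wal if kind == "COMMIT"}
    rows = list(disk_rows)
    for kind, txn, payload in wal:
        if kind == "REPLICATE" and txn not in committed:
            rows.append(payload)  # re-applies the edit on top of disk state
    return rows


wal = [("REPLICATE", 1, "row-a")]  # step 1: log REPLICATE for a write
memory = ["row-a"]                 # step 2: apply the write to memory
disk = list(memory)                # step 3: flush the memory to disk
# Crash here, before step 4 logs the COMMIT.

recovered = recover(disk, wal)     # the edit is now present twice

# Had the COMMIT been logged before the crash, nothing would be replayed.
clean = recover(disk, wal + [("COMMIT", 1, None)])
```

In this model `recovered` holds the row twice, while `clean` holds it once, which is the double-apply the description warns about.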
The solution is to add a step to the flush/compact code that acts as a "soft barrier" of sorts: wait to perform the flush until all of the transactions pertaining to data in that memory region have been COMMITted in the WAL.
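One way such a barrier could look, sketched in Python with a condition variable. The `MemRegion` class and its method names are hypothetical and exist only for this example; the real flush/compact code may track in-flight transactions quite differently.

```python
import threading


class MemRegion:
    """Hypothetical in-memory region that tracks uncommitted transactions."""

    def __init__(self):
        self._cond = threading.Condition()
        self._uncommitted = set()  # txn ids applied here, COMMIT not yet logged
        self.rows = []
        self.flushed = []

    def apply(self, txn_id, row):
        # Called after REPLICATE is logged; locks can be released right away.
        with self._cond:
            self._uncommitted.add(txn_id)
            self.rows.append(row)

    def commit_logged(self, txn_id):
        # Called once the COMMIT record for txn_id is durable in the WAL.
        with self._cond:
            self._uncommitted.discard(txn_id)
            self._cond.notify_all()

    def flush(self, timeout=None):
        # Soft barrier: wait until no applied transaction lacks a COMMIT.
        with self._cond:
            if not self._cond.wait_for(lambda: not self._uncommitted, timeout):
                return False  # timed out; nothing was flushed
            self.flushed.extend(self.rows)
            self.rows.clear()
            return True


region = MemRegion()
region.apply(1, "row-a")
blocked = region.flush(timeout=0.05)  # COMMIT not logged yet, so no flush
region.commit_logged(1)
done = region.flush()                 # barrier is satisfied, flush proceeds
```

The barrier is "soft" in this sketch because waiting releases the lock, so writers keep applying mutations while a flush is pending; only the flush itself is delayed.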