Details
-
Bug
-
Status: Resolved
-
Normal
-
Resolution: Invalid
-
2.2.16, 3.0.20, 3.11.7, 4.0-alpha3, 4.0
-
None
-
Correctness - Unrecoverable Corruption / Loss
-
Critical
-
Challenging
-
Code Inspection
-
All
-
None
Description
Memtables do not contain records that cover a precise contiguous range of ReplayPosition values, since there are only weak ordering constraints when rolling over to a new Memtable: the last operations for the old Memtable may obtain their ReplayPosition after the first operations for the new Memtable.
Unfortunately, we treat the Memtable range as contiguous, and invalidate the entire range on flush. Ordinarily we only invalidate records when all prior Memtables have also successfully flushed. However, in the event of a flush failure that does not terminate the process (either because of the disk failure policy, or because it is a software error), the later flush is able to invalidate the region of the commit log that includes records that should have been flushed in the prior Memtable.
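As a rough illustration of the flush-time problem, here is a minimal sketch that models replay positions as plain integers. This is not Cassandra code: the names `contiguous_range` and `invalidate` and the specific positions are hypothetical, chosen only to show how a single interleaved position can be dropped from the commit log.

```python
# Hypothetical model: replay positions are integers, memtables are sets of them.
# Position 4 belongs to the old memtable but was assigned after the rollover,
# i.e. after the new memtable had already taken position 3.
old_memtable = {1, 2, 4}
new_memtable = {3, 5}


def contiguous_range(positions):
    """Range a flush would claim if the memtable is assumed to be contiguous."""
    return range(min(positions), max(positions) + 1)


def invalidate(commit_log, positions):
    """Drop every commit log record that falls inside the assumed range."""
    covered = contiguous_range(positions)
    return {p for p in commit_log if p not in covered}


commit_log = old_memtable | new_memtable

# Assumed scenario: the old memtable's flush fails without terminating the
# process, and the new memtable's flush then succeeds and invalidates its range.
remaining = invalidate(commit_log, new_memtable)
durable = set(new_memtable)  # only the new memtable reached an sstable

# Records that are neither in the commit log nor in any sstable.
lost = commit_log - remaining - durable
print(sorted(lost))  # [4]
```

In this model, position 4 was written only to the old Memtable, yet it falls inside the range the new Memtable's flush invalidates, so it survives in neither the commit log nor an sstable.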
More problematically, this can also occur on restart without any associated flush failure, as we use commit log boundaries written to our flushed sstables to filter ReplayPosition values on recovery, which is meant to replicate our Memtable flush behaviour above. However, on recovery we do not know that earlier flushes have completed, and flushes may complete successfully out of order. So any flush that completes before the process terminates, but began after another flush that does not complete before the process terminates, has the potential to cause permanent data loss.
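The restart case can be sketched in the same simplified way. The sketch below assumes replay positions are integers and that each flushed sstable records a (lower, upper) commit log boundary; `FlushedBounds` and `should_replay` are hypothetical names for illustration, not the actual recovery code.

```python
from typing import NamedTuple


class FlushedBounds(NamedTuple):
    """Hypothetical commit log boundary recorded in a flushed sstable."""
    lower: int
    upper: int


def should_replay(position, flushed):
    """Skip any record that falls inside a flushed sstable's recorded bounds."""
    return not any(b.lower <= position <= b.upper for b in flushed)


# Position 4 belongs to the earlier memtable but was assigned after rollover.
earlier_memtable = {1, 2, 4}
later_memtable = {3, 5}
commit_log = sorted(earlier_memtable | later_memtable)

# Assumed scenario: the later flush completes, the earlier flush does not,
# and then the process terminates. Only the later sstable exists on restart.
flushed = [FlushedBounds(min(later_memtable), max(later_memtable))]

replayed = {p for p in commit_log if should_replay(p, flushed)}
recovered = replayed | later_memtable
missing = (earlier_memtable | later_memtable) - recovered
print(sorted(missing))  # [4]
```

Here the commit log is fully intact at restart, but the filter built from the later sstable's bounds skips position 4, so the record is never replayed even though no sstable contains it.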