Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Fixed
- 0.5.0, 0.6.0, 0.7.0
- None
Description
tablet_bootstrap has the following TODO:
    if (!FindCopy(flushed_dms_by_drs_id_, target.rs_id(), &last_durable_dms_id)) {
      // if we have no data about this RowSet, then it must have been flushed and
      // then deleted.
      // TODO: how do we avoid a race where we get an update on a rowset before
      // it is persisted? add docs about the ordering of flush.
      return true;
    }
alter_table-randomized-test, when looped in TSAN, seems to fail after around 30 iterations with a sequence like:
- a compaction enters "duplicating" phase
- an update arrives and is duplicated into both the old and the new rowset IDs
- the new rowset ID isn't part of the metadata yet
- we get kill -9ed before we flush the metadata from the compaction
It seems that we then mis-identify the update to the "new" store as already flushed, which can cause the bootstrap to fail (or maybe cause a missing update).
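The misclassification can be reproduced with a small standalone model of the check. This is only a sketch: `FlushedDmsMap`, `ReplayDecision` and `ClassifyDuplicatedUpdate` are hypothetical names, `std::map::find` stands in for `FindCopy`, and the final `dms_id <= last_durable_dms_id` comparison is an assumption about the code that follows the quoted snippet, not a copy of tablet_bootstrap.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for flushed_dms_by_drs_id_:
// rowset ID -> ID of the last delta memstore known to be durable.
using FlushedDmsMap = std::map<int64_t, int64_t>;

// Simplified model of the quoted check. A rowset that is absent from the
// metadata is assumed to have been flushed and then deleted.
bool WasStoreAlreadyFlushed(const FlushedDmsMap& flushed_dms_by_drs_id,
                            int64_t rs_id, int64_t dms_id) {
  auto it = flushed_dms_by_drs_id.find(rs_id);
  if (it == flushed_dms_by_drs_id.end()) {
    // Same branch as the TODO: no data about this rowset.
    return true;
  }
  // Assumed rule for known rowsets: the edit is durable only if its
  // delta memstore was already flushed.
  return dms_id <= it->second;
}

// Replay decision for an update that was duplicated into two rowsets.
struct ReplayDecision {
  bool skip_old;  // true if the old-rowset copy is treated as flushed
  bool skip_new;  // true if the new-rowset copy is treated as flushed
};

ReplayDecision ClassifyDuplicatedUpdate(const FlushedDmsMap& metadata,
                                        int64_t old_rs_id, int64_t new_rs_id,
                                        int64_t dms_id) {
  assert(old_rs_id != new_rs_id);
  return ReplayDecision{
      WasStoreAlreadyFlushed(metadata, old_rs_id, dms_id),
      WasStoreAlreadyFlushed(metadata, new_rs_id, dms_id)};
}
```

With metadata that contains only the old rowset, for example `{{1, 0}}`, calling `ClassifyDuplicatedUpdate(metadata, 1, 2, 1)` yields `skip_old == false` and `skip_new == true`: the copy aimed at the never-persisted rowset 2 is reported as already flushed, which matches the sequence described above.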
Issue Links
- is related to KUDU-218 Should exercise case where a duplicated insert was flushed from neither store (Resolved)