Details
Bug
Status: Open
Normal
Resolution: Unresolved
None
None
Normal
Description
When an insert to an indexed column is followed quickly (within the same memtable) by a delete of the entire partition, the index table for that column continues to store the entry for the inserted value, and no tombstone is ever written for it. This occurs because the index is not updated when the delete arrives before the flush. The inserted value is lost after the flush, so subsequent compactions cannot issue a delete for the primary key in the index column.
The attached test reproduces the described issue. Its assertion that the index cfs is empty fails. The subsequent assertion that there are no live sstables would also fail. Inspecting the on-disk data with sstabledump after running the test shows the value still present.
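The mechanism described above can be illustrated with a toy model. This is only a sketch of the behaviour as reported here, not Cassandra's actual code: the class `ToyIndexedTable`, its methods, and the `TOMBSTONE` marker are all hypothetical names, and the model assumes a single indexed column with one value per partition key.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted partition


class ToyIndexedTable:
    """Toy model of a base table plus one secondary index (not Cassandra code)."""

    def __init__(self):
        self.memtable = {}           # key -> value or TOMBSTONE
        self.index_memtable = set()  # (value, key) index entries
        self.sstable = {}            # flushed base data
        self.index_sstable = set()   # flushed index entries

    def insert(self, key, value):
        # An insert writes the base row and a matching index entry.
        self.memtable[key] = value
        self.index_memtable.add((value, key))

    def delete_partition(self, key):
        # The partition delete overwrites the row in the base memtable.
        # As in the reported behaviour, nothing is written to the index.
        self.memtable[key] = TOMBSTONE

    def flush(self):
        # Only the tombstone reaches disk for the base table, so the
        # inserted value is no longer known after this point.
        self.sstable.update(self.memtable)
        self.index_sstable |= self.index_memtable
        self.memtable.clear()
        self.index_memtable.clear()

    def live_rows(self):
        return {k: v for k, v in self.sstable.items() if v is not TOMBSTONE}

    def leaked_index_entries(self):
        # Index entries that no longer match any live base row.
        live = self.live_rows()
        return sorted((v, k) for v, k in self.index_sstable if live.get(k) != v)


if __name__ == "__main__":
    table = ToyIndexedTable()
    table.insert(1, "a")
    table.delete_partition(1)  # same memtable, before any flush
    table.flush()
    print(table.live_rows(), table.leaked_index_entries())
```

In this model the base table ends up empty while the index still holds an entry for the deleted key, which matches the leak described in this report.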
Originally reported on the mailing list by Roman Bielik:
Create table with LeveledCompactionStrategy;
'tombstone_compaction_interval': 60; gc_grace_seconds=60
There are two indexed columns for comparison: column1, column2
Insert keys {1..x} with random values in column1 & column2
Delete {key:column2} (but not column1)
Delete {key}
Repeat n-times from the inserts
Wait 1 minute
nodetool flush
nodetool compact (sometimes compact <keyspace> <table.index>)
nodetool cfstats
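The CQL side of the steps above can be sketched with a small generator script. Everything not stated in the report is an assumption: the keyspace and table names (`repro_ks`, `leak_test`), the `int` column types, the value range, and the choice to run all inserts of a round before the deletes. The script only builds statement text; it does not connect to a cluster, and the keyspace is assumed to exist already.

```python
import random

KEYSPACE = "repro_ks"  # hypothetical keyspace name
TABLE = "leak_test"    # hypothetical table name
FQ_TABLE = f"{KEYSPACE}.{TABLE}"


def schema_statements():
    """Table with LCS and the two indexed columns from the report."""
    return [
        f"CREATE TABLE {FQ_TABLE} "
        "(key int PRIMARY KEY, column1 int, column2 int) "
        "WITH compaction = {'class': 'LeveledCompactionStrategy', "
        "'tombstone_compaction_interval': 60} AND gc_grace_seconds = 60;",
        f"CREATE INDEX ON {FQ_TABLE} (column1);",
        f"CREATE INDEX ON {FQ_TABLE} (column2);",
    ]


def round_statements(num_keys, rng):
    """One round: insert keys 1..num_keys, delete column2, delete the key."""
    keys = range(1, num_keys + 1)
    stmts = []
    for key in keys:
        c1, c2 = rng.randint(0, 999), rng.randint(0, 999)
        stmts.append(
            f"INSERT INTO {FQ_TABLE} (key, column1, column2) "
            f"VALUES ({key}, {c1}, {c2});"
        )
    for key in keys:
        # Column-level delete of column2 only (column1 is left untouched).
        stmts.append(f"DELETE column2 FROM {FQ_TABLE} WHERE key = {key};")
    for key in keys:
        # Delete of the whole partition.
        stmts.append(f"DELETE FROM {FQ_TABLE} WHERE key = {key};")
    return stmts


def build_script(num_keys, rounds, seed=0):
    """Schema statements followed by `rounds` repetitions of the workload."""
    rng = random.Random(seed)
    stmts = schema_statements()
    for _ in range(rounds):
        stmts.extend(round_statements(num_keys, rng))
    return stmts


if __name__ == "__main__":
    print("\n".join(build_script(num_keys=3, rounds=2)))
```

The printed statements could be fed to cqlsh; the wait and the nodetool steps from the report are run separately.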
What I observe is that the data table is empty, the column2 index table is
also empty, and the column1 index table has non-zero (leaked) "space used" and
"estimated rows".