While we have always imposed that the type of any column name remains consistent, we have not done the same for its kind. If a column’s kind is changed via Drop/Add, the SerializationHeader’s column sets will be interpreted incorrectly, and may either lead to CorruptSSTableException or silent data corruption.
The problem occurs in SerializationHeader.Component.toHeader(). In this method, we lookup columns from the metadata only by name, ignoring the static status. If a column of the same name but different kind to our dropped column exists in the schema, we will only add this column to the list of fields we expect. So we will be missing a column from our set of expected static columns. We will also add the regular variant to the set of regular columns present, which it may not be.
There are a number of ways this can play out:
1) There are no other static columns in the affected sstables. In this case we will incorrectly treat the table as containing no static data, so we will not attempt to read any static row. This leaves a static row to be read as the first regular row, and will throw the exception in the description.
1a) If for some reason the static row is absent - say, it has expired - then depending on lexicographical ordering, and if the sstable did not contain any instance of the regular variant, we may misattribute column data. This will probably result in corruption exceptions, but if the columns were of the same type it could be undetectable.
2) There are other static columns on the table, and in the affected sstables. In this case, the row will be correctly treated as a static row, but depending on the dropped column’s lexicographical position relative to these, it may result in column data being misattributed to other columns. This could be undetectable corruption, if the column types were the same.
3) Either of these above scenario could be replayed with the static/regular relations replaced.
CASSANDRA-14591 would protect against 1a and 2.
Probably the safest and correct fix is to prevent adding a column back with a different kind to the original, however this would require storing more metadata about dropped columns. This is probably viable for 4.0, but for 3.0 perhaps too invasive.
It is quite easy to make this particular method safe against this particular corruption. However it is hard to be certain there aren’t other places where assumptions were made about a name being of only one kind. There are only a handful of places that should need to be of concern, specifically those places that invoke CFMetaData.getDroppedColumn, so we should be fairly confident that - if these call sites are safe - we are now insulated. This might suffice for 3.0. Thoughts?