Bug Category: Correctness - Unrecoverable Corruption / Loss
Source Control Link:
OSS C* 3.0 writes incorrect type information for UDTs into the serialization-header of each sstable.
In C* 3.0, both UDTs and tuples are always frozen. In the schema and in the serialization-header, a frozen type must be enclosed in a “bracket”: frozen<...> in the CQL3Type hierarchy, or org.apache.cassandra.db.marshal.FrozenType(...) in the AbstractType hierarchy.
Since CASSANDRA-7423 (committed to C* 3.6), UDTs can also be non-frozen (i.e. multi-cell).
Unfortunately, C* 3.0 does not write the org.apache.cassandra.db.marshal.FrozenType(...) “bracket” for UDTs into the SerializationHeader.Component in the -Statistics.db sstable component.
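To make the missing bracket concrete, here is a simplified Java sketch of the header string a frozen UDT column should carry, the string C* 3.0 writes instead, and a reader rule that treats a UDT as frozen only when it is bracketed. The class and method names (UdtHeaderBracket, expectedHeaderType, writtenBy30, readerThinksFrozen) are hypothetical, and the inner UserType(...) text is a placeholder: the real AbstractType string of a UDT carries more information, such as the keyspace, type name and field definitions.

```java
/**
 * Hypothetical, simplified model of the FrozenType "bracket" in the
 * serialization header. UDT_PLACEHOLDER stands in for the real UserType
 * string, which carries more information than shown here.
 */
public class UdtHeaderBracket {
    static final String FROZEN_PREFIX = "org.apache.cassandra.db.marshal.FrozenType(";
    static final String UDT_PLACEHOLDER = "org.apache.cassandra.db.marshal.UserType(...)";

    // Header type a UDT column should carry: wrapped in FrozenType(...) when frozen.
    static String expectedHeaderType(boolean frozen) {
        return frozen ? FROZEN_PREFIX + UDT_PLACEHOLDER + ")" : UDT_PLACEHOLDER;
    }

    // What C* 3.0 (up to 3.5) writes for a UDT column: no FrozenType bracket,
    // even though every UDT is frozen in those versions.
    static String writtenBy30() {
        return UDT_PLACEHOLDER;
    }

    // Simplified reader rule: a UDT counts as frozen only if it is bracketed.
    static boolean readerThinksFrozen(String headerType) {
        return headerType.startsWith(FROZEN_PREFIX);
    }

    public static void main(String[] args) {
        System.out.println("expected:  " + expectedHeaderType(true));
        System.out.println("3.0 wrote: " + writtenBy30());
        System.out.println("frozen according to reader? " + readerThinksFrozen(writtenBy30()));
    }
}
```

In this model, the bracket-less string written by 3.0 is classified as not frozen, which is exactly the misinterpretation the rest of this ticket deals with.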
The order in which the columns of a row are serialized depends on each column's concrete AbstractType. Columns with variable-length types (which include frozen types) are serialized before columns with multi-cell types (which include non-frozen types).
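If C* 3.6 (or any newer version) reads an sstable written by C* 3.0 (up to 3.5), it reads the type information “non-frozen UDT” from the serialization header. That is technically the correct interpretation of a UDT without the FrozenType bracket, but it does not match how the data was written: the reader treats the UDT column as multi-cell and therefore expects it at a different position within the row.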
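The effect of the two paragraphs above can be modeled in a few lines: the writer lays out the row with the frozen UDT among the single-cell columns, while a reader that trusts the bracket-less header expects it among the multi-cell columns. This is a hypothetical sketch, not Cassandra's serialization code; the column names are made up, and ordering by name within each group is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical model of the row layout rule: single-cell columns (including
 * frozen UDTs) are serialized before multi-cell columns. Ordering by name
 * within each group is an assumption of this sketch.
 */
public class ColumnOrderMismatch {
    record Column(String name, boolean multiCell) {}

    static final Comparator<Column> LAYOUT =
        Comparator.comparingInt((Column c) -> c.multiCell() ? 1 : 0) // single-cell first
                  .thenComparing(Column::name);

    // Returns the column names in serialization order.
    static List<String> layout(List<Column> columns) {
        List<Column> sorted = new ArrayList<>(columns);
        sorted.sort(LAYOUT);
        List<String> names = new ArrayList<>();
        for (Column c : sorted) {
            names.add(c.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // Writer (C* 3.0): the UDT column "addr" is frozen, hence single-cell.
        List<Column> written = List.of(
            new Column("addr", false), new Column("tags", true), new Column("zip", false));
        // Reader (C* 3.6+): no FrozenType bracket in the header, so "addr" looks multi-cell.
        List<Column> expected = List.of(
            new Column("addr", true), new Column("tags", true), new Column("zip", false));

        System.out.println("written order:        " + layout(written));
        System.out.println("order reader expects: " + layout(expected));
    }
}
```

For these example columns the written order is [addr, zip, tags], while the reader expects [zip, addr, tags], so in this model the bytes of addr would be consumed as if they belonged to zip.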
This means that upgrades from C* 3.0 to C* 3.11 or 4.0 with a schema that uses UDTs result in inaccessible data in those sstables. Reads against 3.0 sstables, as well as attempts to scrub them, fail with a wide variety of errors/exceptions (CorruptSSTableException, EOFException, OutOfMemoryError, etc.), as is usual in such cases.
Mitigation strategy in the proposed patch:
- Fix the broken serialization-headers automatically when an upgrade from C* 3.0 is detected.
- Enhance sstablescrub to first inspect the serialization-header and verify its type information against the schema, and allow sstablescrub to fix the UDT types according to the information in the schema. This does not apply to "online scrub" (i.e. nodetool scrub).
Differences between the schema and the sstable serialization-headers cause sstablescrub to error out and stop, i.e. safety first (there’s a way to opt out, though).
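The offline-scrub behavior described in the list above boils down to a small decision procedure. The sketch below is hypothetical: Mode (VALIDATE, FIX, OFF) stands in for "verify only", "allow the fix" and the opt-out, and is not the real set of sstablescrub options; column types are reduced to a UDT name plus a frozen flag.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of an offline-scrub header check. Mode, ColumnType and
 * check() are illustrative names, not sstablescrub options or Cassandra classes.
 * Column types are reduced to a UDT name plus a frozen flag.
 */
public class ScrubHeaderCheck {
    enum Mode { VALIDATE, FIX, OFF } // OFF stands in for the opt-out

    record ColumnType(String udtName, boolean frozen) {}

    static Map<String, ColumnType> check(Map<String, ColumnType> header,
                                         Map<String, ColumnType> schema,
                                         Mode mode) {
        if (mode == Mode.OFF) {
            return header; // opted out: use the header unchanged
        }
        Map<String, ColumnType> result = new LinkedHashMap<>(header);
        for (Map.Entry<String, ColumnType> e : header.entrySet()) {
            ColumnType inSchema = schema.get(e.getKey());
            ColumnType inHeader = e.getValue();
            if (inSchema == null || inSchema.equals(inHeader)) {
                continue; // not in the schema (e.g. dropped), or already consistent
            }
            boolean onlyFrozenDiffers = inSchema.udtName().equals(inHeader.udtName());
            if (mode == Mode.FIX && onlyFrozenDiffers) {
                result.put(e.getKey(), inSchema); // take the frozen flag from the schema
            } else {
                // Safety first: any other difference stops the scrub.
                throw new IllegalStateException(
                    "serialization-header differs from schema for column " + e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, ColumnType> header = Map.of("addr", new ColumnType("address", false));
        Map<String, ColumnType> schema = Map.of("addr", new ColumnType("address", true));
        System.out.println("fixed: " + check(header, schema, Mode.FIX));
        try {
            check(header, schema, Mode.VALIDATE);
        } catch (IllegalStateException ex) {
            System.out.println("validate only: " + ex.getMessage());
        }
    }
}
```

Failing by default and requiring an explicit choice to rewrite mirrors the "safety first" stance: a mismatch that is not just a missing frozen bracket is never silently rewritten.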
A new class SSTableHeaderFix can inspect the serialization-header (SerializationHeader.Component) in the -Statistics.db component and fix the type information in those sstables for UDTs according to the schema information.
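A header fix of this kind has to walk the whole type tree, because UDTs can be nested inside other types. The following is a hypothetical sketch of such a walk and is not the SSTableHeaderFix API: the Type/Native/Udt/Tuple model is made up for illustration. It copies frozen-ness from the schema type into the header type and refuses to "fix" anything that differs structurally.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a recursive header-type fix: frozen-ness is copied
 * from the schema type, and any structural difference is rejected.
 * The Type model is illustrative; it is not Cassandra's AbstractType hierarchy.
 */
public class HeaderTypeFix {
    interface Type {}
    record Native(String name) implements Type {}
    record Udt(String name, List<Type> fields, boolean frozen) implements Type {}
    record Tuple(List<Type> elements, boolean frozen) implements Type {}

    static Type fix(Type header, Type schema) {
        if (header instanceof Native h && schema instanceof Native s && h.name().equals(s.name())) {
            return h; // native types carry no frozen flag
        }
        if (header instanceof Udt h && schema instanceof Udt s && h.name().equals(s.name())) {
            return new Udt(h.name(), fixAll(h.fields(), s.fields()), s.frozen());
        }
        if (header instanceof Tuple h && schema instanceof Tuple s) {
            return new Tuple(fixAll(h.elements(), s.elements()), s.frozen());
        }
        throw new IllegalArgumentException("cannot fix " + header + " using " + schema);
    }

    // Fixes component types pairwise; component counts must match.
    static List<Type> fixAll(List<Type> header, List<Type> schema) {
        if (header.size() != schema.size()) {
            throw new IllegalArgumentException("different number of components");
        }
        List<Type> fixed = new ArrayList<>();
        for (int i = 0; i < header.size(); i++) {
            fixed.add(fix(header.get(i), schema.get(i)));
        }
        return fixed;
    }

    public static void main(String[] args) {
        Type header = new Udt("address", List.of(new Native("text"), new Native("int")), false);
        Type schema = new Udt("address", List.of(new Native("text"), new Native("int")), true);
        System.out.println("fixed header type: " + fix(header, schema));
    }
}
```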
This new class could be used during verify and before sstables are imported. But changes to “verify” and “import” are out of the scope of this ticket, as the patch is already bigger than I originally expected.
Another issue not tackled by this ticket is that the wrong ‘kind’ of type is written to system_schema.dropped_columns when a non-frozen UDT column is dropped. When a UDT column is dropped, the type of the dropped column is converted from the UDT definition to its “corresponding” tuple type definition. All versions currently write frozen<tuple<...>>, but for non-frozen UDTs it should actually just be tuple<...>. Unfortunately, nothing can be done in this ticket to fix (or even consider) the type information of a dropped column. For correctness, though, the tuple type should be a multi-cell one (only reachable for dropped UDTs, not as a type that a user can create).
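The dropped-column conversion can be illustrated with a hypothetical model of the UDT-to-tuple mapping, comparing what is written today with a kind-preserving variant. Field types are plain CQL strings here, and the class and method names are illustrative only; they are not Cassandra code.

```java
import java.util.List;

/**
 * Hypothetical model of converting a dropped UDT column's type into its
 * "corresponding" tuple type. Field types are plain CQL strings here.
 */
public class DroppedUdtType {
    record Udt(String name, List<String> fieldTypes, boolean frozen) {}

    record Tuple(List<String> elementTypes, boolean frozen) {
        // Renders CQL-like syntax; per the ticket, a multi-cell tuple is not user-creatable.
        String toCql() {
            String inner = "tuple<" + String.join(", ", elementTypes) + ">";
            return frozen ? "frozen<" + inner + ">" : inner;
        }
    }

    // Behavior described in the ticket: always frozen<tuple<...>>.
    static Tuple currentConversion(Udt udt) {
        return new Tuple(udt.fieldTypes(), true);
    }

    // Kind-preserving alternative: a non-frozen UDT maps to a multi-cell tuple.
    static Tuple kindPreservingConversion(Udt udt) {
        return new Tuple(udt.fieldTypes(), udt.frozen());
    }

    public static void main(String[] args) {
        Udt address = new Udt("address", List.of("text", "int"), false);
        System.out.println("written today:   " + currentConversion(address).toCql());
        System.out.println("kind-preserving: " + kindPreservingConversion(address).toCql());
    }
}
```

For a non-frozen UDT with fields text and int, currentConversion renders frozen<tuple<text, int>>, whereas kindPreservingConversion renders tuple<text, int>.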