Details
-
Bug
-
Status: Resolved
-
Normal
-
Resolution: Fixed
-
None
-
Correctness
-
Critical
-
Challenging
-
Workload Replay
-
All
-
None
-
Description
While testing some new machinery for Harry, I have encountered a new RT closer / SSTable Corruption issue. I have grounds to believe this was introduced during the last year.
Issue seems to happen because of intricate interleaving of flushes with writes and deletes.
ERROR [ReadStage-2] 2023-10-16 18:47:06,696 JVMStabilityInspector.java:76 - Exception in thread Thread[ReadStage-2,5,SharedPool] org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: RandomAccessReader:BufferManagingRebufferer.Aligned:CompressedChunkReader.Mmap(/Users/ifesdjeen/foss/java/apache-cassandra-4.0/data/data1/harry/table_1-07c35a606c0a11eeae7a4f6ca489eb0c/nc-5-big-Data.db - LZ4Compressor, chunk length 16384, data length 232569) at org.apache.cassandra.io.sstable.AbstractSSTableIterator$AbstractReader.hasNext(AbstractSSTableIterator.java:381) at org.apache.cassandra.io.sstable.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:242) at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95) at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32) at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133) at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:376) at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:188) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:157) at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:534) at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:402) at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95) at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32) at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133) at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133) at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:151) at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:101) at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:86) at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:343) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:201) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:186) at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48) at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:346) at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2186) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2581) at org.apache.cassandra.concurrent.ExecutionFailure$2.run(ExecutionFailure.java:163) at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:143) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:829) Suppressed: java.lang.IllegalStateException: PROCESSED UnfilteredRowIterator for harry.table_1 (key: ZinzDdUuABgDknItABgDknItABgDknItXEFrgBnOmPmPylWrwXHqjBHgeQrGfnZd1124124583:ZinzDdUuABgDknItABgDknItABgDknItABgDknItABgDknItzHqchghqCXLhVYKM22215251:3.2758E-41 omdt: [deletedAt=564416, localDeletion=1697450085]) has an illegal RT bounds sequence: expected all RTs to be closed, but the last one is open at org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.ise(RTBoundValidator.java:117) at org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.onPartitionClose(RTBoundValidator.java:112) at org.apache.cassandra.db.transform.BaseRows.runOnClose(BaseRows.java:91) at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:95) at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:341) ... 10 common frames omitted Caused by: java.io.IOException: Invalid Columns subset bytes; too many bits set:10 at org.apache.cassandra.db.Columns$Serializer.deserializeSubset(Columns.java:578) at org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(UnfilteredSerializer.java:604) at org.apache.cassandra.db.UnfilteredDeserializer.readNext(UnfilteredDeserializer.java:143) at org.apache.cassandra.io.sstable.format.big.SSTableIterator$ForwardIndexedReader.computeNext(SSTableIterator.java:175) at org.apache.cassandra.io.sstable.AbstractSSTableIterator$ForwardReader.hasNextInternal(AbstractSSTableIterator.java:533) at org.apache.cassandra.io.sstable.AbstractSSTableIterator$AbstractReader.hasNext(AbstractSSTableIterator.java:368) ... 31 common frames omitted
Unfortunately, harry branch is not ready for release yet. That said, I have a snapshot pinned for quick repro and can share SSTables that will easily repro the issue if there are any takers.
To reproduce:
paging 1; # make sure to set page size to 1 (it breaks with other page sizes, too) select * from harry.table_1;
Please also make sure to modify snitch to set rack/dc:
+ public static final String DATA_CENTER_NAME = "datacenter0"; + public static final String RACK_NAME = "rack0";
And set directories in cassandra.yaml:
+data_file_directories: + - /your/path/data/data0 + - /your/path/data/data1 + -/your/path/data/data2
SHA for repro: 865d7c30e4755e74c4e4d26205a7aed4cfb55710
Attachments
Attachments
Issue Links
- Blocked
-
CASSANDRA-18993 Harry-found silent data loss issue
- Resolved
- is related to
-
CASSANDRA-18915 CorruptSSTableException due to invalid Columns subset bytes; too many bits set
- Open
-
CASSANDRA-17056 CEP-17 – Pluggable SSTable format (SSTable format API)
- Resolved
-
CASSANDRA-14591 Throw exception if Columns serialized subset encode more columns than possible
- Resolved
- links to