Details
- Type: Bug
- Status: Resolved
- Priority: Normal
- Resolution: Fixed
- Bug Category: Availability - Response Crash
- Severity: Critical
- Complexity: Normal
- Discovered By: Fuzz Test
- Platform: All
Description
There's an NPE in Slice#make when reconciling a range tombstone (RT) with a partition deletion during read repair.
Minimal repro:
try (Cluster cluster = init(builder().withNodes(3).start()))
{
    cluster.schemaChange(withKeyspace("CREATE TABLE distributed_test_keyspace.table_0 (pk0 bigint, ck0 bigint, regular0 bigint, regular1 bigint, regular2 bigint, PRIMARY KEY (pk0, ck0)) WITH CLUSTERING ORDER BY (ck0 ASC);"));
    long pk = 0L;
    cluster.coordinator(1).execute("DELETE FROM distributed_test_keyspace.table_0 USING TIMESTAMP 100230 WHERE pk0=? AND ck0>?;", ConsistencyLevel.ALL, pk, 2L);
    cluster.get(3).executeInternal("DELETE FROM distributed_test_keyspace.table_0 USING TIMESTAMP 100230 WHERE pk0=?;", pk);
    cluster.coordinator(1).execute("SELECT * FROM distributed_test_keyspace.table_0 WHERE pk0=? AND ck0>=? AND ck0<?;", ConsistencyLevel.ALL, pk, 1L, 3L);
}
Details:
java.lang.AssertionError: Error merging RTs on distributed_test_keyspace.table_0:
  merged=null,
  versions=[Marker EXCL_START_BOUND(2)@100230/1613500432, Marker EXCL_START_BOUND(2)@100230/1613500432, null],
  sources={[Full(/127.0.0.1:7012,(-3074457345618258603,3074457345618258601]), Full(/127.0.0.2:7012,(-3074457345618258603,3074457345618258601]), Full(/127.0.0.3:7012,(-3074457345618258603,3074457345618258601])]},
  debug info:
  /127.0.0.1:7012 => [distributed_test_keyspace.table_0] key=0 partition_deletion=deletedAt=-9223372036854775808, localDeletion=2147483647 columns=[[] | [regular0 regular1 regular2]] repaired_digest= repaired_digest_conclusive==true Marker EXCL_START_BOUND(2)@100230/1613500432 Marker EXCL_END_BOUND(3)@100230/1613500432,
  /127.0.0.2:7012 => [distributed_test_keyspace.table_0] key=0 partition_deletion=deletedAt=-9223372036854775808, localDeletion=2147483647 columns=[[] | [regular0 regular1 regular2]] repaired_digest= repaired_digest_conclusive==true Marker EXCL_START_BOUND(2)@100230/1613500432 Marker EXCL_END_BOUND(3)@100230/1613500432,
  /127.0.0.3:7012 => [distributed_test_keyspace.table_0] key=0 partition_deletion=deletedAt=100230, localDeletion=1613500432 columns=[[] | [regular0 regular1 regular2]] repaired_digest= repaired_digest_conclusive==true
Exception:
java.lang.NullPointerException
	at org.apache.cassandra.db.Slice.make(Slice.java:74)
	at org.apache.cassandra.service.reads.repair.RowIteratorMergeListener.closeOpenMarker(RowIteratorMergeListener.java:351)
	at org.apache.cassandra.service.reads.repair.RowIteratorMergeListener.onMergedRangeTombstoneMarkers(RowIteratorMergeListener.java:315)
	at org.apache.cassandra.service.reads.DataResolver$2$1.onMergedRangeTombstoneMarkers(DataResolver.java:378)
	at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:592)
	at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:541)
	at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:219)
	at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:158)
	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
	at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:523)
	at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:391)
	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
	at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
	at org.apache.cassandra.db.transform.FilteredRows.isEmpty(FilteredRows.java:50)
	at org.apache.cassandra.db.transform.EmptyPartitionsDiscarder.applyToPartition(EmptyPartitionsDiscarder.java:27)
	at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:97)
	at org.apache.cassandra.service.StorageProxy$6.hasNext(StorageProxy.java:1908)
	at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:93)
	at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:777)
	at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:425)
	at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:402)
	at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:250)
	at org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithPagingWithResult$2(Coordinator.java:162)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
This behaviour is new in 4.0 and was introduced by CASSANDRA-15369. The difference from 3.0 is that in 3.0 the merged RangeTombstoneMarker would be null, so we would never hit the code path where some of the sources are opening/closing a marker, and would instead fall through to the opening/closing of deletions below. I've checked the code in 15369 and this condition looks like a new edge case in otherwise correct code. Since the goal there was to avoid generating range tombstones on ties with the partition deletion, the fix for this issue is also consistent with that goal.
In other words, on 3.0 given
cluster.coordinator(1).execute("INSERT INTO distributed_test_keyspace.tbl0 (pk, ck, value) VALUES (?,?,?) USING TIMESTAMP 1", ConsistencyLevel.ALL, pk, 1L, 1L);
cluster.coordinator(1).execute("DELETE FROM distributed_test_keyspace.tbl0 USING TIMESTAMP 2 WHERE pk=? AND ck>?;", ConsistencyLevel.ALL, pk, 2L);
we would produce the following read-repair mutations:
Mutation(keyspace='distributed_test_keyspace', key='0000000000000000', modifications=[ [distributed_test_keyspace.table_0] key=0 partition_deletion=deletedAt=-9223372036854775808, localDeletion=2147483647 columns=[[] | [regular0 regular1 regular2]] Marker EXCL_START_BOUND(2)@100230/1615295010 Marker EXCL_END_BOUND(3)@100230/1615295010 ])
Mutation(keyspace='distributed_test_keyspace', key='0000000000000000', modifications=[ [distributed_test_keyspace.table_0] key=0 partition_deletion=deletedAt=100230, localDeletion=1615295010 columns=[[] | [regular0 regular1 regular2]] Marker EXCL_START_BOUND(2)@100230/1615295010 Marker EXCL_END_BOUND(3)@100230/1615295010 ])
Mutation(keyspace='distributed_test_keyspace', key='0000000000000000', modifications=[ [distributed_test_keyspace.table_0] key=0 partition_deletion=deletedAt=100230, localDeletion=1615295010 columns=[[] | [regular0 regular1 regular2]] ])
And on 4.0:
Mutation(keyspace='distributed_test_keyspace', key='0000000000000000', modifications=[ [distributed_test_keyspace.tbl0] key=0 partition_deletion=deletedAt=2, localDeletion=1615295072 columns=[[] | [value]] ])
Mutation(keyspace='distributed_test_keyspace', key='0000000000000000', modifications=[ [distributed_test_keyspace.tbl0] key=0 partition_deletion=deletedAt=2, localDeletion=1615295072 columns=[[] | [value]] ])
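The tie condition above can be modelled with a minimal, self-contained sketch. This is illustrative only: the class and the `supersedes` helper are hypothetical names, not Cassandra's actual API; the point is just that a deletion covers data only if it is strictly newer, so an RT whose timestamp equals the partition deletion's is dropped from the merged output while sources without the partition deletion still report an open marker.

```java
// Hypothetical model of the RT / partition-deletion tie -- names are
// illustrative, not Cassandra internals.
public class RtTieDemo
{
    // A deletion only matters if it is strictly newer than what already
    // covers the data. On a tie (==) the RT is redundant and suppressed
    // in the merged output -- the behaviour CASSANDRA-15369 introduced.
    static boolean supersedes(long rtTimestamp, long partitionDeletionTimestamp)
    {
        return rtTimestamp > partitionDeletionTimestamp;
    }

    public static void main(String[] args)
    {
        long rt = 100230L;                 // RT timestamp from the repro
        long partitionDeletion = 100230L;  // node 3's partition deletion, same timestamp

        // The merged marker is suppressed on the tie...
        boolean mergedMarkerPresent = supersedes(rt, partitionDeletion);

        // ...but nodes 1 and 2 carry no partition deletion (deletedAt is
        // Long.MIN_VALUE, i.e. "live"), so their source markers stay open,
        // which is the mismatch that fed a null bound into Slice#make.
        boolean sourceMarkersPresent = supersedes(rt, Long.MIN_VALUE);

        System.out.println("merged=" + mergedMarkerPresent
                           + " sources=" + sourceMarkersPresent); // merged=false sources=true
    }
}
```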