Details
Bug
Status: Open
Normal
Resolution: Unresolved
None
Degradation - Other Exception
Normal
Challenging
User Report
All
None
Description
Clusters that contain prepared statements that partially select static columns before the upgrade will fail to execute those statements coordinated from the 4.x nodes until the upgrade completes.
Reproduction
Setup (before upgrade)
CREATE KEYSPACE ks1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
CREATE TABLE ks1.tbl1 (pk1 int, ck2 int, s3 int static, s4 int static, c5 int, PRIMARY KEY (pk1, ck2));
INSERT INTO ks1.tbl1 (pk1, ck2, s3, s4, c5) VALUES (1, 2, 3, 4, 5);
Prepared Statement (prepare before upgrade)
SELECT c5, s3 FROM ks1.tbl1 WHERE pk1 = ? AND ck2 = ?;
Exception on 4.0.x nodes (when executing prepared statement after upgrade)
java.lang.IllegalStateException: [s3, s4] is not a subset of [s3]
    at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:566)
    at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:498)
    at org.apache.cassandra.db.rows.UnfilteredSerializer.serializeRowBody(UnfilteredSerializer.java:235)
    at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:209)
    at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:141)
    at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:129)
    at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:140)
    at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:95)
    at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:80)
    at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:308)
    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:191)
    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:181)
    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:177)
    at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
    at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:335)
    at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:91)
    at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77)
    at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93)
    at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44)
    at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:433)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)
Exception on 3.0.x nodes (when executing prepared statement after upgrade)
java.lang.IllegalStateException: [ColumnDefinition{name=s3, type=org.apache.cassandra.db.marshal.IntType, kind=STATIC, position=-1}, ColumnDefinition{name=s4, type=org.apache.cassandra.db.marshal.IntType, kind=STATIC, position=-1}] is not a subset of [s3]
    at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:555)
    at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:487)
    at org.apache.cassandra.db.rows.UnfilteredSerializer.serializeRowBody(UnfilteredSerializer.java:216)
    at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:190)
    at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:121)
    at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:109)
    at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:140)
    at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:94)
    at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:79)
    at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:326)
    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:186)
    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:179)
    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:175)
    at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:75)
    at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:499)
    at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:91)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.runUnsafe(AbstractLocalAwareExecutorService.java:194)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.runUnsafe(AbstractLocalAwareExecutorService.java:137)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:167)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:122)
    at java.lang.Thread.run(Thread.java:748)
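Both traces end in the same subset check while a row is being serialized. The following is a minimal toy model of that kind of check, not Cassandra's actual code: the class name SubsetCheckSketch and the use of plain column-name strings are assumptions made for illustration, and the bitmap layout (one bit per column missing from the row) is likewise assumed.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model of a "row columns must be a subset of the header columns" check (hypothetical, not Cassandra code). */
final class SubsetCheckSketch {

    /**
     * Returns a bitmap with bit i set when the i-th column of superset is
     * missing from columns. Throws if columns holds a name that superset lacks.
     * Only suitable for supersets of up to 64 columns.
     */
    static long encodeBitmap(SortedSet<String> columns, SortedSet<String> superset) {
        if (!superset.containsAll(columns))
            throw new IllegalStateException(columns + " is not a subset of " + superset);

        long bitmap = 0L;
        int i = 0;
        for (String name : superset) {
            if (!columns.contains(name))
                bitmap |= 1L << i;
            i++;
        }
        return bitmap;
    }

    public static void main(String[] args) {
        // The header was built from the filter's fetched statics (only s3)...
        SortedSet<String> header = new TreeSet<>(List.of("s3"));
        // ...but the row read from disk carries every static column.
        SortedSet<String> row = new TreeSet<>(List.of("s3", "s4"));
        try {
            encodeBitmap(row, header);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the failure appears as soon as the row carries a column the header does not list, which mirrors the mismatch between the static columns the filter advertises and the ones the replica actually reads.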
The root cause is that CASSANDRA-16686 changed ColumnFilter to build and deserialize based on which versions the coordinating node thinks are running in the cluster. That knowledge is always incorrect when statements are re-prepared on startup, and may be incorrect while the nodes are reaching their final version.
Sequence of events:
Prepared statements are persisted in system.prepared_statements to be re-prepared on future startup.
When the 4.x node starts up after upgrade, in org.apache.cassandra.service.CassandraDaemon#setup it calls QueryProcessor.instance.preloadPreparedStatements before the Gossiper is started by a call to StorageService.instance.initServer() later in setup.
As part of preparing statements, when possible a ColumnFilterFactory is created that returns a ColumnFilter built at the time the query is prepared.
After the changes from CASSANDRA-16686, the ColumnFilter builder constructs different column filter variants depending on the lowest version reported in gossip, which it checks through org.apache.cassandra.gms.Gossiper#upgradeFromVersionMemoized. If this runs before the Gossiper is enabled, the result is SystemKeyspace.CURRENT_VERSION, so the ColumnFilter is built as if the cluster were fully upgraded.
For the query above, the ColumnFilter creates an ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS filter.
The participating 3.0.x nodes do not understand the new flag and create the equivalent of a WildCardColumnFilter. The participating 4.x nodes do understand the new flag, but because they know about other 3.0 nodes, the deserializer takes the lower-than-3.4 path and also creates a WildCardColumnFilter.
The fetchedColumns sent by the ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS filter contains only the queried static columns, but the pre-3.4 sstable iterator returns all regular and static columns, which causes an IllegalStateException when the response is serialized to be sent back.
The ISE clears once all nodes in the cluster think they are upgraded to the current version and behave as the originally prepared query intended.
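The ordering problem in the sequence above can be sketched with a small toy model. Everything here is hypothetical and simplified: ClusterView stands in for the gossip-derived minimum version, versions are encoded as major * 100 + minor, and buildFilter stands in for the version-dependent filter construction. None of it is Cassandra's real API.

```java
/** Toy model of building a filter before gossip has started (hypothetical names, not Cassandra code). */
final class StartupOrderSketch {

    enum FilterKind { WILDCARD, ALL_REGULARS_AND_QUERIED_STATICS }

    /** Stand-in for the coordinator's view of the lowest version in the cluster. */
    static final class ClusterView {
        static final int CURRENT_VERSION = 400; // 4.0, encoded as major * 100 + minor
        private boolean gossipStarted = false;
        private int lowestPeerVersion = CURRENT_VERSION;

        void startGossip(int lowestPeerVersionSeen) {
            gossipStarted = true;
            lowestPeerVersion = lowestPeerVersionSeen;
        }

        /** Before gossip starts, the node only knows its own version. */
        int minVersion() {
            return gossipStarted ? lowestPeerVersion : CURRENT_VERSION;
        }
    }

    /** Picks a filter variant from the minimum version known at the time of the call. */
    static FilterKind buildFilter(ClusterView view) {
        return view.minVersion() < 304 // lower than 3.4
             ? FilterKind.WILDCARD
             : FilterKind.ALL_REGULARS_AND_QUERIED_STATICS;
    }

    public static void main(String[] args) {
        ClusterView view = new ClusterView();

        // Statements are preloaded during setup, before gossip is started.
        FilterKind preloaded = buildFilter(view);

        // Gossip then reports that 3.0 nodes are still in the cluster.
        view.startGossip(300);
        FilterKind freshlyPrepared = buildFilter(view);

        System.out.println("preloaded:        " + preloaded);
        System.out.println("freshly prepared: " + freshlyPrepared);
    }
}
```

The preloaded filter keeps the variant chosen before gossip started, which is why the cached statement disagrees with what the rest of the cluster expects until it is prepared again.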
Related Problems
Non-deterministic behavior of 4.0.x/4.1.x nodes
If the prepared statements are cleared and/or freshly prepared when the cluster is in mixed 3.0/4.0 mode, the pre-built ColumnFilter will remain in the mixed mode version until re-prepared on a restart or cache clear/eviction.
As upgradeFromVersionMemoized times out and is recalculated after the upgrade reaches a single version, individual nodes will make a local decision on column filter building and deserializing.
Nodes that update upgradeFromVersionMemoized early and then coordinate requests may cause the same ISE on nodes that respond to the read command while still holding the previous version.
Digest Mismatches
If ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS ColumnFilters are incorrectly sent to 3.0.x nodes, those nodes will ignore the included list of columns and compute a different digest than the one computed locally on a 4.0.x coordinator.
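A toy illustration of why the digests diverge, using the row from the reproduction above. The class DigestMismatchSketch and its digest method are hypothetical, and MD5 over column names and values is only an assumption chosen to keep the example short. It is not how Cassandra computes read digests.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model: a digest over the fetched columns changes when the fetched set changes (not Cassandra code). */
final class DigestMismatchSketch {

    /** Hashes the name and value of every fetched column, in the given order. */
    static byte[] digest(Map<String, Integer> row, List<String> fetched) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String name : fetched) {
                md.update(name.getBytes(StandardCharsets.UTF_8));
                md.update(Integer.toString(row.get(name)).getBytes(StandardCharsets.UTF_8));
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> row = new LinkedHashMap<>();
        row.put("s3", 3);
        row.put("s4", 4);
        row.put("c5", 5);

        // Coordinator: all regulars plus only the queried static column.
        byte[] coordinator = digest(row, List.of("c5", "s3"));
        // Replica that ignores the column list: all regulars and all statics.
        byte[] replica = digest(row, List.of("c5", "s3", "s4"));

        System.out.println("digests match: " + Arrays.equals(coordinator, replica));
    }
}
```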
Proposed fix
In discussion with ifesdjeen, he suggested that one way to resolve this is to deprecate (or just remove) the ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS filter so that it is no longer built, and to always select all static columns.
This would just leave WildCardColumnFilter and SelectionColumnFilter with ALL_COLUMNS or ONLY_QUERIED_COLUMNS.
This is a potential performance regression for unusual schemas with very large numbers of static columns, but seems unlikely in practice.
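As a rough sketch of what the proposal implies for the fetched column set, assuming only the two remaining fetch modes: the class ProposedFetchSketch, the FetchType enum, and the fetched helper are hypothetical illustrations, not the real ColumnFilter API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the proposed behaviour: no "queried statics only" variant (hypothetical, not Cassandra code). */
final class ProposedFetchSketch {

    enum FetchType { ALL_COLUMNS, ONLY_QUERIED_COLUMNS }

    /** ALL_COLUMNS always includes every static column; ONLY_QUERIED_COLUMNS includes just what was asked for. */
    static Set<String> fetched(FetchType type, List<String> regulars, List<String> statics, List<String> queried) {
        Set<String> out = new LinkedHashSet<>();
        if (type == FetchType.ALL_COLUMNS) {
            out.addAll(statics);
            out.addAll(regulars);
        } else {
            out.addAll(queried);
        }
        return out;
    }

    public static void main(String[] args) {
        // SELECT c5, s3 on the table from the reproduction.
        System.out.println(fetched(FetchType.ALL_COLUMNS, List.of("c5"), List.of("s3", "s4"), List.of("c5", "s3")));
    }
}
```

With every static column always fetched in the ALL_COLUMNS mode, the fetched set no longer depends on which cluster version the coordinator believes it is talking to, at the cost of reading statics that were not queried.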
/cc: blerer
Issue Links
- is duplicated by CASSANDRA-19751: IllegalStateException when query on table having static columns during the Cassandra cluster upgrade from 3.11.4 to 4.0.11 (Resolved)