CASSANDRA-17601

IllegalStateException with prepared queries selecting static columns in mixed 3.0.x/4.x clusters

Details

    • Degradation - Other Exception
    • Normal
    • Challenging
    • User Report
    • All
    • None

    Description

      Prepared statements that select only some of a table's static columns, and that were prepared before the upgrade, fail to execute when coordinated by 4.x nodes until the upgrade completes.

      Reproduction

      Setup (before upgrade)

      CREATE KEYSPACE ks1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor':3};
      CREATE TABLE ks1.tbl1 (pk1 int,
      ck2 int,
      s3 int static,
      s4 int static,
      c5 int,
      PRIMARY KEY (pk1, ck2));
      INSERT INTO ks1.tbl1 (pk1, ck2, s3, s4, c5) VALUES (1, 2, 3, 4, 5);
      

      Prepared Statement (prepare before upgrade)

      SELECT c5, s3 FROM ks1.tbl1 WHERE pk1 = ? AND ck2 = ?;
      

      Exception on 4.0.x nodes (when executing prepared statement after upgrade)

      java.lang.IllegalStateException: [s3, s4] is not a subset of [s3]
      at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:566)
      at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:498)
      at org.apache.cassandra.db.rows.UnfilteredSerializer.serializeRowBody(UnfilteredSerializer.java:235)
      at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:209)
      at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:141)
      at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:129)
      at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:140)
      at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:95)
      at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:80)
      at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:308)
      at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:191)
      at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:181)
      at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:177)
      at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
      at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:335)
      at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:91)
      at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77)
      at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93)
      at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44)
      at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:433)
      at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
      at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
      at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
      at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      at java.base/java.lang.Thread.run(Thread.java:834)


      Exception on 3.0.x nodes (when executing prepared statement after upgrade)

      java.lang.IllegalStateException: [ColumnDefinition{name=s3, type=org.apache.cassandra.db.marshal.IntType, kind=STATIC, position=-1},
      ColumnDefinition{name=s4, type=org.apache.cassandra.db.marshal.IntType, kind=STATIC, position=-1}] is not a subset of [s3]
      at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:555)
      at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:487)
      at org.apache.cassandra.db.rows.UnfilteredSerializer.serializeRowBody(UnfilteredSerializer.java:216)
      at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:190)
      at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:121)
      at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:109)
      at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:140)
      at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:94)
      at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:79)
      at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:326)
      at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:186)
      at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:179)
      at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:175)
      at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:75)
      at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:499)
      at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:91)
      at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.runUnsafe(AbstractLocalAwareExecutorService.java:194)
      at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.runUnsafe(AbstractLocalAwareExecutorService.java:137)
      at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:167)
      at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:122)
      at java.lang.Thread.run(Thread.java:748)
      

      The root cause is that CASSANDRA-16686 changed ColumnFilter to build and deserialize based on which versions the coordinating node believes are running in the cluster.
      That knowledge is always incorrect when statements are re-prepared on startup, and it may be incorrect while the nodes are reaching their final version.

      Sequence of events:

      Prepared statements are persisted in system.prepared_statements to be re-prepared on future startup.

      When a 4.x node starts up after the upgrade, org.apache.cassandra.service.CassandraDaemon#setup calls QueryProcessor.instance.preloadPreparedStatements before the Gossiper is started; the Gossiper is started only later in setup, by the call to StorageService.instance.initServer().

      As part of preparing statements, when possible a ColumnFilterFactory is created that returns a ColumnFilter built at the time the query is prepared.

      After the changes from CASSANDRA-16686, the ColumnFilter builder constructs different column filter variants depending on the lowest version reported in gossip, which it obtains by checking org.apache.cassandra.gms.Gossiper#upgradeFromVersionMemoized. If this runs before the Gossiper is enabled, the result is SystemKeyspace.CURRENT_VERSION, causing the builder to create a column filter as if the cluster were fully upgraded.

      For the query above, the ColumnFilter creates an ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS filter.

      The participating 3.0.x nodes do not understand the new flag and create a ColumnFilter equivalent to a WildCardColumnFilter. The participating 4.x nodes do understand the new flag; however, because other 3.0 nodes are known to them, the deserializer takes the pre-3.4 path and also creates a WildCardColumnFilter.

      The fetchedColumns sent by the ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS filter contain only the queried static columns, whereas the pre-3.4 sstable iterator returns all regular and static columns. This causes an IllegalStateException when the response is serialized to be sent back.

      The ISE clears once all nodes in the cluster think they are upgraded to the current version and behave as the originally prepared query intended.
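
      The final failure in the sequence above can be illustrated with a simplified model of the subset check. This is only a sketch under stated assumptions: SubsetBitmapSketch and its encodeBitmap method are hypothetical stand-ins, columns are plain strings, and the bitmap layout is invented for illustration. It is not the code in Columns$Serializer; it only shows why a row carrying [s3, s4] cannot be encoded against a header that lists [s3].

```java
import java.util.List;

// Simplified, hypothetical model of a subset-bitmap encoding.
// This is NOT Cassandra's Columns.Serializer; columns are plain strings.
final class SubsetBitmapSketch {

    // Returns a bitmap with bit i set when superset.get(i) also appears in
    // columns. Assumes both lists hold distinct names and that superset has
    // at most 64 entries.
    static long encodeBitmap(List<String> columns, List<String> superset) {
        long bitmap = 0L;
        int matched = 0;
        for (int i = 0; i < superset.size(); i++) {
            if (columns.contains(superset.get(i))) {
                bitmap |= 1L << i;
                matched++;
            }
        }
        // A column with no position in the superset cannot be encoded.
        if (matched != columns.size())
            throw new IllegalStateException(columns + " is not a subset of " + superset);
        return bitmap;
    }

    public static void main(String[] args) {
        // Row holds only s3, header lists both statics: encodes without error.
        System.out.println(encodeBitmap(List.of("s3"), List.of("s3", "s4")));
        // Row holds both statics, header lists only the queried s3: fails.
        try {
            encodeBitmap(List.of("s3", "s4"), List.of("s3"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      In this model the second call fails because the replica's row carries more static columns than the coordinator's filter declared as fetched, which is the mismatch described in the sequence of events.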

      Related Problems

      Non-deterministic behavior of 4.0.x/4.1.x nodes

      If the prepared statements are cleared and/or freshly prepared when the cluster is in mixed 3.0/4.0 mode, the pre-built ColumnFilter will remain in the mixed mode version until re-prepared on a restart or cache clear/eviction.

      As upgradeFromVersionMemoized times out and is recalculated after the upgrade reaches a single version, individual nodes will make a local decision on column filter building and deserializing.

      Nodes that update upgradeFromVersionMemoized early and then coordinate requests may trigger the same ISE on nodes that respond to the read command while still holding the previous version.
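
      The timing problem can be pictured with a toy model of a memoized, expiring view of the cluster's lowest version. Everything here is an assumption made for illustration: MemoizedMinVersionSketch, its constructor parameters, the string versions, and the choice to cache the fallback value are all hypothetical and do not reproduce the Gossiper implementation. The sketch only shows how a node can keep acting on a stale answer until its cached value expires.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Toy model (hypothetical names) of a per-node, time-limited cache of the
// lowest version seen in the cluster. Not the real Gossiper code.
final class MemoizedMinVersionSketch {
    static final String CURRENT_VERSION = "4.0"; // stand-in for the node's own version

    private final Supplier<String> gossipMinVersion; // yields null before gossip has data
    private final LongSupplier clockMillis;          // injected clock, for determinism
    private final long ttlMillis;
    private String cached;
    private long expiresAt;

    MemoizedMinVersionSketch(Supplier<String> gossipMinVersion,
                             LongSupplier clockMillis,
                             long ttlMillis) {
        this.gossipMinVersion = gossipMinVersion;
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
    }

    // Returns the cached value, recomputing it only once the entry has expired.
    String get() {
        long now = clockMillis.getAsLong();
        if (cached == null || now >= expiresAt) {
            String fromGossip = gossipMinVersion.get();
            // Assumption of this sketch: with no gossip data, fall back to the
            // node's own version, i.e. behave as if fully upgraded.
            cached = (fromGossip != null) ? fromGossip : CURRENT_VERSION;
            expiresAt = now + ttlMillis;
        }
        return cached;
    }

    // True when this node would build filters as if the upgrade were complete.
    boolean treatsClusterAsUpgraded() {
        return CURRENT_VERSION.equals(get());
    }
}
```

      Two instances of this class with different expiry times give different answers for a while after the cluster state changes, which is the per-node local decision described above.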

      Digest Mismatches

      If ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS ColumnFilters are incorrectly sent to 3.0.x nodes, those nodes ignore the list of included columns and compute a different digest than the one computed locally on a 4.0.x coordinator.
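
      A minimal sketch of why the digests diverge, using the row from the reproduction (s3 = 3, s4 = 4, c5 = 5). DigestMismatchSketch and its digest method are hypothetical, MD5 over column names and values is used only as a stand-in hash, and the byte layout is invented; none of this reflects Cassandra's actual digest format. The point is only that hashing a different set of columns yields a different digest.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration: the digest depends on which columns are included.
final class DigestMismatchSketch {

    // Hashes the name and int value of every included column, in name order.
    static byte[] digest(Map<String, Integer> row, Set<String> included) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Sort by column name so the result does not depend on map ordering.
            for (Map.Entry<String, Integer> e : new TreeMap<>(row).entrySet()) {
                if (included.contains(e.getKey())) {
                    md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                    md.update(ByteBuffer.allocate(4).putInt(e.getValue()).array());
                }
            }
            return md.digest();
        } catch (NoSuchAlgorithmException ex) {
            // MD5 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(ex);
        }
    }
}
```

      Calling digest with the included set {c5, s3} (the queried columns) and again with {c5, s3, s4} (all columns) on the same row produces two different byte arrays, which in this model corresponds to the coordinator and a 3.0.x replica disagreeing.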

      Proposed fix

      In discussion with Alex Petrov, he suggested that one way to resolve this is to deprecate (or simply remove) the ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS filter so that it is no longer built, always selecting all static columns.
      This would leave just WildCardColumnFilter and SelectionColumnFilter with ALL_COLUMNS or ONLY_QUERIED_COLUMNS.

      This is a potential performance regression for unusual schemas with very large numbers of static columns, but seems unlikely in practice.
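
      The effect of the proposal can be sketched with a toy model of which static columns each filter kind declares as fetched. The three kind names come from this ticket; StaticFetchSketch, fetchedStatics and kindAfterFix are hypothetical helpers written for illustration and say nothing about how an actual patch would be structured.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model (hypothetical helpers) of the static columns fetched per filter kind.
final class StaticFetchSketch {
    enum FetchKind { ALL_COLUMNS, ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS, ONLY_QUERIED_COLUMNS }

    // Static columns that a filter of the given kind declares as fetched.
    static Set<String> fetchedStatics(FetchKind kind,
                                      Set<String> allStatics,
                                      Set<String> queriedStatics) {
        return kind == FetchKind.ALL_COLUMNS
             ? new TreeSet<>(allStatics)
             : new TreeSet<>(queriedStatics);
    }

    // Assumed post-fix selection: the intermediate kind is never built.
    static FetchKind kindAfterFix(boolean fetchAll) {
        return fetchAll ? FetchKind.ALL_COLUMNS : FetchKind.ONLY_QUERIED_COLUMNS;
    }
}
```

      For the table in the reproduction, a fetch-all filter built this way declares both s3 and s4, so a replica that returns every static column stays within the declared set. The cost is fetching static columns that were not queried, which is the trade-off noted above.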

      /cc: Benjamin Lerer 

      People

        Jon Meredith
        Benjamin Lerer