Details
Bug
Status: Resolved
High
Resolution: Fixed
None
Correctness - Transient Incorrect Response
Normal
Normal
User Report
Description
Hello
During an upgrade from 2.1.17 to 3.11.4, our application started getting back incomplete results for range queries. Once all nodes were upgraded (before upgrading sstables), we stopped getting incomplete results. I was able to reproduce it and have listed the steps below. It seems to require the random partitioner and compact storage to reproduce reliably. It also reproduces coming from 2.1.21 and 2.2.14. You seem to get the bad behavior when an old node is your coordinator and it has to talk to an upgraded replica.
ccm create test -v 2.1.17 -n 3
ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
ccm node1 updateconf 'initial_token: 0'
ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
ccm start

ccm node1 cqlsh <<SCHEMA
CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3};
CREATE COLUMNFAMILY test.test (
  id text,
  foo text,
  bar text,
  PRIMARY KEY (id)
) WITH COMPACT STORAGE;

CONSISTENCY QUORUM;
INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
SCHEMA

ccm node1 stop
ccm node1 setdir -v 3.11.4
ccm node1 start

ccm node2 stop
ccm node2 setdir -v 3.11.4
ccm node2 start

# here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
# allow for simpler test setup)
cqlsh 127.0.0.3 <<QUERY
CONSISTENCY QUORUM;
PAGING 2;
select * from test.test;
QUERY
This results in:
Page size: 2

 id | bar   | foo
----+-------+-----
  2 | there |  hi

(1 rows)
Running it against the upgraded node (node1):
Page size: 2

 id | bar   | foo
----+-------+-----
  2 | there |  hi
  1 | there |  hi

(2 rows)
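
For anyone adapting the reproduction to a different node count: the initial_token values in the steps above appear to be evenly spaced over the RandomPartitioner token range, taken here as [0, 2**127). The sketch below recomputes them. It also estimates the tokens of the two test keys, assuming (not confirmed in this report) that RandomPartitioner uses the absolute value of the key's MD5 digest read as a signed integer. The helper names are made up for illustration and are not part of ccm or Cassandra.

```python
import hashlib

# Assumed size of the RandomPartitioner token range.
TOKEN_SPACE = 2 ** 127


def evenly_spaced_tokens(node_count):
    """Return one initial_token per node, evenly spaced around the ring."""
    step = TOKEN_SPACE // node_count
    return [i * step for i in range(node_count)]


def random_partitioner_token(key):
    """Assumed token function: |MD5(key)| with the digest read as a
    signed big-endian integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


if __name__ == "__main__":
    for node, token in enumerate(evenly_spaced_tokens(3), start=1):
        print(f"node{node} initial_token: {token}")

    # Keys inserted by the reproduction script, as UTF-8 bytes.
    for key in (b"1", b"2"):
        print(f"id {key.decode()!r} token: {random_partitioner_token(key)}")
```

Under that assumption, the key '2' hashes to a smaller token than '1', which is consistent with '2' being listed first in the complete result, since a range scan returns rows in token order.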