CASSANDRA-15072

Incomplete range results during 2.X -> 3.11.4 upgrade

    Details

    • Bug Category:
      Correctness - Transient Incorrect Response
    • Severity:
      Normal
    • Complexity:
      Normal
    • Discovered By:
      User Report
    • Since Version:
    • Test and Documentation Plan:
      circleci / in jvm upgrade dtests

      Description

      Hello

      During an upgrade from 2.1.17 to 3.11.4, our application started getting back incomplete results for range queries. Once all nodes were upgraded (before upgrading sstables), the incomplete results stopped. I was able to reproduce the problem, and the steps are listed below. It seems to require the random partitioner and compact storage to reproduce reliably. It also reproduces when upgrading from 2.1.21 and 2.2.14. The bad behavior appears to occur when an old node is the coordinator and it has to talk to an upgraded replica.

      ccm create test -v 2.1.17 -n 3
      ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
      ccm node1 updateconf 'initial_token: 0'
      ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
      ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
      ccm start
      
      ccm node1 cqlsh <<SCHEMA
      CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3};
      CREATE COLUMNFAMILY test.test (
        id text,
        foo text,
        bar text,
        PRIMARY KEY (id)
      ) WITH COMPACT STORAGE;
      CONSISTENCY QUORUM;
      INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
      INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
      SCHEMA
      
      ccm node1 stop
      ccm node1 setdir -v 3.11.4
      ccm node1 start
      
      ccm node2 stop
      ccm node2 setdir -v 3.11.4
      ccm node2 start
      
      # here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
      # allow for simpler test setup)
      cqlsh 127.0.0.3 <<QUERY
      CONSISTENCY QUORUM;
      PAGING 2;
      select * from test.test;
      QUERY
      

      Against the old node (node3) as coordinator, this returns only one of the two rows:

      Page size: 2
      
       id | bar   | foo
      ----+-------+-----
        2 | there |  hi
      
      (1 rows)
      

      Running the same query against the upgraded node (node1) returns both rows:

      Page size: 2
      
       id | bar   | foo
      ----+-------+-----
        2 | there |  hi
        1 | there |  hi
      
      (2 rows)
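
      The comparison between the two coordinators can be automated. The following is a minimal sketch, not part of the original report: `row_count` and `compare_counts` are hypothetical helper names, and the parsing assumes cqlsh prints a `(N rows)` footer exactly as in the output above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read cqlsh output on stdin and print the number from
# the "(N rows)" footer. If several footers appear, the last one wins.
row_count() {
  sed -n 's/^[[:space:]]*(\([0-9][0-9]*\) rows*)[[:space:]]*$/\1/p' | tail -n 1
}

# Hypothetical helper: compare the count seen through the old coordinator
# with the count seen through the upgraded coordinator.
compare_counts() {
  local old_count=$1 new_count=$2
  if [ "$old_count" -eq "$new_count" ]; then
    echo "consistent: $old_count rows"
  else
    echo "MISMATCH: old coordinator returned $old_count, upgraded coordinator returned $new_count"
    return 1
  fi
}
```

      To use it, pipe each cqlsh invocation from the steps above into `row_count`, once against 127.0.0.3 and once against 127.0.0.1, and pass the two numbers to `compare_counts`. A non-zero exit status indicates the incomplete result described here.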
      

        Attachments

        1. eriksw-repro.sh
          2 kB
          Erik Swanson

            People

            • Assignee:
              bdeggleston Blake Eggleston
              Reporter:
              muir Muir Manders
              Authors:
              Blake Eggleston
              Reviewers:
              Sam Tunnicliffe
            • Votes:
              1
              Watchers:
              10
