Description
I found that rows are split and duplicated after upgrading a cluster from 2.1.x to 3.0.x.
The problem can be reproduced as follows.
$ ccm create test -v 2.1.16 -n 3 -s
Current cluster is now: test
$ ccm node1 cqlsh -e "CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy', 'replication_factor':3}"
$ ccm node1 cqlsh -e "CREATE TABLE test.test (id text PRIMARY KEY, value1 set<text>, value2 set<text>);"

# Upgrade node1
$ for i in 1; do ccm node${i} stop; ccm node${i} setdir -v3.0.10; ccm node${i} start;ccm node${i} nodetool upgradesstables; done

# Insert a row through node1(3.0.10)
$ ccm node1 cqlsh -e "INSERT INTO test.test (id, value1, value2) values ('aaa', {'aaa', 'bbb'}, {'ccc', 'ddd'});"

# Insert a row through node2(2.1.16)
$ ccm node2 cqlsh -e "INSERT INTO test.test (id, value1, value2) values ('bbb', {'aaa', 'bbb'}, {'ccc', 'ddd'});"

# The row inserted from node1 is splitting
$ ccm node1 cqlsh -e "SELECT * FROM test.test ;"

 id  | value1         | value2
-----+----------------+----------------
 aaa |           null |           null
 aaa | {'aaa', 'bbb'} | {'ccc', 'ddd'}
 bbb | {'aaa', 'bbb'} | {'ccc', 'ddd'}

$ for i in 1 2; do ccm node${i} nodetool flush; done

# Results of sstable2json of node2. The row inserted from node1(3.0.10) is different from the row inserted from node2(2.1.16).
$ ccm node2 json -k test -c test
running
['/home/zzheng/.ccm/test/node2/data0/test/test-5406ee80dbdb11e6a175f57c4c7c85f3/test-test-ka-1-Data.db']
-- test-test-ka-1-Data.db -----
[
{"key": "aaa",
 "cells": [["","",1484564624769577],
           ["value1","value2:!",1484564624769576,"t",1484564624],
           ["value1:616161","",1484564624769577],
           ["value1:626262","",1484564624769577],
           ["value2:636363","",1484564624769577],
           ["value2:646464","",1484564624769577]]},
{"key": "bbb",
 "cells": [["","",1484564634508029],
           ["value1:_","value1:!",1484564634508028,"t",1484564634],
           ["value1:616161","",1484564634508029],
           ["value1:626262","",1484564634508029],
           ["value2:_","value2:!",1484564634508028,"t",1484564634],
           ["value2:636363","",1484564634508029],
           ["value2:646464","",1484564634508029]]}
]

# Upgrade node2,3
$ for i in `seq 2 3`; do ccm node${i} stop; ccm node${i} setdir -v3.0.10; ccm node${i} start;ccm node${i} nodetool upgradesstables; done

# After upgrade node2,3, the row inserted from node1 is splitting in node2,3
$ ccm node2 cqlsh -e "SELECT * FROM test.test ;"

 id  | value1         | value2
-----+----------------+----------------
 aaa |           null |           null
 aaa | {'aaa', 'bbb'} | {'ccc', 'ddd'}
 bbb | {'aaa', 'bbb'} | {'ccc', 'ddd'}

(3 rows)

# Results of sstabledump
# node1
[
  {
    "partition" : {
      "key" : [ "aaa" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 17,
        "liveness_info" : { "tstamp" : "2017-01-16T11:03:44.769577Z" },
        "cells" : [
          { "name" : "value1", "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:44.769576Z", "local_delete_time" : "2017-01-16T11:03:44Z" } },
          { "name" : "value1", "path" : [ "aaa" ], "value" : "" },
          { "name" : "value1", "path" : [ "bbb" ], "value" : "" },
          { "name" : "value2", "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:44.769576Z", "local_delete_time" : "2017-01-16T11:03:44Z" } },
          { "name" : "value2", "path" : [ "ccc" ], "value" : "" },
          { "name" : "value2", "path" : [ "ddd" ], "value" : "" }
        ]
      }
    ]
  },
  {
    "partition" : {
      "key" : [ "bbb" ],
      "position" : 48
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 65,
        "liveness_info" : { "tstamp" : "2017-01-16T11:03:54.508029Z" },
        "cells" : [
          { "name" : "value1", "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:54.508028Z", "local_delete_time" : "2017-01-16T11:03:54Z" } },
          { "name" : "value1", "path" : [ "aaa" ], "value" : "" },
          { "name" : "value1", "path" : [ "bbb" ], "value" : "" },
          { "name" : "value2", "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:54.508028Z", "local_delete_time" : "2017-01-16T11:03:54Z" } },
          { "name" : "value2", "path" : [ "ccc" ], "value" : "" },
          { "name" : "value2", "path" : [ "ddd" ], "value" : "" }
        ]
      }
    ]
  }
]

# node2
[
  {
    "partition" : {
      "key" : [ "aaa" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 17,
        "liveness_info" : { "tstamp" : "2017-01-16T11:03:44.769577Z" },
        "cells" : [ ]
      },
      {
        "type" : "row",
        "position" : 22,
        "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:44.769576Z", "local_delete_time" : "2017-01-16T11:03:44Z" },
        "cells" : [
          { "name" : "value1", "path" : [ "aaa" ], "value" : "", "tstamp" : "2017-01-16T11:03:44.769577Z" },
          { "name" : "value1", "path" : [ "bbb" ], "value" : "", "tstamp" : "2017-01-16T11:03:44.769577Z" },
          { "name" : "value2", "path" : [ "ccc" ], "value" : "", "tstamp" : "2017-01-16T11:03:44.769577Z" },
          { "name" : "value2", "path" : [ "ddd" ], "value" : "", "tstamp" : "2017-01-16T11:03:44.769577Z" }
        ]
      }
    ]
  },
  {
    "partition" : {
      "key" : [ "bbb" ],
      "position" : 57
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 74,
        "liveness_info" : { "tstamp" : "2017-01-16T11:03:54.508029Z" },
        "cells" : [
          { "name" : "value1", "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:54.508028Z", "local_delete_time" : "2017-01-16T11:03:54Z" } },
          { "name" : "value1", "path" : [ "aaa" ], "value" : "" },
          { "name" : "value1", "path" : [ "bbb" ], "value" : "" },
          { "name" : "value2", "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:54.508028Z", "local_delete_time" : "2017-01-16T11:03:54Z" } },
          { "name" : "value2", "path" : [ "ccc" ], "value" : "" },
          { "name" : "value2", "path" : [ "ddd" ], "value" : "" }
        ]
      }
    ]
  }
]
Another example of row splitting is as follows.
$ ccm create test2 -v 2.1.16 -n 3 -s
Current cluster is now: test2
$ ccm node1 cqlsh -e "CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy', 'replication_factor':3}"
$ ccm node1 cqlsh -e "CREATE TABLE test.text_set_set (id text PRIMARY KEY, value1 text, value2 set<text>, value3 set<text>);"
$ for i in `seq 1`; do ccm node${i} stop; ccm node${i} setdir -v3.0.10; ccm node${i} start;ccm node${i} nodetool upgradesstables; done
$ ccm node1 cqlsh -e "INSERT INTO test.text_set_set (id, value1, value2, value3) values ('aaa', 'aaa', {'aaa', 'bbb'}, {'ccc', 'ddd'});"
$ ccm node1 cqlsh -e "SELECT * FROM test.text_set_set;"

 id  | value1 | value2         | value3
-----+--------+----------------+----------------
 aaa |    aaa |           null |           null
 aaa |   null | {'aaa', 'bbb'} | {'ccc', 'ddd'}

(2 rows)
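To spot this symptom without reading the result by eye, the cqlsh output can be scanned for primary key values that appear more than once. The following is only a minimal sketch: the function name `duplicated_keys` is hypothetical, and it assumes the first column is the whole primary key and that its values contain no `|` character.

```python
from collections import Counter


def duplicated_keys(cqlsh_output):
    """Return first-column values that occur in more than one result row."""
    counts = Counter()
    in_rows = False
    for line in cqlsh_output.splitlines():
        stripped = line.strip()
        if not in_rows:
            # The dashed separator under the header marks the start of the data rows.
            in_rows = stripped.startswith("-") and "+" in stripped
            continue
        if not stripped or stripped.startswith("("):
            # A blank line or the "(N rows)" footer ends the data rows.
            break
        counts[stripped.split("|")[0].strip()] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# Output of the SELECT on test.text_set_set shown in this report.
SELECT_OUTPUT = """
 id  | value1 | value2         | value3
-----+--------+----------------+----------------
 aaa |    aaa |           null |           null
 aaa |   null | {'aaa', 'bbb'} | {'ccc', 'ddd'}

(2 rows)
"""

print(duplicated_keys(SELECT_OUTPUT))  # ['aaa']
```

Since `id` is the only primary key column in these tables, any value returned by the function indicates a split row.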
As far as I have investigated, the problem occurs under the following conditions.
- The table schema contains multiple collection columns.
- A row whose collection columns are not null is inserted through a 3.x node while both 2.1 and 3.x nodes exist in the cluster.
- The row stored in the sstables of a node that was still running 2.1 when the row was inserted is split after that node is upgraded to 3.x.
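The first and third conditions can be checked mechanically. The sketch below is an illustration only, with hypothetical function names. It assumes that frozen collections do not count toward the first condition (this report only tested non-frozen `set<text>` columns) and that the table has no clustering columns, so each partition in the sstabledump output should contain exactly one row.

```python
import json
import re

# Matches non-frozen CQL collection types such as set<text> or map<text, int>.
# Excluding frozen<...> is an assumption, not something this report verified.
COLLECTION_TYPE = re.compile(r"^(set|list|map)\s*<", re.IGNORECASE)


def has_multiple_collections(columns):
    """columns: {column_name: cql_type}. True if two or more are collections."""
    count = sum(1 for cql_type in columns.values()
                if COLLECTION_TYPE.match(cql_type.strip()))
    return count >= 2


def split_partitions(dump_text):
    """Return keys of partitions that hold more than one row entry.

    Only meaningful for tables without clustering columns.
    """
    split = []
    for entry in json.loads(dump_text):
        rows = [r for r in entry.get("rows", []) if r.get("type") == "row"]
        if len(rows) > 1:
            split.append(tuple(entry["partition"]["key"]))
    return split


# Abridged from the node2 sstabledump output in this report (cells omitted).
NODE2_DUMP = """
[
  { "partition" : { "key" : [ "aaa" ], "position" : 0 },
    "rows" : [
      { "type" : "row", "position" : 17, "cells" : [ ] },
      { "type" : "row", "position" : 22, "cells" : [ ] }
    ] },
  { "partition" : { "key" : [ "bbb" ], "position" : 57 },
    "rows" : [
      { "type" : "row", "position" : 74, "cells" : [ ] }
    ] }
]
"""

TEST_TABLE = {"id": "text", "value1": "set<text>", "value2": "set<text>"}

print(has_multiple_collections(TEST_TABLE))  # True
print(split_partitions(NODE2_DUMP))          # [('aaa',)]
```

Running `split_partitions` on the dump of each node after the upgrade would list the partitions affected on that node.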
Thanks.
Attachments
Issue Links
- relates to
  - CASSANDRA-11887 Duplicate rows after a 2.2.5 to 3.0.4 migration (Resolved)
  - CASSANDRA-12144 Undeletable / duplicate rows after upgrading from 2.2.4 to 3.0.7 (Resolved)