Description
I ran into an issue when upgrading from 2.1.11 to 3.0.0 (and also to the cassandra-3.0 branch): within a partition that contains a row tombstone, the rows that follow the tombstone were lost.
Here's a scenario that reproduces the issue.
Using ccm, create a single-node cluster at 2.1.11:
ccm create -n 1 -v 2.1.11 -s financial
Run the following queries to create the schema, populate some data, and then delete the data for November:
drop keyspace if exists financial;

create keyspace if not exists financial with replication = {'class': 'SimpleStrategy', 'replication_factor' : 1 };

create table if not exists financial.symbol_history (
    symbol text,
    name text static,
    year int,
    month int,
    day int,
    volume bigint,
    close double,
    open double,
    low double,
    high double,
    primary key((symbol, year), month, day)
) with CLUSTERING ORDER BY (month desc, day desc);

insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 1, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 2, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 3, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 4, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 5, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 6, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 7, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 8, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 9, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 10, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 11, 1, 100);
insert into financial.symbol_history (symbol, name, year, month, day, volume) values ('CORP', 'MegaCorp', 2004, 12, 1, 100);

delete from financial.symbol_history where symbol='CORP' and year = 2004 and month=11;
Flush and run sstable2json on the sole Data.db file:
ccm node1 flush
sstable2json /path/to/file.db
The output should look like the following:
[
{"key": "CORP:2004",
 "cells": [["::name","MegaCorp",1449457517033030],
           ["12:1:","",1449457517033030],
           ["12:1:volume","100",1449457517033030],
           ["11:_","11:!",1449457564983269,"t",1449457564],
           ["10:1:","",1449457516313738],
           ["10:1:volume","100",1449457516313738],
           ["9:1:","",1449457516310205],
           ["9:1:volume","100",1449457516310205],
           ["8:1:","",1449457516235664],
           ["8:1:volume","100",1449457516235664],
           ["7:1:","",1449457516233535],
           ["7:1:volume","100",1449457516233535],
           ["6:1:","",1449457516231458],
           ["6:1:volume","100",1449457516231458],
           ["5:1:","",1449457516228307],
           ["5:1:volume","100",1449457516228307],
           ["4:1:","",1449457516225415],
           ["4:1:volume","100",1449457516225415],
           ["3:1:","",1449457516222811],
           ["3:1:volume","100",1449457516222811],
           ["2:1:","",1449457516220301],
           ["2:1:volume","100",1449457516220301],
           ["1:1:","",1449457516210758],
           ["1:1:volume","100",1449457516210758]]}
]
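For reference, the rows a correct read should return can be derived from the 2.1 dump above. The sketch below is a simplified model, not Cassandra's actual read path. It assumes that cell names are colon-separated composites of the form month:day:column, that a five-element entry with "t" in the fourth position is a range tombstone, and that the tombstone shadows every cell under the same month prefix whose timestamp is not newer than the tombstone's. The function name surviving_rows is hypothetical, and the dump is abbreviated to months 12, 11, 10 and 1.

```python
import json

# Abbreviated copy of the sstable2json output above (months 12, 11, 10 and 1 only).
DUMP = """
[{"key": "CORP:2004",
  "cells": [["::name","MegaCorp",1449457517033030],
            ["12:1:","",1449457517033030],
            ["12:1:volume","100",1449457517033030],
            ["11:_","11:!",1449457564983269,"t",1449457564],
            ["10:1:","",1449457516313738],
            ["10:1:volume","100",1449457516313738],
            ["1:1:","",1449457516210758],
            ["1:1:volume","100",1449457516210758]]}]
"""


def is_range_tombstone(cell):
    # Assumed layout: [start, end, timestamp, "t", local_deletion_time]
    return len(cell) == 5 and cell[3] == "t"


def surviving_rows(partition):
    """Return the (month, day) pairs that should stay visible, in on-disk order."""
    cells = partition["cells"]
    # Keep the month prefix and the timestamp of every range tombstone.
    tombstones = [(c[0].split(":")[0], c[2]) for c in cells if is_range_tombstone(c)]
    rows = []
    for c in cells:
        if is_range_tombstone(c):
            continue
        # Assumes clustering values contain no ':' themselves.
        month, day, _column = c[0].split(":")
        if month == "":  # static cell such as "::name"
            continue
        shadowed = any(month == t_month and c[2] <= t_ts
                       for t_month, t_ts in tombstones)
        row = (int(month), int(day))
        if not shadowed and row not in rows:
            rows.append(row)
    return rows


print(surviving_rows(json.loads(DUMP)[0]))  # [(12, 1), (10, 1), (1, 1)]
```

Applied to the full dump, the same logic yields 11 rows (every month except 11), which is what a query should return after the upgrade.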
Prepare for the upgrade:
ccm node1 nodetool snapshot financial
ccm node1 nodetool drain
ccm node1 stop
Upgrade to cassandra-3.0 and start the node:
ccm node1 setdir -v git:cassandra-3.0
ccm node1 start
Run the following query in cqlsh and observe that only 1 row is returned! It appears that all data following the November tombstone in clustering order (months 10 through 1) is gone.
cqlsh> select * from financial.symbol_history;

 symbol | year | month | day | name     | close | high | low  | open | volume
--------+------+-------+-----+----------+-------+------+------+------+--------
   CORP | 2004 |    12 |   1 | MegaCorp |  null | null | null | null |    100
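To make the gap explicit when re-running the reproduction, the cqlsh output can be compared against the months the script inserted. This is a minimal sketch that assumes the pipe-delimited table layout cqlsh prints by default. The helper name months_in_cqlsh_output and the constants are hypothetical.

```python
# Months inserted by the reproduction script, minus the deleted month 11.
EXPECTED_MONTHS = set(range(1, 13)) - {11}

# Query result copied from the cqlsh session after the upgrade.
OBSERVED = """
 symbol | year | month | day | name     | close | high | low  | open | volume
--------+------+-------+-----+----------+-------+------+------+------+--------
   CORP | 2004 |    12 |   1 | MegaCorp |  null | null | null | null |    100
"""


def months_in_cqlsh_output(text):
    """Extract the values of the `month` column from pipe-delimited cqlsh output."""
    # The separator line uses '+' rather than '|', so this filter drops it.
    lines = [ln for ln in text.splitlines() if "|" in ln]
    header = [h.strip() for h in lines[0].split("|")]
    idx = header.index("month")
    return {int(ln.split("|")[idx].strip()) for ln in lines[1:]}


missing = sorted(EXPECTED_MONTHS - months_in_cqlsh_output(OBSERVED), reverse=True)
print(missing)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```

An empty list would mean every expected row came back. Here all ten months that sort after the tombstone in descending clustering order are missing.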
Upgrade the sstables and query again; the same problem persists.
ccm node1 nodetool upgradesstables financial
I modified the 2.2 version of sstable2json so that it works with 3.0 (couldn't help myself), and observed 2 RangeTombstoneBoundMarker occurrences for 1 delete, with the rest of the data missing.
[
  {
    "key": "CORP:2004",
    "static": {
      "cells": {
        ["name","MegaCorp",1449457517033030]
      }
    },
    "rows": [
      {
        "clustering": {"month": "12", "day": "1"},
        "cells": {
          ["volume","100",1449457517033030]
        }
      },
      {
        "tombstone": ["11:*",1449457564983269,"t",1449457564]
      },
      {
        "tombstone": ["11:*",1449457564983269,"t",1449457564]
      }
    ]
  }
]
I'm not sure why this is happening, but I should point out that I'm using static columns and reverse clustering order here, so either of those may be a factor. I'll try without static columns and with regular ordering, and update the ticket with the results.