Details
- Bug
- Status: Resolved
- Normal
- Resolution: Duplicate
- None
- None
- Normal
Description
CREATE TABLE bulks_recipients (
    bulk_id uuid,
    recipient text,
    bulk_id_idx uuid,
    PRIMARY KEY ((bulk_id, recipient))
);
bulk_id_idx is just a copy of bulk_id, because SASI does not work on a partition key component at all, for some reason.
CREATE CUSTOM INDEX bulks_recipients_bulk_id ON bulks_recipients (bulk_id_idx) USING 'org.apache.cassandra.index.sasi.SASIIndex';
Then I inserted 1 million rows with the same bulk_id and different recipient values. Then:
> select count(*) from bulks_recipients;

 count
---------
 1000000

(1 rows)
Ok, it's fine here. Now let's query by SASI:
> select count(*) from bulks_recipients where bulk_id_idx = fedd95ec-2cc8-4040-8619-baf69647700b;

 count
---------
 1010101

(1 rows)
Hmm, a very strange count: 10101 extra rows.
OK, I dumped the query result into a text file:
# cat sasi.txt | wc -l
1000200
Here we have 200 extra rows for some reason.
Let's check if these are duplicates:
# cat sasi.txt | sort | uniq | wc -l
1000000
Yes, it looks like they are: after deduplication exactly 1000000 rows remain.
Recreating the index does not help. If I issue the very same query against the partition key bulk_id instead of bulk_id_idx, I get correct results.
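To see which rows are repeated, rather than only counting them, a small sketch like the following could be run against the dump. The function names list_duplicates and count_extra are hypothetical helpers, not part of the original report; only standard sort, uniq, awk and wc are used.

```shell
# list_duplicates FILE: print every line that occurs more than once,
# prefixed by its occurrence count (hypothetical helper).
list_duplicates() {
  sort "$1" | uniq -c | awk '$1 > 1'
}

# count_extra FILE: number of surplus rows, i.e. total lines minus
# distinct lines (hypothetical helper).
count_extra() {
  total=$(wc -l < "$1")
  distinct=$(sort -u "$1" | wc -l)
  echo $((total - distinct))
}
```

Given the line counts reported above, count_extra sasi.txt should print 200, and list_duplicates sasi.txt would show which recipients were returned more than once.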
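For anyone trying to reproduce this, the data set described above (many rows sharing one bulk_id, each with a different recipient) could be generated with a sketch along these lines. The function name gen_inserts, the recipient naming scheme and the output file name are assumptions made for illustration; the bulk_id value is the one used in the query above.

```shell
# gen_inserts N: emit N INSERT statements that share one bulk_id and
# differ only in recipient (hypothetical helper; recipient names are made up).
gen_inserts() {
  n="$1"
  bulk_id="fedd95ec-2cc8-4040-8619-baf69647700b"
  i=1
  while [ "$i" -le "$n" ]; do
    # bulk_id_idx carries the same value as bulk_id, as in the schema above.
    printf "INSERT INTO bulks_recipients (bulk_id, recipient, bulk_id_idx) VALUES (%s, 'recipient-%d', %s);\n" \
      "$bulk_id" "$i" "$bulk_id"
    i=$((i + 1))
  done
}

# Example (not run here): gen_inserts 1000000 > inserts.cql
# The resulting file could then be loaded with cqlsh -f inserts.cql
# after selecting the right keyspace.
```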
Issue Links
- is duplicated by
  CASSANDRA-13302 last row of previous page == first row of next page while querying data using SASI index (Resolved)