Cassandra / CASSANDRA-13379

SASI index returns duplicate rows


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Duplicate
    • Fix Version/s: None
    • Component/s: Feature/SASI
    • Labels: None
    • Severity: Normal

    Description

      CREATE TABLE bulks_recipients (
          bulk_id uuid,
          recipient text,
          bulk_id_idx uuid,
          PRIMARY KEY ((bulk_id, recipient))
      );
      

      bulk_id_idx is just a copy of bulk_id, because for some reason SASI cannot index partition key components at all.

      CREATE CUSTOM INDEX bulks_recipients_bulk_id ON bulks_recipients (bulk_id_idx) USING 'org.apache.cassandra.index.sasi.SASIIndex';
      

      Then I insert 1 million rows with the same bulk_id and a different recipient in each.
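      The insert step itself isn't shown in the report; presumably it is
      something like this, executed a million times with a varying
      recipient (the recipient literal here is an assumption, the UUID is
      the one queried below):

      INSERT INTO bulks_recipients (bulk_id, recipient, bulk_id_idx)
      VALUES (fedd95ec-2cc8-4040-8619-baf69647700b,
              'recipient-000001',  -- varies per row
              fedd95ec-2cc8-4040-8619-baf69647700b);
      

      Then: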

      > select count(*) from bulks_recipients ;
      
       count
      ---------
       1000000
      
      (1 rows)
      

      OK, it's fine here. Now let's query via the SASI index:

      > select count(*) from bulks_recipients where bulk_id_idx = fedd95ec-2cc8-4040-8619-baf69647700b;
      
       count
      ---------
       1010101
      
      (1 rows)
      

      Hmm, a very strange count: 10101 extra rows.
      OK, I've dumped the query result into a text file:
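      (The report doesn't show how the dump was made; one plausible way,
      with the keyspace name ks and the stripping of cqlsh's header and
      footer lines left as assumptions:)

      # cqlsh -e "select recipient from ks.bulks_recipients where bulk_id_idx = fedd95ec-2cc8-4040-8619-baf69647700b;" > sasi.txt
      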

      # cat sasi.txt | wc -l
      1000200
      

      Here we have 200 extra rows for some reason (oddly, not even matching the 10101 extra reported by count(*)).

      Let's check if these are duplicates:

      # cat sasi.txt | sort | uniq | wc -l
      1000000
      

      Yep, looks like it: deduplicated, we're back to exactly 1000000 rows, so the 200 extra lines are duplicates.
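      (Not shown in the report, but uniq -d would list the rows that
      occur more than once:)

      # sort sasi.txt | uniq -d
      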

      Recreating the index does not help. If I issue the very same query against the partition key column bulk_id instead of bulk_id_idx, I get correct results (see the sketch below).
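      Presumably that reference query is the same count with bulk_id in
      place of bulk_id_idx; ALLOW FILTERING is added here because it is
      generally required when restricting only one component of a
      composite partition key. Per the report, this returns the expected
      1000000:

      > select count(*) from bulks_recipients where bulk_id = fedd95ec-2cc8-4040-8619-baf69647700b allow filtering;
      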

People

    Assignee: Unassigned
    Reporter: Igor Novgorodov (blind_oracle)
    Votes: 0
    Watchers: 2
