Apache Cassandra / CASSANDRA-14699

Querying using an indexed clustering column yields no result when a row has been reinserted using an update following a delete

Details

    • Normal

    Description

      If a table has a secondary index on a clustering column, and you delete a row from that table and then add it back using an update, querying for the row by the indexed clustering column yields no result.

      Dummy example to reproduce:

      CREATE TABLE foo (
          a text,
          b text,
          c text,
          d text,
          e text,
          PRIMARY KEY (a, b, c)
      );
      CREATE INDEX ON foo (b);
      CREATE INDEX ON foo (c);
      CREATE INDEX ON foo (d);
      CREATE INDEX ON foo (e);
      update foo set d='4', e='5' where a='1' and b='2' and c='3';
      delete from foo where a='1' and b='2' and c='3';
      update foo set d='4', e='5' where a='1' and b='2' and c='3';

      Queries on the indexed clustering columns, e.g.

      select * from foo where b='2';
      select * from foo where c='3';

      yield no result. Queries on the other columns (indexed and non-indexed) work fine, though.

      Here is a comparison between the dump of the index on a clustering column and the dump of the index on a non-clustering column. As far as I can tell, the row is considered deleted in the indexes on b and c.

      # Index for b
      /apache-cassandra-3.11.0/tools/bin # ./sstabledump /data/data/foo/foo-875bbb60b1ab11e8b7406d2c86545d91/.foo_b_idx/mc-1-big-Data.db
      [
        {
          "partition" : {
            "key" : [ "2" ],
            "position" : 0
          },
          "rows" : [
            {
              "type" : "row",
              "position" : 34,
              "clustering" : [ "31", "3" ],
              "deletion_info" : { "marked_deleted" : "2018-09-06T08:05:10.093704Z", "local_delete_time" : "2018-09-06T08:05:10Z" },
              "cells" : [ ]
            }
          ]
        }
      ]
      
      # Index for d
      /apache-cassandra-3.11.0/tools/bin # ./sstabledump /data/data/foo/foo-875bbb60b1ab11e8b7406d2c86545d91/.foo_d_idx/mc-1-big-Data.db
      [
        {
          "partition" : {
            "key" : [ "4" ],
            "position" : 0
          },
          "rows" : [
            {
              "type" : "row",
              "position" : 32,
              "clustering" : [ "31", "2", "3" ],
              "liveness_info" : { "tstamp" : "2018-09-06T08:05:13.986242Z" },
              "cells" : [ ]
            }
          ]
        }
      ]

      This problem only occurs when the delete is followed by an update. If you instead use an insert, e.g.

      update foo set d='4', e='5' where a='1' and b='2' and c='3';
      delete from foo where a='1' and b='2' and c='3';
      insert into foo (a, b, c, d, e) VALUES ('1', '2', '3', '4', '5');

      all queries work, and the dump for the indexed clustering column looks fine as far as I can tell:

      [
        {
          "partition" : {
            "key" : [ "2" ],
            "position" : 0
          },
          "rows" : [
            {
              "type" : "row",
              "position" : 41,
              "clustering" : [ "31", "3" ],
              "liveness_info" : { "tstamp" : "2018-09-06T08:21:20.546530Z" },
              "deletion_info" : { "marked_deleted" : "2018-09-06T08:21:11.027171Z", "local_delete_time" : "2018-09-06T08:21:11Z" },
              "cells" : [ ]
            }
          ]
        }
      ]

      I was able to reproduce this problem in both 3.11.0 and 3.11.3.
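
The three dumps above can also be compared mechanically. The following Python sketch is not part of the original report. It assumes that a row-level deletion shadows any row liveness info with an equal or older timestamp, and it looks only at the `liveness_info` and `deletion_info` fields printed by sstabledump (cells are ignored, since the index rows shown have none). The names `parse_ts` and `row_is_live` are hypothetical helpers. The outcome is consistent with UPDATE, unlike INSERT, not writing row liveness info, although the report itself does not confirm this as the cause.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an sstabledump timestamp, with or without fractional seconds."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in value else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(value, fmt)


def row_is_live(row):
    """Return True if the row's liveness info is newer than its row deletion.

    Assumption: only row-level liveness_info and deletion_info are considered;
    cells are ignored, which is adequate for the cell-less index rows above.
    """
    liveness = row.get("liveness_info")
    if liveness is None:
        return False  # nothing marks the row itself as live
    deletion = row.get("deletion_info")
    if deletion is None:
        return True  # live and never deleted
    return parse_ts(liveness["tstamp"]) > parse_ts(deletion["marked_deleted"])


# Rows copied from the three dumps in this report.
b_idx_after_update = {
    "clustering": ["31", "3"],
    "deletion_info": {"marked_deleted": "2018-09-06T08:05:10.093704Z",
                      "local_delete_time": "2018-09-06T08:05:10Z"},
    "cells": [],
}
d_idx_after_update = {
    "clustering": ["31", "2", "3"],
    "liveness_info": {"tstamp": "2018-09-06T08:05:13.986242Z"},
    "cells": [],
}
b_idx_after_insert = {
    "clustering": ["31", "3"],
    "liveness_info": {"tstamp": "2018-09-06T08:21:20.546530Z"},
    "deletion_info": {"marked_deleted": "2018-09-06T08:21:11.027171Z",
                      "local_delete_time": "2018-09-06T08:21:11Z"},
    "cells": [],
}

for name, row in [("b index, delete then update", b_idx_after_update),
                  ("d index, delete then update", d_idx_after_update),
                  ("b index, delete then insert", b_idx_after_insert)]:
    print(name, "->", "live" if row_is_live(row) else "deleted")
```

Under these assumptions, only the index entry for b after the delete-then-update sequence is classified as deleted, which matches the queries that return no result.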

          People

            Assignee: Unassigned
            Reporter: Jonathan Pellby (jonathan.pellby)