  1. Apache Cassandra
  2. CASSANDRA-18782

Forbid analyzed SAI indexes on primary key columns

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 5.0-alpha1, 5.0, 5.1
    • Feature/SAI
    • None

    Description

      Queries using SAI indexes return no results when the index is on a primary key column, the index uses analysis, and the queried value differs from the exact stored value of the column. For example:

      CREATE TABLE t(k int, c text, PRIMARY KEY (k, c));
      CREATE INDEX ON t(c) USING 'sai' WITH OPTIONS = { 'case_sensitive' : false };
      INSERT INTO t(k, c) VALUES (1, 'A');
      SELECT * FROM t WHERE c = 'a'; -- no results found!!!
      

      This happens because the ClusteringIndexFilter for the query doesn't take analysis into account. Thus, when that filter is applied by QueryController#doesNotSelect(PrimaryKey) it rejects the results that have been correctly found by the index.
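
      The mismatch described above can be sketched with a small toy model. This is not Cassandra code: `analyze`, `index_lookup`, `clustering_filter_selects` and `query` are hypothetical names chosen for illustration, and the analyzer is assumed to be a plain lowercasing step standing in for 'case_sensitive' : false.

```python
# Toy model only: these names are hypothetical and are not Cassandra classes.

def analyze(term):
    """Stand-in for a case-insensitive analyzer ('case_sensitive': false)."""
    return term.lower()

# Primary keys as (k, c), mirroring INSERT INTO t(k, c) VALUES (1, 'A').
rows = [(1, "A")]

# The index stores analyzed terms, so 'A' is indexed under 'a'.
index = {}
for k, c in rows:
    index.setdefault(analyze(c), []).append((k, c))

def index_lookup(value):
    """Index search: the queried value is analyzed before matching."""
    return index.get(analyze(value), [])

def clustering_filter_selects(primary_key, value):
    """Clustering filter: compares the raw clustering value, with no analysis."""
    _, c = primary_key
    return c == value

def query(value):
    candidates = index_lookup(value)
    # Post-filtering drops every key that the clustering filter does not select.
    return [pk for pk in candidates if clustering_filter_selects(pk, value)]

found_by_index = index_lookup("a")  # the index matches the row
final_results = query("a")          # the clustering filter then rejects it
```

      In this model the index finds the row for 'a', but the exact-match clustering check discards it, which is the behaviour the ticket reports.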

      One approach to solving this problem would be to make ClusteringIndexFilter aware of the index analysis options. However, this would be problematic for paging: the first page of the query contains a clustering restriction that requires analysis, but the queries for subsequent pages contain the last seen clustering, and we don’t want analysis in that case.
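
      A toy illustration of why the paging position should not be analyzed. It assumes clusterings are ordered by their raw string value and that a page resumes strictly after the last seen clustering; `resume_after` and `analyze` are hypothetical helpers, not Cassandra APIs.

```python
import bisect

def analyze(term):
    """Stand-in for a case-insensitive analyzer."""
    return term.lower()

# Clustering values in raw sort order: uppercase sorts before lowercase.
clusterings = sorted(["A", "B", "a", "b"])

def resume_after(last_seen, analyzed):
    """Return the clusterings that a following page would still have to read."""
    position = analyze(last_seen) if analyzed else last_seen
    start = bisect.bisect_right(clusterings, position)
    return clusterings[start:]

raw_resume = resume_after("B", analyzed=False)      # resumes right after 'B'
analyzed_resume = resume_after("B", analyzed=True)  # 'B' becomes 'b', rows are skipped
```

      In this sketch, analyzing the last seen clustering moves the resume position past rows that have not been returned yet, so the same filter cannot treat the user restriction and the paging state in the same way.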

      Another approach would be to not add a ClusteringIndexFilter to the query restrictions when the query contains this type of restriction. However, this would create an odd situation where adding an index could make ALLOW FILTERING (AF) necessary in queries that wouldn’t need it without the index. This is the opposite of the usual expectation, where more indexes mean less need for AF. For example:

      CREATE TABLE t(k int, c1 text, c2 int, PRIMARY KEY (k, c1, c2));
      CREATE INDEX idx ON t(c1) USING 'sai' WITH OPTIONS = { 'case_sensitive' : false };
      SELECT * FROM t WHERE k = 0 AND c1 = 'a' AND c2 = 0 ALLOW FILTERING;
      

      The query would need ALLOW FILTERING because it is translated into an index query without a clustering filter, and c2 is not indexed.

      I think the query is ambiguous: it's not clear whether it should use the secondary index and apply analysis, or be a primary index query without analysis. Although we could default to one interpretation or the other, each serves different use cases. We will probably need some new CQL syntax to allow users to specify whether they want to use the secondary index or not.

      We can work on those CQL improvements during the second phase of SAI. In the meantime, I think we should simply forbid the creation of analyzed indexes on primary key columns.
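
      The proposed restriction amounts to a check at index creation time. The sketch below is a hypothetical model of such a check, not the actual patch: `InvalidIndexError`, `is_analyzed` and `validate_index` are invented names, and only the `case_sensitive` option from the example above is treated as an analysis option.

```python
class InvalidIndexError(ValueError):
    """Hypothetical error type for this sketch."""

def is_analyzed(options):
    # Assumption: only the 'case_sensitive' option from the example is modeled.
    return options.get("case_sensitive", True) is False

def validate_index(column, primary_key_columns, options):
    """Reject analyzed indexes on primary key columns, allow everything else."""
    if column in primary_key_columns and is_analyzed(options):
        raise InvalidIndexError(
            f"analyzed index is not allowed on primary key column {column!r}"
        )

primary_key = ["k", "c"]
validate_index("c", primary_key, {})                         # non-analyzed: accepted
validate_index("v", primary_key, {"case_sensitive": False})  # regular column: accepted
try:
    validate_index("c", primary_key, {"case_sensitive": False})
    rejected = False
except InvalidIndexError:
    rejected = True  # analyzed index on a clustering column is refused
```

      Under this model, non-analyzed indexes on primary key columns and analyzed indexes on regular columns are still accepted; only the ambiguous combination is refused.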

            People

              adelapena Andres de la Peña
              Andres de la Peña
              Caleb Rackliffe

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 50m