  1. Cassandra
  2. CASSANDRA-10436

Index selection should be weighted in favour of custom expressions



    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 3.0.0 rc2
    • Component/s: Legacy/CQL
    • Labels:


      If a SELECT contains a custom index expression (CASSANDRA-10217), that expression should always be chosen as the primary expression during query execution. If the statement also contains other expressions which can be satisfied by a built-in index, we don't currently have the ability to apply the custom expression as a filter. What's more, the method of selecting which index to use is fairly primitive (and cannot be overridden until CASSANDRA-10214), so we should ensure that a custom expression, if present, is always chosen.

      Suppose we have a custom index implementation which provides prefix matching on text fields.

      CREATE TABLE ks.t (k int, v1 int, v2 text, PRIMARY KEY(k));
      CREATE INDEX v1_idx ON ks.t(v1);
      CREATE CUSTOM INDEX v2_idx ON ks.t(v2) USING 'com.example.CustomIndex';
      INSERT INTO ks.t(k, v1, v2) VALUES(0, 0, 'abc');
      INSERT INTO ks.t(k, v1, v2) VALUES(1, 1, 'def');
      SELECT * FROM ks.t WHERE v1=0 AND expr(v2_idx, 'd*') ALLOW FILTERING;

      In the above example the expected result would contain no rows, which would be the case if v2_idx is selected as the primary (i.e. most selective) index during query execution. However, if v1_idx is chosen instead, the results of its lookup will have no further filter applied and so an incorrect result will be returned.
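
      The failure mode can be modelled outside of C* with a small sketch. This is an illustration only, not the actual selection code: the dict layout, the estimated_rows values, the helper names and the prefix semantics assumed for 'd*' are all hypothetical.

```python
# Toy model of the example table; not Cassandra code.
ROWS = [
    {"k": 0, "v1": 0, "v2": "abc"},
    {"k": 1, "v1": 1, "v2": "def"},
]

# Hypothetical representation of the two restrictions in the SELECT.
v1_expr = {"column": "v1", "value": 0, "index": "v1_idx",
           "estimated_rows": 1, "custom": False}
v2_expr = {"column": "v2", "value": "d*", "index": "v2_idx",
           "estimated_rows": 5, "custom": True}
query = [v1_expr, v2_expr]


def matches(row, expr):
    if expr["custom"]:
        # Assumed prefix semantics for the custom expression 'd*'.
        return row[expr["column"]].startswith(expr["value"].rstrip("*"))
    return row[expr["column"]] == expr["value"]


def execute(expressions, primary):
    # The primary expression drives the index lookup.
    candidates = [r for r in ROWS if matches(r, primary)]
    # Remaining built-in expressions are applied as filters; a custom
    # expression cannot be applied as a filter, so it is skipped.
    filters = [e for e in expressions
               if e is not primary and not e["custom"]]
    return [r for r in candidates if all(matches(r, f) for f in filters)]


def select_primary(expressions):
    # Proposed weighting: a custom expression, if present, always wins;
    # otherwise fall back to the lowest estimated result count.
    custom = [e for e in expressions if e["custom"]]
    if custom:
        return custom[0]
    return min(expressions, key=lambda e: e["estimated_rows"])


wrong = execute(query, v1_expr)                 # v1_idx chosen as primary
right = execute(query, select_primary(query))   # v2_idx chosen as primary
print([r["k"] for r in wrong], [r["k"] for r in right])  # [0] []
```

      With v1_idx as the primary index, the row with k=0 is returned even though its v2 value does not match the custom expression; with v2_idx as the primary index, the lookup yields the row with k=1 and the v1=0 filter then removes it, giving the expected empty result.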

      Note: this has always been something of an issue for custom indexes, as the expressions they support may not be natively filterable by C*. For example, with the full text search syntax used by Stratio & DSE Search, if the custom index isn't selected, the filtering will erroneously remove all rows, as the value of the dummy column does not match the Lucene/Solr search expression literal. It's probably a fairly minor concern, as in most cases a query using a custom index will not include other expressions (usually because custom indexes are per-row indexes, and so can support multi-field expression syntax). Also, an index implementation can return a very low estimated result count to try to ensure it is selected; custom expressions just provide an opportunity to improve the situation.




            • Assignee:
              samt Sam Tunnicliffe
            • Votes:
              0
            • Watchers:
              2


              • Created: