Queries using SAI indexes return no results when the index is on a primary key column, the index uses analysis, and the queried value differs from the exact value stored in the column. For example, with a case-insensitive index on a clustering column that holds 'A', a query for 'a' finds nothing.
This happens because the ClusteringIndexFilter for the query doesn't take analysis into account. Thus, when QueryController#doesNotSelect(PrimaryKey) applies that filter, it rejects results that the index has correctly found.
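As a rough illustration of that interaction, here is a toy model in Python. It is not Cassandra code: `analyze`, `index_search` and `clustering_filter_selects` are hypothetical stand-ins, and the analyzer is assumed to do nothing more than lowercase its input.

```python
from dataclasses import dataclass


def analyze(term: str) -> str:
    """Hypothetical analyzer: case-insensitive normalisation only."""
    return term.lower()


@dataclass(frozen=True)
class Row:
    k: int   # partition key
    c: str   # clustering column, indexed with analysis


ROWS = [Row(1, "A"), Row(2, "b")]

# Toy index: maps each analyzed term to the rows that contain it.
INDEX = {}
for row in ROWS:
    INDEX.setdefault(analyze(row.c), []).append(row)


def index_search(value: str) -> list:
    """The index analyzes the queried value before looking it up."""
    return list(INDEX.get(analyze(value), []))


def clustering_filter_selects(row: Row, value: str) -> bool:
    """Stand-in for the clustering filter: exact comparison, no analysis."""
    return row.c == value


def query(value: str) -> list:
    """Index lookup followed by the exact-match clustering check."""
    return [r for r in index_search(value) if clustering_filter_selects(r, value)]


index_hits = index_search("a")  # the index finds the row storing 'A'
final_rows = query("a")         # the exact-match filter then discards it
```

In this model the index does its job and the row is only lost in the second step, which is the behaviour described above.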
One approach to solving this problem would be to make ClusteringIndexFilter aware of the index analysis options. However, this would be problematic for paging. The query for the first page contains a clustering restriction that requires analysis, but the queries for subsequent pages contain the last seen clustering, which must be matched exactly, without analysis.
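The paging concern can be sketched with another toy model. It assumes clusterings are ordered by plain string comparison and that a page resumes strictly after the last seen clustering. `resume_after` is a hypothetical helper, and real paging state in Cassandra is considerably more involved.

```python
def analyze(term: str) -> str:
    """Hypothetical analyzer: case-insensitive normalisation only."""
    return term.lower()


# Clusterings in one partition, in (assumed) string order: 'A' < 'B' < 'a' < 'b'.
CLUSTERINGS = sorted(["a", "B", "b", "A"])


def resume_after(last_seen: str, analyzed: bool) -> list:
    """Return the clusterings that a follow-up page would still read.

    If the filter applied analysis to everything it receives, the resume
    position would be the analyzed form of the last seen clustering.
    """
    bound = analyze(last_seen) if analyzed else last_seen
    return [c for c in CLUSTERINGS if c > bound]


exact_resume = resume_after("A", analyzed=False)    # continues right after 'A'
analyzed_resume = resume_after("A", analyzed=True)  # continues after 'a' instead
```

With an exact bound the next page continues from 'B'. With an analyzed bound it jumps past 'B' and 'a', so an analysis-aware filter would need to tell user restrictions apart from paging positions.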
Another approach would be to omit the ClusteringIndexFilter from the query restrictions when the query contains this type of restriction on primary key columns. However, this would create an odd situation where adding an index might make ALLOW FILTERING (AF) necessary in queries that wouldn't need it without the index. This is the opposite of the natural way of things, where more indexes mean less AF needed. For example, with clustering columns c1 and c2 and an analyzed index on c1 only, a query restricting both c1 and c2 would need AF because it has been translated into an index query without a clustering filter, and c2 is not indexed.
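The inversion can be made concrete with a simplified decision rule. This is an assumption for illustration, not Cassandra's actual restriction logic: it assumes the partition key is fully restricted, clustering columns named c1 and c2, an analyzed index on c1 only, and a hypothetical `needs_allow_filtering` helper.

```python
def needs_allow_filtering(restricted, clustering_key, indexed, use_clustering_filter):
    """Simplified rule: a restriction needs filtering if nothing serves it.

    restricted            -- clustering columns restricted by the query
    clustering_key        -- clustering columns in declaration order
    indexed               -- columns that have an index
    use_clustering_filter -- whether the query keeps its clustering filter
    """
    remaining = set(restricted)
    if use_clustering_filter:
        # Restrictions forming a prefix of the clustering key are served
        # by the clustering filter.
        for col in clustering_key:
            if col not in remaining:
                break
            remaining.discard(col)
    # Whatever is left can only be served by an index.
    remaining -= set(indexed)
    return bool(remaining)


# Without any index, the clustering filter serves both c1 and c2.
without_index = needs_allow_filtering({"c1", "c2"}, ["c1", "c2"], set(), True)

# With an analyzed index on c1 and the clustering filter dropped,
# nothing serves the restriction on c2.
with_index = needs_allow_filtering({"c1", "c2"}, ["c1", "c2"], {"c1"}, False)
```

Under these assumptions the same query goes from not needing AF to needing it purely because an index was added.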
I think the query is ambiguous: it's not clear whether it should go through the secondary index and use analysis, or be a primary index query and not use analysis. Although we can default to one interpretation or the other, both can serve different use cases. We will probably need some new CQL syntax to allow users to specify whether they want to use the secondary index or not.
We can work on those CQL improvements during the second phase of SAI. In the meantime, I think we should simply forbid the creation of analyzed indexes on primary key columns.