Details
Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Environment: Tested from build CASSANDRA-11067
Description
Tested from build CASSANDRA-11067
CREATE KEYSPACE music WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;

CREATE TABLE music.albums (
    id int PRIMARY KEY,
    artist text,
    title1 text,
    title2 text
);

CREATE CUSTOM INDEX ON music.albums (title1) USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'tokenization_skip_stop_words': 'true',
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
    'case_sensitive': 'false',
    'mode': 'PREFIX',
    'tokenization_enable_stemming': 'true'
};

CREATE CUSTOM INDEX ON music.albums (title2) USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'tokenization_skip_stop_words': 'true',
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
    'case_sensitive': 'false',
    'mode': 'CONTAINS',
    'tokenization_enable_stemming': 'true'
};

INSERT INTO music.albums(id, artist, title1, title2) VALUES(1, 'Superpitcher', 'Yesterday', 'Yesterday');
INSERT INTO music.albums(id, artist, title1, title2) VALUES(2, 'Hilary Duff', 'So Yesterday', 'So Yesterday');
INSERT INTO music.albums(id, artist, title1, title2) VALUES(3, 'The Mr. T Experience', 'Yesterday Rules', 'Yesterday Rules');

SELECT artist,title1 FROM music.albums WHERE title1='Yesterday';

 artist               | title1
----------------------+-----------------
 Superpitcher         | Yesterday
 Hilary Duff          | So Yesterday
 The Mr. T Experience | Yesterday Rules

(3 rows)

SELECT artist,title1 FROM music.albums WHERE title2='Yesterday';

 artist               | title1
----------------------+-----------------
 Superpitcher         | Yesterday
 Hilary Duff          | So Yesterday
 The Mr. T Experience | Yesterday Rules

(3 rows)
The semantics of = are not respected: SASI should return only the 1 row that is an exact match, whereas LIKE would return all 3 rows. This affects both PREFIX and CONTAINS modes. With NonTokenizingAnalyzer, the same query returns only the 1 exactly matching row.
So the semantics of = indeed depend on the chosen analyzer, which is inconsistent. We should force = to be an exact match regardless of which analyzer is chosen.
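The analyzer dependence described above can be illustrated with a toy model in Python. This is not SASI code: tokenize, non_tokenize, eq_analyzer_dependent and eq_exact are hypothetical helpers, the toy tokenizer only lowercases and splits on non-alphanumeric characters (no stop-word removal or stemming), and "exact match" is assumed to mean case-sensitive equality with the stored value. eq_exact sketches one possible way to force exact = semantics (use the index to find candidates, then post-filter on the original value); it is not necessarily how this ticket was resolved.

```python
import re

# Hypothetical stand-in for a tokenizing analyzer (NOT SASI's StandardAnalyzer):
# lowercases (like case_sensitive=false) and splits into alphanumeric words.
# Stop-word removal and stemming are deliberately omitted.
def tokenize(value: str) -> set:
    return set(re.findall(r"[a-z0-9]+", value.lower()))


# Hypothetical stand-in for a non-tokenizing analyzer: the whole
# (lowercased) value is indexed as a single term.
def non_tokenize(value: str) -> set:
    return {value.lower()}


# The three rows from the reproduction above: id -> title.
ROWS = {
    1: "Yesterday",
    2: "So Yesterday",
    3: "Yesterday Rules",
}


def eq_analyzer_dependent(rows, analyzer, query):
    # Models the reported behaviour: '=' matches every row whose analyzed
    # terms contain all analyzed query terms, so the result depends on
    # which analyzer was configured for the index.
    query_terms = analyzer(query)
    return sorted(k for k, v in rows.items() if query_terms <= analyzer(v))


def eq_exact(rows, analyzer, query):
    # Sketch of analyzer-independent '=': the analyzed lookup only yields
    # candidates, which are then filtered on the original stored value.
    candidates = eq_analyzer_dependent(rows, analyzer, query)
    return [k for k in candidates if rows[k] == query]


if __name__ == "__main__":
    print(eq_analyzer_dependent(ROWS, tokenize, "Yesterday"))      # [1, 2, 3]
    print(eq_analyzer_dependent(ROWS, non_tokenize, "Yesterday"))  # [1]
    print(eq_exact(ROWS, tokenize, "Yesterday"))                   # [1]
```

In this model the tokenizing analyzer reproduces the 3-row result from the report, the non-tokenizing analyzer returns only row 1, and the post-filter makes = return row 1 whichever analyzer is used.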