Details
- Bug
- Status: Resolved
- Urgent
- Resolution: Duplicate
- None
- None
- Critical
Description
We use Cassandra 3.7 and have encountered strange behaviour of SELECT queries with "LIKE '%foo%bar%'" constraints on a column with a SASI index.
Below are a few experiments that demonstrate this behaviour.
Experiment 1:
drop keyspace if exists kmv;
create keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'};
use kmv;
CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS' };
insert into kmv (id, c1, c2) values (1, 'f21', 'qwe');
insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd');
insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1');
insert into kmv (id, c1, c2) values (4, 'f24', '1qwe');
insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe');
select c2 from kmv.kmv where c2 like '%w%a%';
Expected result: qweasd, qwea1.
Actual result: no rows.
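The expected results in these experiments assume SQL-style LIKE semantics, where every '%' matches any (possibly empty) run of characters. A minimal Python sketch of that reference behaviour is below. The helper names `like_match` and `expected_rows` are hypothetical and are not part of Cassandra or SASI; the sketch only models what the reporter expects, not how SASI evaluates the pattern.

```python
import re

# Hypothetical reference helper (not part of Cassandra): SQL-style LIKE in
# which every '%' matches any, possibly empty, run of characters.
def like_match(value, pattern):
    # Escape the literal pieces between '%' signs and join them with '.*'.
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    regex = ".*".join(literal_parts)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

# The c2 values inserted in the experiments.
C2_VALUES = ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]

def expected_rows(pattern):
    # Rows a reader would expect back under the reference semantics.
    return [value for value in C2_VALUES if like_match(value, pattern)]

print(expected_rows("%w%a%"))    # ['qweasd', 'qwea1']
print(expected_rows("%w22%a%"))  # []
```

Under this model, 'asdqwe' does not match '%w%a%' because no 'a' follows the 'w'.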
Experiment 2 (NOTE: the index definition is changed):
drop keyspace if exists kmv;
create keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'};
use kmv;
CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'analyzed': 'true' };
insert into kmv (id, c1, c2) values (1, 'f21', 'qwe');
insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd');
insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1');
insert into kmv (id, c1, c2) values (4, 'f24', '1qwe');
insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe');
select c2 from kmv.kmv where c2 like '%w%a%';
Expected result: qweasd, qwea1.
Actual result: asdqwe, qweasd, qwea1.
Experiment 3 (NOTE: the primary key is now compound and the inserted data has changed):
drop keyspace if exists kmv;
create keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'};
use kmv;
CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'analyzed': 'true' };
insert into kmv (id, c1, c2) values (1, 'f21', 'qwe');
insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd');
insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1');
insert into kmv (id, c1, c2) values (1, 'f24', '1qwe');
insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe');
select c2 from kmv.kmv where c2 like '%w%a%';
Expected result: qweasd, qwea1.
Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe.
Experiment 4 (NOTE: the search pattern is changed):
drop keyspace if exists kmv;
create keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'};
use kmv;
CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'analyzed': 'true' };
insert into kmv (id, c1, c2) values (1, 'f21', 'qwe');
insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd');
insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1');
insert into kmv (id, c1, c2) values (1, 'f24', '1qwe');
insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe');
select c2 from kmv.kmv where c2 like '%w22%a%';
Expected result: no rows.
Actual result: qweasd, qwea1, asdqwe.
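To summarise the four experiments, the sketch below compares the reported actual results against SQL-style LIKE semantics and lists which rows are missing or unexpected. It is only a bookkeeping aid: `like_match` and `compare` are hypothetical helper names, the data is copied from the experiments above, and nothing here models SASI internals.

```python
import re

# Hypothetical reference helper: every '%' matches any run of characters.
def like_match(value, pattern):
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.fullmatch(".*".join(literal_parts), value, flags=re.DOTALL) is not None

# The c2 values inserted in every experiment.
C2_VALUES = ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]

# Experiment number -> (LIKE pattern, actual rows reported in this issue).
EXPERIMENTS = {
    1: ("%w%a%", []),
    2: ("%w%a%", ["asdqwe", "qweasd", "qwea1"]),
    3: ("%w%a%", ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]),
    4: ("%w22%a%", ["qweasd", "qwea1", "asdqwe"]),
}

def compare(pattern, actual):
    # Returns (rows expected but not returned, rows returned but not expected).
    expected = [value for value in C2_VALUES if like_match(value, pattern)]
    missing = [value for value in expected if value not in actual]
    unexpected = [value for value in actual if value not in expected]
    return missing, unexpected

for number, (pattern, actual) in EXPERIMENTS.items():
    missing, unexpected = compare(pattern, actual)
    print(f"Experiment {number}: missing={missing} unexpected={unexpected}")
```

Experiment 1 is the only case where expected rows are missing; experiments 2 to 4 all return rows that the pattern should exclude.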
Issue Links
- is duplicated by
  - CASSANDRA-12674 [SASI] Confusing AND/OR semantics for StandardAnalyzer (Open)
- relates to
  - CASSANDRA-12675 SASI index. Support for '%' as a wildcard in the middle of LIKE pattern string. (Open)