Cassandra / CASSANDRA-11130

[SASI Pre-QA] = semantics not respected when using StandardAnalyzer


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 3.4
    • Component/s: Legacy/CQL
    • Labels:
      None
    • Environment:

      Tested from build CASSANDRA-11067

    • Severity:
      Normal

      Description

      Tested from build CASSANDRA-11067

      CREATE KEYSPACE music WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;
      
      CREATE TABLE music.albums (
          id int PRIMARY KEY,
          artist text,
          title1 text,
          title2 text
      );
      
      CREATE CUSTOM INDEX ON music.albums (title1) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'tokenization_skip_stop_words': 'true', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 'false', 'mode': 'PREFIX', 'tokenization_enable_stemming': 'true'};
      
      CREATE CUSTOM INDEX ON music.albums (title2) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'tokenization_skip_stop_words': 'true', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 'false', 'mode': 'CONTAINS', 'tokenization_enable_stemming': 'true'};
      
      INSERT INTO music.albums(id, artist, title1, title2) 
      VALUES(1, 'Superpitcher', 'Yesterday', 'Yesterday');
      
      INSERT INTO music.albums(id, artist, title1, title2) 
      VALUES(2, 'Hilary Duff', 'So Yesterday', 'So Yesterday');
      
      INSERT INTO music.albums(id, artist, title1, title2) 
      VALUES(3, 'The Mr. T Experience', 'Yesterday Rules', 'Yesterday Rules');
      
      SELECT artist,title1 FROM music.albums WHERE title1='Yesterday';
      
       artist                 | title1
      ------------------------+----------------
                 Superpitcher |       Yesterday
                  Hilary Duff |    So Yesterday
         The Mr. T Experience | Yesterday Rules
       
      (3 rows)
      
      SELECT artist,title1 FROM music.albums WHERE title2='Yesterday';
      
       artist                 | title1
      ------------------------+----------------
                 Superpitcher |       Yesterday
                  Hilary Duff |    So Yesterday
         The Mr. T Experience | Yesterday Rules
        
      (3 rows)
      

      The semantics of = are not respected. SASI should return only the 1 row that matches exactly; returning all 3 rows is what LIKE would do. Both PREFIX and CONTAINS modes are affected. Using NonTokenizingAnalyzer returns 1 row with the exact match.

      So the semantics of = depend on the chosen analyzer, which is inconsistent. We should force = to be an exact match no matter which analyzer is chosen.
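
      To make the inconsistency concrete, here is a toy model of the two behaviours. It is not SASI's code: the stop-word list, the whitespace tokenizer, and the function names are all assumptions made for illustration, and stemming is left out. It only shows why token-based matching returns every row containing the term while exact matching returns one.

```python
# Toy model only: this is NOT SASI's implementation, just an illustration
# of token-based matching versus whole-value matching.

STOP_WORDS = {"so", "the", "a"}  # hypothetical, tiny stop-word list

# title1 / title2 values from the rows inserted in the report, keyed by id
ROWS = {
    1: "Yesterday",
    2: "So Yesterday",
    3: "Yesterday Rules",
}


def analyze(text):
    """Lowercase, split on whitespace, drop stop words (no stemming)."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def tokenized_eq(rows, term):
    """Match a row if any of its tokens equals any token of the query term."""
    wanted = set(analyze(term))
    return sorted(i for i, title in rows.items() if wanted & set(analyze(title)))


def exact_eq(rows, term):
    """Match a row only if the whole value equals the term (case-insensitive)."""
    return sorted(i for i, title in rows.items() if title.lower() == term.lower())


print(tokenized_eq(ROWS, "Yesterday"))  # [1, 2, 3]
print(exact_eq(ROWS, "Yesterday"))      # [1]
```

      In this model, the first function corresponds to what was observed with StandardAnalyzer and the second to what = is expected to return.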

            People

            • Assignee:
              xedin Pavel Yaskevich
              Reporter:
              doanduyhai DuyHai Doan
              Authors:
              Pavel Yaskevich
              Reviewers:
              Sam Tunnicliffe