  1. Cassandra
  2. CASSANDRA-12078

[SASI] Move skip_stop_words filter BEFORE stemming

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 3.8
    • Component/s: Feature/SASI
    • Labels:
      None
    • Environment:

      Cassandra 3.7, Cassandra 3.8

    • Severity:
      Normal
    • Since Version:
      3.7

      Description

      Right now, if both skip stop words and stemming are enabled, SASI puts stemming in the filter pipeline BEFORE skip_stop_words:

          private FilterPipelineTask getFilterPipeline()
          {
              FilterPipelineBuilder builder = new FilterPipelineBuilder(new BasicResultFilters.NoOperation());
              ...
              if (options.shouldStemTerms())
                  builder = builder.add("term_stemming", new StemmingFilters.DefaultStemmingFilter(options.getLocale()));
              if (options.shouldIgnoreStopTerms())
                  builder = builder.add("skip_stop_words", new StopWordFilters.DefaultStopWordFilter(options.getLocale()));
              return builder.build();
          }
      

      The problem is that stemming before removing stop words can yield wrong results.

      I have an example:

      SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' ALLOW FILTERING;
      

      Because of stemming, "danse" ("dance" in English) becomes "dans" (the final vowel is removed). Then skip stop words is applied. Unfortunately, "dans" ("in" in English) is a stop word in French, so it is removed completely.

      In the end, the query is equivalent to SELECT * FROM music.albums WHERE country='France', and of course the results are wrong.
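
The effect of filter ordering can be reproduced outside Cassandra with a minimal sketch. Everything here is hypothetical and only for illustration: the class name FilterOrderSketch, the three-word stop list, and the toy stemmer that strips a single trailing "e" are stand-ins, not SASI's actual stemming or stop-word filters.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FilterOrderSketch
{
    // Hypothetical, tiny stop-word list; a real French list is much longer.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("dans", "le", "la"));

    // Toy stemmer (assumption): strips one trailing 'e'. It only mimics the
    // "danse" -> "dans" behaviour described in this report.
    static String stem(String term)
    {
        return term.length() > 1 && term.endsWith("e")
             ? term.substring(0, term.length() - 1)
             : term;
    }

    // Returns null when the term is a stop word, i.e. the term is dropped.
    static String skipStopWords(String term)
    {
        return STOP_WORDS.contains(term) ? null : term;
    }

    // Order described as the current behaviour: stemming first, then stop words.
    static String stemThenSkip(String term)
    {
        return skipStopWords(stem(term));
    }

    // Proposed order: stop words first, then stemming.
    static String skipThenStem(String term)
    {
        String kept = skipStopWords(term);
        return kept == null ? null : stem(kept);
    }

    public static void main(String[] args)
    {
        System.out.println("stem -> skip: " + stemThenSkip("danse")); // prints "stem -> skip: null"
        System.out.println("skip -> stem: " + skipThenStem("danse")); // prints "skip -> stem: dans"
    }
}
```

With stemming first, the search term is dropped entirely, which matches the "query without the LIKE clause" symptom. With stop words checked first, the original word "danse" survives and is then stemmed to "dans".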

      Attached is a trivial patch to move the skip_stop_words filter BEFORE the stemming filter.

      /cc Pavel Yaskevich Jordan West beobal

        Attachments

        1. patch_V2.txt
          4 kB
          DuyHai Doan
        2. patch.txt
          1 kB
          DuyHai Doan

            People

            • Assignee: DuyHai Doan (doanduyhai)
            • Reporter: DuyHai Doan (doanduyhai)
            • Authors: DuyHai Doan
            • Reviewers: Pavel Yaskevich
