  1. Apache Cassandra
  2. CASSANDRA-12078

[SASI] Move skip_stop_words filter BEFORE stemming

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 3.8
    • Feature/SASI
    • None
    • Cassandra 3.7, Cassandra 3.8

    • Normal
    • 3.7

    Description

      Right now, if both skip_stop_words and stemming are enabled, SASI places the stemming filter in the filter pipeline BEFORE skip_stop_words:

          private FilterPipelineTask getFilterPipeline()
          {
              FilterPipelineBuilder builder = new FilterPipelineBuilder(new BasicResultFilters.NoOperation());
           ...
              if (options.shouldStemTerms())
                  builder = builder.add("term_stemming", new StemmingFilters.DefaultStemmingFilter(options.getLocale()));
              if (options.shouldIgnoreStopTerms())
                  builder = builder.add("skip_stop_words", new StopWordFilters.DefaultStopWordFilter(options.getLocale()));
              return builder.build();
          }
      

      The problem is that stemming before removing stop words can yield wrong results.

      I have an example:

      SELECT * FROM music.albums WHERE country='France' AND title LIKE 'danse' ALLOW FILTERING;
      

      Because of stemming, danse ("dance" in English) becomes dans (the final vowel is removed). Then skip_stop_words is applied. Unfortunately, dans ("in" in English) is a stop word in French, so the term is removed completely.

      In the end, the query is equivalent to SELECT * FROM music.albums WHERE country='France', and of course the results are wrong.
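
      The ordering problem can be reproduced outside Cassandra with a toy pipeline. The sketch below is only an illustration: the class name, the one-rule stemmer (drop a trailing "e") and the three-word stop list are all hypothetical stand-ins, not the SASI filters or a real French stemmer. It shows that stemming first drops the term danse entirely, while filtering stop words first keeps it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical demo class; it does not use any Cassandra or SASI code.
class FilterOrderDemo {
    // Toy French stop-word list (assumption: real lists are much larger).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("dans", "le", "la"));

    // Toy stemmer: drops a single trailing 'e'. Not a real stemming algorithm.
    static String stem(String term) {
        return term.endsWith("e") ? term.substring(0, term.length() - 1) : term;
    }

    // Returns an empty Optional when the term is a stop word.
    static Optional<String> skipStopWords(String term) {
        return STOP_WORDS.contains(term) ? Optional.empty() : Optional.of(term);
    }

    // Order described in this ticket as problematic: stem, then skip stop words.
    static Optional<String> stemThenSkip(String term) {
        return skipStopWords(stem(term));
    }

    // Proposed order: skip stop words, then stem whatever survives.
    static Optional<String> skipThenStem(String term) {
        return skipStopWords(term).map(FilterOrderDemo::stem);
    }

    public static void main(String[] args) {
        System.out.println(stemThenSkip("danse")); // prints "Optional.empty"
        System.out.println(skipThenStem("danse")); // prints "Optional[dans]"
    }
}
```

      With the stop-word filter first, a genuine stop word such as dans is still removed, but a term that only becomes a stop word after stemming is preserved.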

      Attached is a trivial patch to move the skip_stop_words filter BEFORE the stemming filter.

      /cc xedin jrwest beobal

      Attachments

        1. patch.txt
          1 kB
          DuyHai Doan
        2. patch_V2.txt
          4 kB
          DuyHai Doan

          People

            doanduyhai DuyHai Doan
            doanduyhai DuyHai Doan
            DuyHai Doan
            Pavel Yaskevich
