Lucene - Core / LUCENE-8723

Bad interaction between WordDelimiterGraphFilter, StopFilter and FlattenGraphFilter

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 7.7.1, 8.0, 8.3
    • Fix Version/s: 9.0, 8.10
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      While debugging an issue (tokens missing after analysis), I enabled Java assertions and uncovered a bug in the chain WordDelimiterGraphFilter + StopFilter + FlattenGraphFilter.

      I reproduced the issue with a small piece of code, which fails with an assertion error when assertions are enabled (the -ea JVM option):

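          import java.io.StringReader;
          import org.apache.lucene.analysis.Analyzer;
          import org.apache.lucene.analysis.TokenStream;
          import org.apache.lucene.analysis.core.FlattenGraphFilterFactory;
          import org.apache.lucene.analysis.core.StopFilterFactory;
          import org.apache.lucene.analysis.custom.CustomAnalyzer;
          import org.apache.lucene.analysis.miscellaneous.WordDelimiterGraphFilterFactory;
          import org.apache.lucene.analysis.standard.StandardTokenizerFactory;

          // Inside a method declared "throws IOException" (builder() and build() can throw it).
          Analyzer analyzer = CustomAnalyzer.builder()
              .withTokenizer(StandardTokenizerFactory.class)
              .addTokenFilter(WordDelimiterGraphFilterFactory.class, "preserveOriginal", "1")
              .addTokenFilter(StopFilterFactory.class)
              .addTokenFilter(FlattenGraphFilterFactory.class)
              .build();
          try (TokenStream ts = analyzer.tokenStream("*", new StringReader("x7in"))) {
              ts.reset();
              while (ts.incrementToken()) { } // consume all tokens; with -ea this throws the AssertionError below
              ts.end();
          }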
      

      This gives:

      Exception in thread "main" java.lang.AssertionError: 2
           at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:195)
           at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:258)
           at com.wolfram.textsearch.AnalyzerError.main(AnalyzerError.java:32)
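      To see which token graph FlattenGraphFilter is being fed, one option is to drop it from the chain and print each token's term, position increment and position length. This is a debugging sketch, assuming the Lucene 7.x/8.x analyzers-common package layout used in the snippet above; the class and method names (TokenGraphDump, buildChain, dump) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterGraphFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute;

public class TokenGraphDump {
  // Hypothetical helper: the reported chain minus FlattenGraphFilter, so the assertion is not reached.
  static Analyzer buildChain() throws IOException {
    return CustomAnalyzer.builder()
        .withTokenizer(StandardTokenizerFactory.class)
        .addTokenFilter(WordDelimiterGraphFilterFactory.class, "preserveOriginal", "1")
        .addTokenFilter(StopFilterFactory.class)
        .build();
  }

  // Returns one "term posInc posLen" line per emitted token.
  static List<String> dump(Analyzer analyzer, String text) throws IOException {
    List<String> lines = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("*", new StringReader(text))) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      PositionIncrementAttribute posInc = ts.addAttribute(PositionIncrementAttribute.class);
      PositionLengthAttribute posLen = ts.addAttribute(PositionLengthAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        lines.add(term + " " + posInc.getPositionIncrement() + " " + posLen.getPositionLength());
      }
      ts.end();
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    dump(buildChain(), "x7in").forEach(System.out::println);
  }
}
```

      A position length greater than 1 marks a token that spans several positions (for example a preserved original next to its split parts), which is the graph structure FlattenGraphFilter has to flatten.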
      

      Maybe removing stop words after WordDelimiterGraphFilter is wrong; I don't know. However, it is the only way to remove stop words that this filter itself generates. In any case, the chain should not silently lose tokens or trip assertions.
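      One plausible way stop-word removal can damage a token graph is modeled below in plain Java (not Lucene code; Java 16+ for the record). Assumptions, all hypothetical: tokens carry a position increment and a position length; a dropped stop word's increment is carried onto the next kept token, as Lucene's FilteringTokenFilter does; and the token list for "x7in" is a guess at what WordDelimiterGraphFilter emits with preserveOriginal=1 (the original spanning three positions, plus "x", "7" and "in"), with "in" assumed to be in StopFilterFactory's default English stop set. A node is flagged as dangling when a token ends there but none starts there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class StopHoleSketch {
  // Hypothetical simplified token: term, position increment, position length.
  record Token(String term, int posInc, int posLen) {}

  // Drops stop words and adds each dropped token's increment to the next kept token.
  // Trailing skipped increments (which Lucene reports at end()) are ignored in this model.
  static List<Token> removeStops(List<Token> in, Set<String> stops) {
    List<Token> out = new ArrayList<>();
    int skipped = 0;
    for (Token t : in) {
      if (stops.contains(t.term())) {
        skipped += t.posInc();
      } else {
        out.add(new Token(t.term(), t.posInc() + skipped, t.posLen()));
        skipped = 0;
      }
    }
    return out;
  }

  // Nodes where some token ends, no token starts, and which are not the final node:
  // any path that reaches such a node cannot continue.
  static Set<Integer> danglingNodes(List<Token> tokens) {
    if (tokens.isEmpty()) {
      return Set.of();
    }
    Set<Integer> starts = new TreeSet<>();
    TreeSet<Integer> ends = new TreeSet<>();
    int pos = -1;
    for (Token t : tokens) {
      pos += t.posInc();
      starts.add(pos);
      ends.add(pos + t.posLen());
    }
    Set<Integer> dangling = new TreeSet<>(ends);
    dangling.removeAll(starts);
    dangling.remove(ends.last());
    return dangling;
  }

  public static void main(String[] args) {
    // Assumed WordDelimiterGraphFilter output for "x7in" with preserveOriginal=1.
    List<Token> wdgf = List.of(
        new Token("x7in", 1, 3),
        new Token("x", 0, 1),
        new Token("7", 1, 1),
        new Token("in", 1, 1));
    System.out.println(danglingNodes(wdgf));                            // prints []
    System.out.println(danglingNodes(removeStops(wdgf, Set.of("in")))); // prints [2]
  }
}
```

      Under these assumptions, removing "in" leaves node 2 as a dead end: the path x, 7 stops there while "x7in" still spans to node 3. The model does not by itself show that this is what trips the assertion in FlattenGraphFilter.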

    People

      Assignee: Unassigned
      Reporter: Nicolás Lichtmaier (niqueco)
      Votes: 0
      Watchers: 7
