  1. Lucene - Core
  2. LUCENE-9963

Flatten graph filter has errors when there are holes at beginning or end of alternate paths


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 8.8
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      If assertions are enabled, holes (gaps) at the beginning or end of an alternate path can result in assertion errors, for example:

       

      java.lang.AssertionError: 2
      at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:195)
      

       

      Or

       

      java.lang.AssertionError
      at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:191)
      

       

       

      If assertions are not enabled, the same conditions result in either an ArrayIndexOutOfBoundsException or dropped tokens:

       

      java.lang.ArrayIndexOutOfBoundsException: Index -2 out of bounds for length 8
      at org.apache.lucene.util.RollingBuffer.get(RollingBuffer.java:109)
      at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:325)
      

       

      These issues can be reproduced with the following unit tests:

      // token(...) is assumed to be the test class's helper:
      // token(term, posInc, posLength, startOffset, endOffset)

      // Hole at the first step of the alternate path: no token leads into "b".
      public void testAltPathFirstStepHole() throws IOException {
        TokenStream in = new CannedTokenStream(0, 3, new Token[] {
          token("abc", 1, 3, 0, 3),
          token("b", 1, 1, 1, 2),
          token("c", 1, 1, 2, 3)
        });

        TokenStream out = new FlattenGraphFilter(in);

        assertTokenStreamContents(out,
          new String[] {"abc", "b", "c"},
          new int[] {0, 1, 2},  // start offsets
          new int[] {3, 2, 3},  // end offsets
          new int[] {1, 1, 1},  // position increments
          new int[] {3, 1, 1},  // position lengths; token 0 may need to be len 1 after flattening
          3);                   // final offset
      }

      // Hole at the last step of the alternate path: no token follows "b" before "d".
      public void testAltPathLastStepHole() throws IOException {
        TokenStream in = new CannedTokenStream(0, 4, new Token[] {
          token("abc", 1, 3, 0, 3),
          token("a", 0, 1, 0, 1),
          token("b", 1, 1, 1, 2),
          token("d", 2, 1, 3, 4)
        });

        TokenStream out = new FlattenGraphFilter(in);

        assertTokenStreamContents(out,
          new String[] {"abc", "a", "b", "d"},
          new int[] {0, 0, 1, 3},  // start offsets
          new int[] {1, 1, 2, 4},  // end offsets
          new int[] {1, 0, 1, 2},  // position increments
          new int[] {3, 1, 1, 1},  // position lengths
          4);                      // final offset
      }

      // Same hole at the last step, but with no token after it.
      public void testAltPathLastStepHoleWithoutEndToken() throws IOException {
        TokenStream in = new CannedTokenStream(0, 2, new Token[] {
          token("abc", 1, 3, 0, 3),
          token("a", 0, 1, 0, 1),
          token("b", 1, 1, 1, 2)
        });

        TokenStream out = new FlattenGraphFilter(in);

        assertTokenStreamContents(out,
          new String[] {"abc", "a", "b"},
          new int[] {0, 0, 1},  // start offsets
          new int[] {1, 1, 2},  // end offsets
          new int[] {1, 0, 1},  // position increments
          new int[] {1, 1, 1},  // position lengths
          2);                   // final offset
      }
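
      To make the holes in these inputs easier to see, the following is a minimal standalone sketch (it does not use Lucene and is not how FlattenGraphFilter works internally). It assumes the usual convention that a token departs from the node reached by summing position increments and arrives at that node plus its position length. The names HoleSketch, Tok and findHoles are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustration only: locates nodes in a token graph that have no incoming
// or no outgoing token. This is NOT the FlattenGraphFilter algorithm.
class HoleSketch {

    // Hypothetical stand-in for a token: term, position increment, position length.
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    static List<String> findHoles(List<Tok> tokens) {
        TreeSet<Integer> departures = new TreeSet<>();
        TreeSet<Integer> arrivals = new TreeSet<>();
        int from = -1; // so that a first position increment of 1 lands on node 0
        for (Tok t : tokens) {
            from += t.posInc;
            departures.add(from);          // node the token leaves
            arrivals.add(from + t.posLen); // node the token reaches
        }

        TreeSet<Integer> nodes = new TreeSet<>(departures);
        nodes.addAll(arrivals);
        List<String> holes = new ArrayList<>();
        if (nodes.isEmpty()) {
            return holes;
        }
        int first = nodes.first();
        int last = nodes.last();
        for (int node : nodes) {
            // Every node except the first should be reached by some token.
            if (node != first && !arrivals.contains(node)) {
                holes.add("node " + node + ": no incoming token");
            }
            // Every node except the last should have some token leaving it.
            if (node != last && !departures.contains(node)) {
                holes.add("node " + node + ": no outgoing token");
            }
        }
        return holes;
    }

    public static void main(String[] args) {
        // Input of testAltPathFirstStepHole: "b" starts at a node nothing leads to.
        System.out.println(findHoles(Arrays.asList(
            new Tok("abc", 1, 3), new Tok("b", 1, 1), new Tok("c", 1, 1))));
        // prints [node 1: no incoming token]

        // Input of testAltPathLastStepHole: nothing continues from the end of "b".
        System.out.println(findHoles(Arrays.asList(
            new Tok("abc", 1, 3), new Tok("a", 0, 1), new Tok("b", 1, 1), new Tok("d", 2, 1))));
        // prints [node 2: no outgoing token]
    }
}
```

      For the third test input (the same tokens without "d"), the sketch also reports node 2 as having no outgoing token, which matches the "last step hole" described above.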

      I believe LUCENE-8723 is a related issue, as it looks like the last token in an alternate path is being deleted there.

          People

            Assignee: Unassigned
            Reporter: Geoffrey Lawson
            Votes: 0
            Watchers: 3

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 7.5h