Lucene - Core / LUCENE-9963

Flatten graph filter has errors when there are holes at beginning or end of alternate paths


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 8.8
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      If asserts are enabled, gaps (holes) at the beginning or end of an alternate path can trigger assertion errors.

      For example:

       

      java.lang.AssertionError: 2
      at  org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:195)
      

       

      Or

       

      java.lang.AssertionError
      at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:191)
      

       

       

      If asserts are not enabled, the same conditions result in either an ArrayIndexOutOfBoundsException or dropped tokens.

       

      java.lang.ArrayIndexOutOfBoundsException: Index -2 out of bounds for length 8
      at org.apache.lucene.util.RollingBuffer.get(RollingBuffer.java:109)
      at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:325)
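
The split between the two failure modes above is consistent with how Java treats `assert`: the check only runs when the JVM is started with `-ea`, so with asserts off a bad value travels further and fails somewhere else. The following is a generic sketch of that pattern, not FlattenGraphFilter's actual code; `AssertVsBounds`, `lookup`, `failureFor`, and `assertsEnabled` are hypothetical names chosen for illustration.

```java
public class AssertVsBounds {

    /** Hypothetical lookup whose only guard is an assert. */
    static String lookup(String[] buffer, int index) {
        assert index >= 0 : index; // evaluated only when the JVM runs with -ea
        return buffer[index];      // with asserts off, a negative index fails here instead
    }

    /** Returns the simple class name of the Throwable raised for the given index, or "none". */
    static String failureFor(int index) {
        try {
            lookup(new String[8], index);
            return "none";
        } catch (AssertionError | ArrayIndexOutOfBoundsException e) {
            return e.getClass().getSimpleName();
        }
    }

    /** Standard idiom: the assignment inside the assert executes only under -ea. */
    static boolean assertsEnabled() {
        boolean enabled = false;
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("asserts enabled: " + assertsEnabled());
        System.out.println("failure for index -2: " + failureFor(-2));
    }
}
```

Running it once with `java -ea AssertVsBounds` and once without the flag should show the same bad index surfacing as two different errors, which is why the tests below need to pass in both modes.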
      

       

      These issues can be reproduced with the following unit tests:

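      // Hole at the first step: no token other than "abc" starts at position 0.
      public void testAltPathFirstStepHole() throws IOException {
        // token(term, posInc, posLength, startOffset, endOffset)
        TokenStream in = new CannedTokenStream(0, 3, new Token[] {
          token("abc", 1, 3, 0, 3),
          token("b", 1, 1, 1, 2),
          token("c", 1, 1, 2, 3)
        });

        TokenStream out = new FlattenGraphFilter(in);

        assertTokenStreamContents(out,
          new String[] {"abc", "b", "c"},
          new int[] {0, 1, 2}, // start offsets
          new int[] {3, 2, 3}, // end offsets
          new int[] {1, 1, 1}, // position increments
          new int[] {3, 1, 1}, // position lengths; token 0 may need to be len 1 after flattening
          3); // final offset
      }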
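      // Hole at the last step: no token other than "abc" ends at position 3.
      public void testAltPathLastStepHole() throws IOException {
        // token(term, posInc, posLength, startOffset, endOffset)
        TokenStream in = new CannedTokenStream(0, 4, new Token[] {
          token("abc", 1, 3, 0, 3),
          token("a", 0, 1, 0, 1),
          token("b", 1, 1, 1, 2),
          token("d", 2, 1, 3, 4)
        });

        TokenStream out = new FlattenGraphFilter(in);

        assertTokenStreamContents(out,
          new String[] {"abc", "a", "b", "d"},
          new int[] {0, 0, 1, 3}, // start offsets
          new int[] {1, 1, 2, 4}, // end offsets
          new int[] {1, 0, 1, 2}, // position increments
          new int[] {3, 1, 1, 1}, // position lengths
          4); // final offset
      }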
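      // Same last-step hole, but the stream ends before any token follows the hole.
      public void testAltPathLastStepHoleWithoutEndToken() throws IOException {
        // token(term, posInc, posLength, startOffset, endOffset)
        TokenStream in = new CannedTokenStream(0, 2, new Token[] {
          token("abc", 1, 3, 0, 3),
          token("a", 0, 1, 0, 1),
          token("b", 1, 1, 1, 2)
        });

        TokenStream out = new FlattenGraphFilter(in);

        assertTokenStreamContents(out,
          new String[] {"abc", "a", "b"},
          new int[] {0, 0, 1}, // start offsets
          new int[] {1, 1, 2}, // end offsets
          new int[] {1, 0, 1}, // position increments
          new int[] {1, 1, 1}, // position lengths
          2); // final offset
      }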
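
To make the holes in these three inputs easier to see, the sketch below converts each token's position increment and position length into a graph edge (start node to end node) and reports whether any other token leaves the long token's start node or arrives at its end node. It does not depend on Lucene; `TokenGraphHoles`, `Tok`, `Edge`, `toEdges`, and `describeHoles` are hypothetical names for illustration, and the check is a simplification that looks only at the two endpoints of each multi-position token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenGraphHoles {

    /** Hypothetical stand-in for a token: only term, posInc and posLength matter here. */
    static final class Tok {
        final String term; final int posInc; final int posLength;
        Tok(String term, int posInc, int posLength) {
            this.term = term; this.posInc = posInc; this.posLength = posLength;
        }
    }

    /** One token viewed as a graph edge from node `from` to node `to`. */
    static final class Edge {
        final String term; final int from; final int to;
        Edge(String term, int from, int to) { this.term = term; this.from = from; this.to = to; }
    }

    /** Accumulates position increments; a token spans [position, position + posLength]. */
    static List<Edge> toEdges(List<Tok> tokens) {
        List<Edge> edges = new ArrayList<>();
        int pos = -1; // positions start at -1, so a first posInc of 1 lands on node 0
        for (Tok t : tokens) {
            pos += t.posInc;
            edges.add(new Edge(t.term, pos, pos + t.posLength));
        }
        return edges;
    }

    /** For each token spanning 2+ positions, checks both endpoints against the other tokens. */
    static List<String> describeHoles(List<Edge> edges) {
        List<String> out = new ArrayList<>();
        for (Edge longEdge : edges) {
            if (longEdge.to - longEdge.from < 2) continue;
            boolean startCovered = false, endCovered = false;
            for (Edge e : edges) {
                if (e == longEdge) continue;
                if (e.from == longEdge.from) startCovered = true;
                if (e.to == longEdge.to) endCovered = true;
            }
            if (!startCovered) out.add(longEdge.term + ": hole at first step");
            if (!endCovered) out.add(longEdge.term + ": hole at last step");
        }
        return out;
    }

    public static void main(String[] args) {
        // Inputs mirror the (posInc, posLength) values of the three unit tests.
        List<Tok> firstStep = Arrays.asList(new Tok("abc", 1, 3), new Tok("b", 1, 1), new Tok("c", 1, 1));
        List<Tok> lastStep = Arrays.asList(new Tok("abc", 1, 3), new Tok("a", 0, 1),
                                           new Tok("b", 1, 1), new Tok("d", 2, 1));
        List<Tok> noEndToken = Arrays.asList(new Tok("abc", 1, 3), new Tok("a", 0, 1), new Tok("b", 1, 1));

        System.out.println(describeHoles(toEdges(firstStep)));  // [abc: hole at first step]
        System.out.println(describeHoles(toEdges(lastStep)));   // [abc: hole at last step]
        System.out.println(describeHoles(toEdges(noEndToken))); // [abc: hole at last step]
    }
}
```

In the second input, "d" starts at node 3 but nothing other than "abc" ends there, which is the kind of missing end node the description refers to.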

      I believe LUCENE-8723 is a related issue, as it looks like the last token in an alternate path is being deleted there as well.

            People

            • Assignee:
              Unassigned
            • Reporter:
              Geoffrey Lawson
            • Votes:
              0
            • Watchers:
              3

                Time Tracking

                • Original Estimate: Not Specified
                • Remaining Estimate: 0h
                • Time Spent: 7.5h