Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7968

AnalyzingSuggester's comparator compares incorrectly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: master (8.0), 7.1
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Found by LUCENE-7966, but we should fix here separate.

      Currently the tie-break case of this comparator is buggy when hasPayloads=false, as it assigns negative lengths to the BytesRefs being compared. The current BytesRef.compareTo simply silently returns false in this case, hiding the bug.

        Issue Links

          Activity

          Hide
          rcmuir Robert Muir added a comment -

          Here's an assert that can currently make tests fail. But I want to test that the tie-break case really works, not just that stuff is in-bounds.

          diff --git a/lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingSuggester.java b/lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingSuggester.java
          index 9c6a624..2fa3569 100644
          --- a/lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingSuggester.java
          +++ b/lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingSuggester.java
          @@ -392,6 +392,8 @@ public class AnalyzingSuggester extends Lookup implements Accountable {
                   scratchB.offset = readerB.getPosition();
                   scratchA.length = a.length - scratchA.offset;
                   scratchB.length = b.length - scratchB.offset;
          +        assert scratchA.isValid();
          +        assert scratchB.isValid();
                 }
              
                 return scratchA.compareTo(scratchB);
          
             [junit4] ERROR   0.31s J2 | AnalyzingSuggesterTest.testRandom <<<
             [junit4]    > Throwable #1: java.lang.IllegalStateException: length is negative: -1507
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([D27F69F0C3A46E64:A0334CFF72C4D817]:0)
             [junit4]    > 	at org.apache.lucene.util.BytesRef.isValid(BytesRef.java:222)
             [junit4]    > 	at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester$AnalyzingComparator.compare(AnalyzingSuggester.java:395)
             [junit4]    > 	at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester$AnalyzingComparator.compare(AnalyzingSuggester.java:339)
             [junit4]    > 	at org.apache.lucene.util.BytesRefArray$1.comparePivot(BytesRefArray.java:157)
             [junit4]    > 	at org.apache.lucene.util.IntroSorter.quicksort(IntroSorter.java:66)
             [junit4]    > 	at org.apache.lucene.util.IntroSorter.quicksort(IntroSorter.java:82)
             [junit4]    > 	at org.apache.lucene.util.IntroSorter.sort(IntroSorter.java:36)
             [junit4]    > 	at org.apache.lucene.util.BytesRefArray.sort(BytesRefArray.java:166)
             [junit4]    > 	at org.apache.lucene.util.BytesRefArray.iterator(BytesRefArray.java:196)
             [junit4]    > 	at org.apache.lucene.util.OfflineSorter$SortPartitionTask.call(OfflineSorter.java:606)
             [junit4]    > 	at org.apache.lucene.util.OfflineSorter$SortPartitionTask.call(OfflineSorter.java:588)
             [junit4]    > 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             [junit4]    > 	at org.apache.lucene.util.SameThreadExecutorService.execute(SameThreadExecutorService.java:34)
             [junit4]    > 	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
             [junit4]    > 	at org.apache.lucene.util.OfflineSorter.sort(OfflineSorter.java:285)
             [junit4]    > 	at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.build(AnalyzingSuggester.java:491)
             [junit4]    > 	at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggesterTest.testRandom(AnalyzingSuggesterTest.java:787)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
          
          Show
          rcmuir Robert Muir added a comment - Here's an assert that can currently make tests fail. But I want to test that the tie-break case really works, not just that stuff is in-bounds. diff --git a/lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingSuggester.java b/lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingSuggester.java index 9c6a624..2fa3569 100644 --- a/lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingSuggester.java +++ b/lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingSuggester.java @@ -392,6 +392,8 @@ public class AnalyzingSuggester extends Lookup implements Accountable { scratchB.offset = readerB.getPosition(); scratchA.length = a.length - scratchA.offset; scratchB.length = b.length - scratchB.offset; + assert scratchA.isValid(); + assert scratchB.isValid(); } return scratchA.compareTo(scratchB); [junit4] ERROR 0.31s J2 | AnalyzingSuggesterTest.testRandom <<< [junit4] > Throwable #1: java.lang.IllegalStateException: length is negative: -1507 [junit4] > at __randomizedtesting.SeedInfo.seed([D27F69F0C3A46E64:A0334CFF72C4D817]:0) [junit4] > at org.apache.lucene.util.BytesRef.isValid(BytesRef.java:222) [junit4] > at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester$AnalyzingComparator.compare(AnalyzingSuggester.java:395) [junit4] > at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester$AnalyzingComparator.compare(AnalyzingSuggester.java:339) [junit4] > at org.apache.lucene.util.BytesRefArray$1.comparePivot(BytesRefArray.java:157) [junit4] > at org.apache.lucene.util.IntroSorter.quicksort(IntroSorter.java:66) [junit4] > at org.apache.lucene.util.IntroSorter.quicksort(IntroSorter.java:82) [junit4] > at org.apache.lucene.util.IntroSorter.sort(IntroSorter.java:36) [junit4] > at org.apache.lucene.util.BytesRefArray.sort(BytesRefArray.java:166) [junit4] > at org.apache.lucene.util.BytesRefArray.iterator(BytesRefArray.java:196) [junit4] > at org.apache.lucene.util.OfflineSorter$SortPartitionTask.call(OfflineSorter.java:606) [junit4] > at org.apache.lucene.util.OfflineSorter$SortPartitionTask.call(OfflineSorter.java:588) [junit4] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) [junit4] > at org.apache.lucene.util.SameThreadExecutorService.execute(SameThreadExecutorService.java:34) [junit4] > at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134) [junit4] > at org.apache.lucene.util.OfflineSorter.sort(OfflineSorter.java:285) [junit4] > at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.build(AnalyzingSuggester.java:491) [junit4] > at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggesterTest.testRandom(AnalyzingSuggesterTest.java:787) [junit4] > at java.lang.Thread.run(Thread.java:745)
          Hide
          rcmuir Robert Muir added a comment -

          Patch, including a test that fails even without the assert. It just adds 50 surface forms with the same weight that all analyze to the same string, and asserts that lookup() returns them back in sorted order by surface form.

          When payloads=false, the BytesRef's length should be the remaining bytes, not negative.

          Show
          rcmuir Robert Muir added a comment - Patch, including a test that fails even without the assert. It just adds 50 surface forms with the same weight that all analyze to the same string, and asserts that lookup() returns them back in sorted order by surface form. When payloads=false, the BytesRef's length should be the remaining bytes, not negative.
          Hide
          rcmuir Robert Muir added a comment -

          From what I can tell, since the stuff going into the FST (analyzed form, costs) is still in-order in this case, nothing detected it.

          The surface forms are stored in a big byte[], so by being out of order it means suggester's results just come back in a bizarre order when there are ties on both the analyzed form and costs (rather than in fact being sorted by surface form).

          E.G. if you used a stemmer and added both dog (cost=2) and dogs (cost=2), suggester might sometimes return dog first, other times dogs.

          Show
          rcmuir Robert Muir added a comment - From what I can tell, since the stuff going into the FST (analyzed form, costs) is still in-order in this case, nothing detected it. The surface forms are stored in a big byte[], so by being out of order it means suggester's results just come back in a bizarre order when there are ties on both the analyzed form and costs (rather than in fact being sorted by surface form). E.G. if you used a stemmer and added both dog (cost=2) and dogs (cost=2) , suggester might sometimes return dog first, other times dogs .
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 2e5f9a4369e0f4f664868718ce3ee8fbea43a98b in lucene-solr's branch refs/heads/master from Robert Muir
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2e5f9a4 ]

          LUCENE-7968: Analyzing Suggester's comparator compares incorrectly

          Show
          jira-bot ASF subversion and git services added a comment - Commit 2e5f9a4369e0f4f664868718ce3ee8fbea43a98b in lucene-solr's branch refs/heads/master from Robert Muir [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2e5f9a4 ] LUCENE-7968 : Analyzing Suggester's comparator compares incorrectly
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit c12acde72bc585409fe0f5738f3d9f6dbec05d20 in lucene-solr's branch refs/heads/branch_7x from Robert Muir
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c12acde ]

          LUCENE-7968: Analyzing Suggester's comparator compares incorrectly

          Show
          jira-bot ASF subversion and git services added a comment - Commit c12acde72bc585409fe0f5738f3d9f6dbec05d20 in lucene-solr's branch refs/heads/branch_7x from Robert Muir [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c12acde ] LUCENE-7968 : Analyzing Suggester's comparator compares incorrectly

            People

            • Assignee:
              Unassigned
              Reporter:
              rcmuir Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development