Lucene - Core / LUCENE-4826

PostingsHighlighter doesn't keep the top N best scoring passages

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3, 4.2.1, Trunk
    • Component/s: modules/highlighter
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The comparator we pass to the PQ is just backwards ...
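To make the failure mode concrete, here is a minimal sketch of how a reversed comparator breaks top-N selection. It is not the PostingsHighlighter source: it uses java.util.PriorityQueue as a stand-in for the highlighter's passage queue, and the class name TopPassagesSketch, the method keepTop, and the scores are hypothetical. The assumption is the usual bounded-queue pattern, where the queue head is evicted once more than N entries are present.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; names and scores are made up.
class TopPassagesSketch {

  // Keeps n scores by evicting the queue head whenever the queue grows past n.
  // java.util.PriorityQueue places the least element (per the comparator) at
  // the head, so the head is the worst score only if the comparator orders
  // scores ascending.
  static List<Double> keepTop(double[] scores, int n, Comparator<Double> order) {
    PriorityQueue<Double> pq = new PriorityQueue<>(n + 1, order);
    for (double s : scores) {
      pq.offer(s);
      if (pq.size() > n) {
        pq.poll(); // drops whatever the comparator considers "least"
      }
    }
    List<Double> kept = new ArrayList<>(pq);
    kept.sort(Collections.reverseOrder()); // best first, for display
    return kept;
  }

  public static void main(String[] args) {
    double[] scores = {3.0, 1.0, 2.0}; // great, crappy, good
    // Ascending comparator: the worst score is evicted.
    System.out.println(keepTop(scores, 2, Comparator.naturalOrder())); // [3.0, 2.0]
    // Backwards comparator: the best score is evicted instead.
    System.out.println(keepTop(scores, 2, Comparator.reverseOrder())); // [2.0, 1.0]
  }
}
```

With the comparator reversed, the queue still holds N entries, so the output looks plausible; only the choice of which entries survive is wrong. That is why the bug needs at least N + 1 candidate passages before it becomes visible.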

      1. LUCENE-4826.patch
        88 kB
        Michael McCandless

        Activity

        Robert Muir added a comment -

        +1!

        Here is a smaller test. To trick it into failing, the document must contain something like
        Great Sentence. Crappy Sentence. Good Sentence.

        otherwise the passages never make it into the PQ to demonstrate the bug...

          public void testPassageRanking() throws Exception {
            Directory dir = newDirectory();
            IndexWriterConfig iwc = newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random(), MockTokenizer.SIMPLE, true));
            iwc.setMergePolicy(newLogMergePolicy());
            RandomIndexWriter iw = new RandomIndexWriter(random(), dir, iwc);
            
            // PostingsHighlighter needs offsets in the postings for the highlighted field.
            FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
            offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
            Field body = new Field("body", "", offsetsType);
            Document doc = new Document();
            doc.add(body);
            
            // Three sentences contain "test", so one of them must be dropped when only two passages are kept.
            body.setStringValue("This is a test.  Just highlighting from postings. This is also a much sillier test.  Feel free to test test test test test test test.");
            iw.addDocument(doc);
            
            IndexReader ir = iw.getReader();
            iw.close();
            
            IndexSearcher searcher = newSearcher(ir);
            PostingsHighlighter highlighter = new PostingsHighlighter();
            Query query = new TermQuery(new Term("body", "test"));
            TopDocs topDocs = searcher.search(query, null, 10, Sort.INDEXORDER);
            assertEquals(1, topDocs.totalHits);
            // Ask for at most 2 passages per document.
            String[] snippets = highlighter.highlight("body", query, searcher, topDocs, 2);
            assertEquals(1, snippets.length);
            // Only the two best-scoring passages should survive.
            assertEquals("This is a <b>test</b>.  ... Feel free to <b>test</b> <b>test</b> <b>test</b> <b>test</b> <b>test</b> <b>test</b> <b>test</b>.", snippets[0]);
            
            ir.close();
            dir.close();
          }
        
        Commit Tag Bot added a comment -

        [branch_4x commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1455785

        LUCENE-4826: add mime-type and license exclusion for test data file

        Commit Tag Bot added a comment -

        [branch_4x commit] Michael McCandless
        http://svn.apache.org/viewvc?view=revision&revision=1455696

        LUCENE-4826: fix PostingsHighlighter PassageQueue comparator so we keep the best 2 passages

        Commit Tag Bot added a comment -

        [trunk commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1455784

        LUCENE-4826: add mime-type and license exclusion for test data file

        Commit Tag Bot added a comment -

        [trunk commit] Michael McCandless
        http://svn.apache.org/viewvc?view=revision&revision=1455693

        LUCENE-4826: fix PostingsHighlighter PassageQueue comparator so we keep the best 2 passages

        Robert Muir added a comment -

        I'll backport this one to 4.2.1

        Uwe Schindler added a comment -

        Closed after release.


          People

          • Assignee:
            Unassigned
            Reporter:
            Michael McCandless
          • Votes:
            0
            Watchers:
            1
