Lucene - Core / LUCENE-4290

basic highlighter that uses postings offsets

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, Trunk
    • Component/s: modules/other
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We added IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS so you can efficiently compress character offsets in the postings list, but nothing yet makes use of this.

      Here is a simple highlighter that uses them: it doesn't have many tests or fancy features, but I think it's OK for the sandbox/ (maybe with a couple more tests).

      Additionally, I didn't do any benchmarking.
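
      As a rough illustration of the idea (this is not the attached patch and does not use Lucene's API), the sketch below models a postings list that records the character offsets of every term occurrence, then uses those offsets to mark matches in the original text. The class name OffsetPostingsSketch, its methods, the letter-based tokenization, and the <b> tags are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of postings that carry character offsets; hypothetical, not Lucene code. */
final class OffsetPostingsSketch {

    /** Builds term -> list of {startOffset, endOffset}; terms are lowercased runs of letters. */
    static Map<String, List<int[]>> index(String text) {
        Map<String, List<int[]>> postings = new TreeMap<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetter(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetter(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase(Locale.ROOT);
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(new int[] {start, i});
        }
        return postings;
    }

    /** Wraps every occurrence of the query terms in <b>...</b>, using only the stored offsets. */
    static String highlight(String text, Map<String, List<int[]>> postings, String... queryTerms) {
        List<int[]> hits = new ArrayList<>();
        for (String term : queryTerms) {
            hits.addAll(postings.getOrDefault(term, Collections.<int[]>emptyList()));
        }
        hits.sort((a, b) -> Integer.compare(a[0], b[0]));
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int[] hit : hits) {
            if (hit[0] < last) {
                continue; // same span seen twice, e.g. a query term passed in twice
            }
            out.append(text, last, hit[0]).append("<b>").append(text, hit[0], hit[1]).append("</b>");
            last = hit[1];
        }
        return out.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        String text = "Lucene stores offsets. Offsets make highlighting cheap.";
        System.out.println(highlight(text, index(text), "offsets"));
    }
}
```

      The point of the sketch is that highlighting needs no re-analysis of the text and no term vectors: the offsets recorded at index time are enough to locate each match in the stored text.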

        Activity

        Transition            Time In Source Status   Execution Times   Last Executer    Last Execution Date
        Open -> Resolved      127d 7h 9m              1                 Robert Muir      11/Dec/12 15:25
        Resolved -> Closed    149d 19h 8m             1                 Uwe Schindler    10/May/13 11:34
        Uwe Schindler made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Uwe Schindler added a comment -

        Closed after release.

        Commit Tag Bot added a comment -

        [trunk commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1428338

        LUCENE-4290: use SimpleAnalyzer so we test single-word sentences too

        Commit Tag Bot added a comment -

        [branch_4x commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1428339

        LUCENE-4290: use SimpleAnalyzer so we test single-word sentences too

        Commit Tag Bot added a comment -

        [trunk commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1428161

        LUCENE-4290: also detect attempts to highlight fields w/o any prox

        Commit Tag Bot added a comment -

        [branch_4x commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1428162

        LUCENE-4290: also detect attempts to highlight fields w/o any prox

        Commit Tag Bot added a comment -

        [branch_4x commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1428159

        LUCENE-4290: clean up some typos, add a description (from mikes blog), null checks, and other sand

        Commit Tag Bot added a comment -

        [trunk commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1428157

        LUCENE-4290: clean up some typos, add a description (from mikes blog), null checks, and other sand

        Commit Tag Bot added a comment -

        [trunk commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1428128

        LUCENE-4290: add another simple test

        Commit Tag Bot added a comment -

        [branch_4x commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1428127

        LUCENE-4290: add another simple test

        Commit Tag Bot added a comment -

        [branch_4x commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1426075

        LUCENE-4290: add some more testing for this sandy highlighter

        Commit Tag Bot added a comment -

        [trunk commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1426072

        LUCENE-4290: add some more testing for this sandy highlighter

        Commit Tag Bot added a comment -

        [branch_4x commit] Michael McCandless
        http://svn.apache.org/viewvc?view=revision&revision=1420279

        LUCENE-4290: be sure to throw exc if app didn't index offsets

        Commit Tag Bot added a comment -

        [trunk commit] Michael McCandless
        http://svn.apache.org/viewvc?view=revision&revision=1420275

        LUCENE-4290: be sure to throw exc if index didn't index offsets

        Commit Tag Bot added a comment -

        [branch_4x commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1420221

        LUCENE-4290: basic highlighter that uses postings offsets

        Robert Muir made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 4.1 [ 12321140 ]
        Fix Version/s 5.0 [ 12321663 ]
        Resolution Fixed [ 1 ]
        Robert Muir added a comment -

        I committed this to the sandbox for now. If it gets in our way we can just remove it.

        Commit Tag Bot added a comment -

        [trunk commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1420217

        LUCENE-4290: basic highlighter that uses postings offsets

        Robert Muir added a comment -

        I get some performance improvements here (for non-prox queries) by hacking up luceneutil to
        test queries with postingshighlighter+offsets vs. fastvectorhighlighter+vectors.

        However, I don't think this will be realistically useful until we have the new block layout from the pfor branch:
        prox queries are hurt by the interleaving in the stream (just like if you use payloads), unrelated to highlighting.

        I tried to do more experiments like 'wikibig' in luceneutil, but I ran out of disk space.

        Once the block layout has landed, let's revisit this: it gives a much smaller index and faster indexing,
        and I think it will work well when that's sorted out.
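
        One plausible reading of the interleaving cost, sketched under stated assumptions: if positions and offsets share a single variable-length stream, a positions-only consumer (such as a phrase query) still has to decode every offset just to find the next position. The code below is a toy model, not Lucene's postings format; the class InterleavedStreamSketch, its methods, and the use of absolute (non-delta) values are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;

/** Toy comparison of an interleaved stream and a positions-only stream; hypothetical, not Lucene code. */
final class InterleavedStreamSketch {

    /** Writes v (non-negative) as a variable-length int: 7 bits per byte, high bit set on all but the last byte. */
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    /** Reads one variable-length int at cursor[0] and advances the cursor. */
    static int readVInt(byte[] buf, int[] cursor) {
        int b = buf[cursor[0]++];
        int v = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[cursor[0]++];
            v |= (b & 0x7F) << shift;
        }
        return v;
    }

    /** One stream holding position, startOffset, endOffset for each occurrence. */
    static byte[] interleaved(int[] pos, int[] start, int[] end) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < pos.length; i++) {
            writeVInt(out, pos[i]);
            writeVInt(out, start[i]);
            writeVInt(out, end[i]);
        }
        return out.toByteArray();
    }

    /** A stream holding positions only, as if offsets were stored elsewhere. */
    static byte[] positionsOnly(int[] pos) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int p : pos) {
            writeVInt(out, p);
        }
        return out.toByteArray();
    }

    /** Recovers positions from a stream that stores valuesPerOccurrence values per occurrence. */
    static int[] readPositions(byte[] stream, int count, int valuesPerOccurrence) {
        int[] positions = new int[count];
        int[] cursor = {0};
        for (int i = 0; i < count; i++) {
            positions[i] = readVInt(stream, cursor);
            for (int extra = 1; extra < valuesPerOccurrence; extra++) {
                readVInt(stream, cursor); // unused here, but must be decoded to reach the next position
            }
        }
        return positions;
    }
}
```

        In this toy model the positions-only reader walks the entire interleaved stream even though it discards two of every three values, which is the kind of overhead a layout that keeps offsets apart from positions would avoid.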

        Robert Muir added a comment -

        > Should we move EMPTY into DocsAndPositionsEnum?

        Maybe it can be either moved or removed if the code is fixed.

        In this first patch it's used both as a sentinel for a stopping condition and as
        a placeholder for "term doesn't exist in this segment". The former I think is
        no longer necessary and the latter is probably overkill.

        > This isn't just a cutover from term vectors to postings, right? It actually scores each passage as if it were its own hit/document matching a search? I.e., the passage ranking/selection differs from the two existing highlighters.

        Right: I think it's different in a number of ways. I hope it will be really fast, but
        again I didn't even bother benchmarking yet.

        It's also limited in some ways since it's just a prototype.
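
        To make "scores each passage as if it were its own hit/document" concrete, here is a minimal sketch under stated assumptions. It is not the patch's scoring code: the class PassageScoringSketch, the punctuation-based sentence splitting, and the saturating tf / (tf + 1) weight are all hypothetical choices for illustration.

```java
import java.util.Locale;

/** Toy passage ranking: each sentence is scored like a tiny document; hypothetical, not Lucene code. */
final class PassageScoringSketch {

    /** Counts case-insensitive whole-token occurrences of term (expected lowercase) in passage. */
    static int termFreq(String passage, String term) {
        int tf = 0;
        for (String token : passage.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.equals(term)) {
                tf++;
            }
        }
        return tf;
    }

    /** Sums a saturating weight per query term, so matching more distinct terms beats repeating one. */
    static double score(String passage, String... queryTerms) {
        double score = 0;
        for (String term : queryTerms) {
            int tf = termFreq(passage, term);
            score += tf / (tf + 1.0);
        }
        return score;
    }

    /** Splits text after '.', '!' or '?' and returns the top-scoring passage, or "" if nothing matches. */
    static String bestPassage(String text, String... queryTerms) {
        String best = "";
        double bestScore = 0;
        for (String passage : text.split("(?<=[.!?])\\s+")) {
            double s = score(passage, queryTerms);
            if (s > bestScore) {
                bestScore = s;
                best = passage;
            }
        }
        return best;
    }
}
```

        With a weight like this, a passage containing two different query terms once each (0.5 + 0.5) outranks a passage repeating a single term three times (0.75), which is one way passage selection can differ from simply counting matches.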

        Michael McCandless added a comment -

        Wow, this looks very nice!

        Should we move EMPTY into DocsAndPositionsEnum?

        This isn't just a cutover from term vectors to postings, right? It actually scores each passage as if it were its own hit/document matching a search? I.e., the passage ranking/selection differs from the two existing highlighters.

        I like the EMPTY_INDEXREADER (so MTQs do no rewrite work).

        Robert Muir made changes -
        Field Original Value New Value
        Attachment LUCENE-4290.patch [ 12539247 ]
        Robert Muir created issue -

          People

          • Assignee: Unassigned
          • Reporter: Robert Muir
          • Votes: 0
          • Watchers: 1