  1. Lucene - Core
  2. LUCENE-3087

highlighting exact phrase with overlapping tokens fails.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.9.4, 3.1
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: modules/highlighter
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Fields with overlapping tokens are not highlighted in search results when searching for an exact phrase with TermVector.WITH_OFFSETS.

      The document built in MemoryIndex for highlighting does not preserve token positions in this case. Overlapping tokens get "flattened" (the position increment is always set to 1), so the span query used to find the relevant fragment fails to identify the correct token sequence because the positions have shifted.
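To illustrate the failure mode described above, here is a minimal sketch in plain Java with no Lucene dependency. The class, the method names, and the sample terms are all hypothetical. It assumes an exact phrase match simply requires the second term to sit at the position immediately after the first, which is a simplification of what a span query checks.

```java
public class FlattenedPositionsDemo {

    /**
     * Accumulates position increments into absolute positions.
     * When flatten is true, every increment is treated as 1, mimicking
     * the "flattened" behaviour described in this issue (an assumption
     * made for illustration, not the highlighter's actual code).
     */
    public static int[] positions(int[] increments, boolean flatten) {
        int[] result = new int[increments.length];
        int position = -1;
        for (int i = 0; i < increments.length; i++) {
            position += flatten ? 1 : increments[i];
            result[i] = position;
        }
        return result;
    }

    /** True if some occurrence of second sits exactly one position after first. */
    public static boolean matchesExactPhrase(String[] terms, int[] positions,
                                             String first, String second) {
        for (int i = 0; i < terms.length; i++) {
            for (int j = 0; j < terms.length; j++) {
                if (terms[i].equals(first) && terms[j].equals(second)
                        && positions[j] == positions[i] + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "fast" overlaps "quick": same position, so its increment is 0.
        String[] terms = {"quick", "fast", "fox"};
        int[] increments = {1, 0, 1};

        // Real positions are 0, 0, 1: "quick fox" is adjacent.
        System.out.println(matchesExactPhrase(
                terms, positions(increments, false), "quick", "fox"));

        // Flattened positions are 0, 1, 2: "fox" has drifted away from "quick".
        System.out.println(matchesExactPhrase(
                terms, positions(increments, true), "quick", "fox"));
    }
}
```

With the original increments the phrase is found; with flattened positions it is not, which is why no fragment gets highlighted.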

      I corrected this by adding a position increment calculation to the StoredTokenStream subclass. I also added a JUnit test covering this case.
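One way such a position increment calculation could look is sketched below in plain Java. This is not the code from the attached patch. It assumes the tokens are sorted by start offset and that overlapping tokens share the start offset of the token before them. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class PositionIncrementSketch {

    /**
     * Derives position increments from start offsets alone.
     * Assumption: tokens are sorted by start offset, and a token that
     * does not start after the previous one overlaps it (increment 0).
     */
    public static int[] incrementsFromStartOffsets(int[] startOffsets) {
        int[] increments = new int[startOffsets.length];
        for (int i = 0; i < startOffsets.length; i++) {
            boolean startsLater = i == 0 || startOffsets[i] > startOffsets[i - 1];
            increments[i] = startsLater ? 1 : 0;
        }
        return increments;
    }

    public static void main(String[] args) {
        // Hypothetical tokens: "quick" and "fast" both start at offset 0,
        // "fox" starts at offset 6.
        int[] startOffsets = {0, 0, 6};
        System.out.println(Arrays.toString(incrementsFromStartOffsets(startOffsets)));
        // prints [1, 0, 1]
    }
}
```

This heuristic only recovers overlaps that share a start offset. Position gaps larger than 1, or overlapping tokens that start at different offsets, cannot be reconstructed from offsets this way.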

      I used the Eclipse code style from trunk, but it introduces quite a few formatting differences between the repository and working-copy files. I tried to reduce them, but some line-wrapping rules still don't match.

      The correction patch is attached.

        Attachments

        1. LUCENE-3087.patch
          10 kB
          Pierre Gossé

            People

            • Assignee:
              mikemccand Michael McCandless
              Reporter:
              pigo Pierre Gossé
            • Votes:
              0
              Watchers:
              1
