Lucene - Core / LUCENE-2874

Highlighting overlapping tokens outputs doubled words

Details

    • New, Patch Available

    Description

      Suppose that for the text "the fox did not jump" we generate the following tokens, each written as (term, position, startOffset-endOffset):
      (the,0,0-3), ({fox},0,0-7), (fox,1,4-7), (did,2,8-11), (not,3,12-15), (jump,4,16-20)
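      To make the overlap concrete, the sketch below models each token as a hypothetical `Tok(term, position, start, end)` record (plain Java 16+, not Lucene's own `Token` class) and lists which tokens' offset ranges overlap, treating offsets as half-open `[start, end)` ranges. With "jump" taken as 16-20, every token's offsets slice its own text out of "the fox did not jump", and `{fox}` (0-7) overlaps both `the` and `fox`.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapSketch {
    // Hypothetical token model: term text, position, and [start, end) character offsets.
    record Tok(String term, int position, int start, int end) {}

    static final String TEXT = "the fox did not jump";

    // The token stream from the description; "jump" occupies offsets 16-20 of TEXT.
    static final List<Tok> TOKENS = List.of(
            new Tok("the", 0, 0, 3),
            new Tok("{fox}", 0, 0, 7),
            new Tok("fox", 1, 4, 7),
            new Tok("did", 2, 8, 11),
            new Tok("not", 3, 12, 15),
            new Tok("jump", 4, 16, 20));

    // Slice of the original text covered by a token's offsets.
    static String covered(Tok t) {
        return TEXT.substring(t.start(), t.end());
    }

    // Half-open ranges [a.start, a.end) and [b.start, b.end) overlap
    // when each one starts before the other ends.
    static boolean overlaps(Tok a, Tok b) {
        return a.start() < b.end() && b.start() < a.end();
    }

    // All pairs (i < j) of tokens whose offset ranges overlap, as "termA / termB".
    static List<String> overlappingPairs() {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < TOKENS.size(); i++) {
            for (int j = i + 1; j < TOKENS.size(); j++) {
                if (overlaps(TOKENS.get(i), TOKENS.get(j))) {
                    pairs.add(TOKENS.get(i).term() + " / " + TOKENS.get(j).term());
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (Tok t : TOKENS) {
            System.out.println(t.term() + " -> \"" + covered(t) + "\"");
        }
        System.out.println(overlappingPairs()); // prints: [the / {fox}, {fox} / fox]
    }
}
```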

      If the field's term vector is stored WITH_OFFSETS rather than WITH_POSITIONS_OFFSETS, highlighting would output
      "the<em>the fox</em> did not jump"
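      A minimal, hypothetical model of how such output can arise (this is not Lucene's Highlighter or TokenSources code): a highlighter walks the tokens in start-offset order and, for any token that extends past the text already written, emits that token's whole covered text. Assuming the query matches the `{fox}` and `fox` tokens, `{fox}` (0-7) starts before the end of the already-written `the` (0-3), so "the" is emitted twice. `Tok` and `highlight` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class NaiveHighlightSketch {
    // Hypothetical token model: term text and [start, end) character offsets.
    record Tok(String term, int start, int end) {}

    // Deliberately naive: walks tokens in start-offset order and, for every token that
    // ends past what has already been written, emits the token's WHOLE covered text.
    // A token starting before lastEnd therefore repeats text that was already output.
    static String highlight(String text, List<Tok> tokens, Set<String> queryTerms) {
        List<Tok> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparingInt(Tok::start)); // stable: ties keep input order
        StringBuilder out = new StringBuilder();
        int lastEnd = 0;
        for (Tok t : sorted) {
            if (t.end() <= lastEnd) {
                continue; // fully inside text already written: skipped
            }
            if (t.start() > lastEnd) {
                out.append(text, lastEnd, t.start()); // unhighlighted gap between tokens
            }
            String covered = text.substring(t.start(), t.end()); // BUG: ignores lastEnd
            out.append(queryTerms.contains(t.term()) ? "<em>" + covered + "</em>" : covered);
            lastEnd = t.end();
        }
        out.append(text, lastEnd, text.length()); // trailing text, if any
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "the fox did not jump";
        List<Tok> tokens = List.of(
                new Tok("the", 0, 3), new Tok("{fox}", 0, 7), new Tok("fox", 4, 7),
                new Tok("did", 8, 11), new Tok("not", 12, 15), new Tok("jump", 16, 20));
        System.out.println(highlight(text, tokens, Set.of("{fox}", "fox")));
        // prints: the<em>the fox</em> did not jump
    }
}
```

      The duplicated "the" comes from `text.substring(t.start(), t.end())`: any token whose start offset precedes `lastEnd` re-emits the characters that overlap text already written.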

      I have attached a patch with two additional JUnit tests and a fix to the TokenSources class, whose ordering of tokens by offset did not handle overlapping tokens correctly.
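      One way to avoid the duplication, sketched here as an illustration and not as the contents of the attached patch: sort tokens by start offset (then end offset), merge tokens whose offset ranges overlap into a single group, and emit each group's text exactly once, highlighted if any token in it matches the query. `Tok`, `highlight` and `flush` are hypothetical names; the query is again assumed to match `{fox}` and `fox`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class GroupedHighlightSketch {
    // Hypothetical token model: term text and [start, end) character offsets.
    record Tok(String term, int start, int end) {}

    static String highlight(String text, List<Tok> tokens, Set<String> queryTerms) {
        List<Tok> sorted = new ArrayList<>(tokens);
        // Start offset first, then end offset, so tokens sharing a start have a fixed order.
        sorted.sort(Comparator.comparingInt(Tok::start).thenComparingInt(Tok::end));
        StringBuilder out = new StringBuilder();
        int written = 0;          // text[0, written) has already been output
        int groupStart = -1;      // current group covers [groupStart, groupEnd); -1 = none yet
        int groupEnd = -1;
        boolean groupHit = false; // true if any token in the current group matches the query
        for (Tok t : sorted) {
            if (groupStart >= 0 && t.start() < groupEnd) {
                // Overlaps the current group: widen the group instead of emitting text again.
                groupEnd = Math.max(groupEnd, t.end());
                groupHit |= queryTerms.contains(t.term());
                continue;
            }
            written = flush(text, out, written, groupStart, groupEnd, groupHit);
            groupStart = t.start();
            groupEnd = t.end();
            groupHit = queryTerms.contains(t.term());
        }
        written = flush(text, out, written, groupStart, groupEnd, groupHit);
        out.append(text, written, text.length()); // trailing text, if any
        return out.toString();
    }

    // Writes the gap before the group, then the group's text once, wrapped in <em>
    // if any of its tokens matched. Returns the new end of the written text.
    static int flush(String text, StringBuilder out, int written,
                     int start, int end, boolean hit) {
        if (start < 0) {
            return written; // no group buffered yet
        }
        out.append(text, written, start);
        String span = text.substring(start, end);
        out.append(hit ? "<em>" + span + "</em>" : span);
        return end;
    }

    public static void main(String[] args) {
        String text = "the fox did not jump";
        List<Tok> tokens = List.of(
                new Tok("the", 0, 3), new Tok("{fox}", 0, 7), new Tok("fox", 4, 7),
                new Tok("did", 8, 11), new Tok("not", 12, 15), new Tok("jump", 16, 20));
        System.out.println(highlight(text, tokens, Set.of("{fox}", "fox")));
        // prints: <em>the fox</em> did not jump
    }
}
```

      Grouping by the union of overlapping ranges means a multi-word token such as `{fox}` (0-7) and the single-word tokens it covers produce one highlighted span, "<em>the fox</em>", instead of the original text followed by a highlighted copy.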

      Attachments

        1. LUCENE-2874.patch (8 kB, Pierre Gossé)


          People

            Robert Muir (rcmuir)
            Pierre Gossé (pigo)
            Votes: 0
            Watchers: 0
