Lucene - Core
  LUCENE-2874

Highlighting overlapping tokens outputs doubled words

    Details

    • Lucene Fields:
      New, Patch Available

      Description

      If, for the text "the fox did not jump", we generate the following tokens (term, position, start-end offsets):
      (the,0,0-3),({fox},0,0-7),(fox,1,4-7),(did,2,8-11),(not,3,12-15),(jump,4,16-20)

      If the TermVector for the field is stored WITH_OFFSETS and not WITH_POSITIONS_OFFSETS, highlighting would output
      "the<em>the fox</em> did not jump"

      I am attaching a patch with 2 additional JUnit tests and a fix to the TokenSources class, where token ordering by offset did not handle overlapping tokens well.
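
      To illustrate the ordering problem, here is a minimal, self-contained sketch. It is not the attached patch. The class `OverlapOrderSketch`, the `Tok` type, and the `sortedTerms` helper are hypothetical names, and no Lucene classes are used. It assumes tokens rebuilt from offsets alone arrive in arbitrary order and must be re-sorted, and it shows one comparator (start offset, then end offset) that stays a consistent total order when tokens overlap.

      ```java
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.Comparator;
      import java.util.List;

      public class OverlapOrderSketch {

          /** Hypothetical stand-in for a token carrying only a term and its offsets. */
          static final class Tok {
              final String term;
              final int start;
              final int end;

              Tok(String term, int start, int end) {
                  this.term = term;
                  this.start = start;
                  this.end = end;
              }
          }

          /**
           * Orders by start offset and breaks ties by end offset.
           * Two tokens compare as equal only when both offsets match,
           * so overlapping tokens such as "the" (0-3) and "{fox}" (0-7)
           * still get a well-defined relative order.
           */
          static final Comparator<Tok> BY_OFFSETS =
                  Comparator.<Tok>comparingInt(t -> t.start).thenComparingInt(t -> t.end);

          /** Returns the terms of the given tokens in offset order (input is not modified). */
          static List<String> sortedTerms(List<Tok> tokens) {
              List<Tok> copy = new ArrayList<>(tokens);
              copy.sort(BY_OFFSETS);
              List<String> terms = new ArrayList<>();
              for (Tok t : copy) {
                  terms.add(t.term);
              }
              return terms;
          }

          public static void main(String[] args) {
              // Tokens for "the fox did not jump", deliberately out of order.
              List<Tok> shuffled = Arrays.asList(
                      new Tok("jump", 16, 20),
                      new Tok("fox", 4, 7),
                      new Tok("{fox}", 0, 7),
                      new Tok("not", 12, 15),
                      new Tok("the", 0, 3),
                      new Tok("did", 8, 11));
              System.out.println(sortedTerms(shuffled));
              // prints [the, {fox}, fox, did, not, jump]
          }
      }
      ```

      A comparator that returns "equal" for any two tokens whose ranges touch or overlap is not transitive, so the sort result can depend on input order. Comparing both offsets explicitly avoids that.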

      1. LUCENE-2874.patch
        8 kB
        Pierre Gossé

        Activity

        Pierre Gossé created issue -
        Pierre Gossé made changes -
        Field Original Value New Value
        Attachment LUCENE-2874.patch [ 12468744 ]
        Pierre Gossé made changes -
        Affects Version/s 2.9.4 [ 12315148 ]
        Component/s contrib/highlighter [ 12312096 ]
        Pierre Gossé made changes -
        Attachment LUCENE-2874.patch [ 12468744 ]
        Pierre Gossé made changes -
        Attachment LUCENE-2874.patch [ 12468746 ]
        Robert Muir made changes -
        Assignee Robert Muir [ rcmuir ]
        Fix Version/s 2.9.5 [ 12315914 ]
        Fix Version/s 3.0.4 [ 12315913 ]
        Fix Version/s 3.1 [ 12314822 ]
        Fix Version/s 4.0 [ 12314025 ]
        Lucene Fields [New] [New, Patch Available]
        Robert Muir made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Mark Thomas made changes -
        Workflow jira [ 12542900 ] Default workflow, editable Closed status [ 12563931 ]
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12563931 ] jira [ 12584572 ]
        Grant Ingersoll made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Robert Muir
            Reporter:
            Pierre Gossé
          • Votes:
            0
            Watchers:
            0
