Lucene - Core
LUCENE-2874

Highlighting overlapping tokens outputs doubled words

    Details

    • Lucene Fields:
      New, Patch Available

      Description

      If for the text "the fox did not jump" we generate the following tokens (term, position, start-end offsets):
      (the,0,0-3),({fox},0,0-7),(fox,1,4-7),(did,2,8-11),(not,3,12-15),(jump,4,16-20)

      If the TermVector for the field is stored WITH_OFFSETS and not WITH_POSITIONS_OFFSETS, highlighting would output
      "the<em>the fox</em> did not jump"
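
      To illustrate how token order can produce this doubled output, here is a simplified model of a highlighter that merges overlapping tokens into groups. It is a sketch only: the class `OverlapHighlightSketch`, the `Tok` type, and the grouping rule are assumptions made for this example, not Lucene's Highlighter code. The `issueOrder()` sequence is one assumed ordering that reproduces the reported output.

      ```java
      import java.util.Arrays;
      import java.util.List;

      class OverlapHighlightSketch {
          // Hypothetical stand-in for a token: term text plus start/end character offsets.
          static final class Tok {
              final String term; final int start; final int end;
              Tok(String term, int start, int end) { this.term = term; this.start = start; this.end = end; }
          }

          // Walks tokens in the order given. A token that starts before the current
          // group's end is merged into that group; otherwise the group is written out.
          // A group is wrapped in <em> when it contains the target term.
          static String highlight(String text, List<Tok> tokens, String target) {
              StringBuilder out = new StringBuilder();
              int lastEnd = 0;              // end of the text already written
              int gStart = -1, gEnd = -1;   // bounds of the current group (-1: no group yet)
              boolean gHit = false;
              for (Tok t : tokens) {
                  if (gStart >= 0 && t.start < gEnd) {
                      gStart = Math.min(gStart, t.start);
                      gEnd = Math.max(gEnd, t.end);
                      gHit |= t.term.equals(target);
                      continue;
                  }
                  if (gStart >= 0) lastEnd = flush(out, text, lastEnd, gStart, gEnd, gHit);
                  gStart = t.start; gEnd = t.end; gHit = t.term.equals(target);
              }
              if (gStart >= 0) lastEnd = flush(out, text, lastEnd, gStart, gEnd, gHit);
              if (lastEnd < text.length()) out.append(text, lastEnd, text.length());
              return out.toString();
          }

          private static int flush(StringBuilder out, String text, int lastEnd,
                                   int gStart, int gEnd, boolean hit) {
              if (gStart > lastEnd) out.append(text, lastEnd, gStart); // gap before the group
              String groupText = text.substring(gStart, gEnd);
              out.append(hit ? "<em>" + groupText + "</em>" : groupText);
              return Math.max(lastEnd, gEnd);
          }

          // Assumed mis-ordered sequence: the wide {fox} token arrives after "the" was written.
          static List<Tok> issueOrder() {
              return Arrays.asList(new Tok("the", 0, 3), new Tok("fox", 4, 7), new Tok("{fox}", 0, 7),
                      new Tok("did", 8, 11), new Tok("not", 12, 15), new Tok("jump", 16, 20));
          }

          // Same tokens ordered by start offset, then end offset.
          static List<Tok> offsetOrder() {
              return Arrays.asList(new Tok("the", 0, 3), new Tok("{fox}", 0, 7), new Tok("fox", 4, 7),
                      new Tok("did", 8, 11), new Tok("not", 12, 15), new Tok("jump", 16, 20));
          }

          public static void main(String[] args) {
              String text = "the fox did not jump";
              System.out.println(highlight(text, issueOrder(), "{fox}"));  // the<em>the fox</em> did not jump
              System.out.println(highlight(text, offsetOrder(), "{fox}")); // <em>the fox</em> did not jump
          }
      }
      ```

      In this model, the word "the" is written once on its own and then again as part of the group that the late-arriving {fox} token stretches back to offset 0.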

      I am attaching a patch with 2 additional JUnit tests and a fix to the TokenSources class, where ordering tokens by offset did not handle overlapping tokens correctly.
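
      As a rough illustration of an ordering that copes with overlapping tokens, the sketch below sorts by start offset and breaks ties by end offset. It is not the code from the attached patch; `OffsetOrderSketch`, `Tok`, and `BY_OFFSETS` are hypothetical names used only for this example.

      ```java
      import java.util.ArrayList;
      import java.util.Comparator;
      import java.util.List;

      class OffsetOrderSketch {
          // Hypothetical stand-in for a token: term text plus start/end character offsets.
          static final class Tok {
              final String term; final int start; final int end;
              Tok(String term, int start, int end) { this.term = term; this.start = start; this.end = end; }
          }

          // Order by start offset first; tokens sharing a start offset are ordered by end offset.
          static final Comparator<Tok> BY_OFFSETS =
                  Comparator.comparingInt((Tok t) -> t.start).thenComparingInt(t -> t.end);

          // Returns the terms of the given tokens in offset order, leaving the input untouched.
          static List<String> sortedTerms(List<Tok> tokens) {
              List<Tok> copy = new ArrayList<>(tokens);
              copy.sort(BY_OFFSETS);
              List<String> terms = new ArrayList<>();
              for (Tok t : copy) terms.add(t.term);
              return terms;
          }

          public static void main(String[] args) {
              List<Tok> tokens = new ArrayList<>();
              tokens.add(new Tok("fox", 4, 7));
              tokens.add(new Tok("{fox}", 0, 7));
              tokens.add(new Tok("the", 0, 3));
              System.out.println(sortedTerms(tokens)); // [the, {fox}, fox]
          }
      }
      ```

      With this ordering, a token that spans several others is placed next to the first token it overlaps instead of after the tokens it covers.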

      1. LUCENE-2874.patch
        8 kB
        Pierre Gossé

        Activity

        Pierre Gossé created issue -
        Pierre Gossé added a comment -

        I couldn't get the coding conventions for Eclipse from the wiki; the link seems to lead to an error:
        "You are not allowed to do AttachFile on this page. Login and try again."

        Sorry for the many differences in the diff; the changed part is on lines 251 and 152 of the new file.

        Pierre Gossé made changes -
        Field Original Value New Value
        Attachment LUCENE-2874.patch [ 12468744 ]
        Pierre Gossé made changes -
        Affects Version/s 2.9.4 [ 12315148 ]
        Component/s contrib/highlighter [ 12312096 ]
        Pierre Gossé made changes -
        Attachment LUCENE-2874.patch [ 12468744 ]
        Pierre Gossé added a comment -

        Fixed the coding format issue.

        Pierre Gossé made changes -
        Attachment LUCENE-2874.patch [ 12468746 ]
        Robert Muir added a comment -

        Hi Pierre, thanks for the patch!

        This sure looks like a bug to me (startoffset - endoffset does not make sense at all).

        Robert Muir made changes -
        Assignee Robert Muir [ rcmuir ]
        Fix Version/s 2.9.5 [ 12315914 ]
        Fix Version/s 3.0.4 [ 12315913 ]
        Fix Version/s 3.1 [ 12314822 ]
        Fix Version/s 4.0 [ 12314025 ]
        Lucene Fields [New] [New, Patch Available]
        Robert Muir added a comment -

        Committed revisions:

        trunk: 1060779
        3.x: 1060782
        3.0.x: 1060786
        2.9.x: 1060791

        Thanks Pierre, especially for writing up the unit tests.

        Robert Muir made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Mark Thomas made changes -
        Workflow jira [ 12542900 ] Default workflow, editable Closed status [ 12563931 ]
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12563931 ] jira [ 12584572 ]
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Grant Ingersoll made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition            Time In Source Status   Execution Times   Last Executer     Last Execution Date
        Open -> Resolved      1h 56m                  1                 Robert Muir       19/Jan/11 12:37
        Resolved -> Closed    70d 3h 12m              1                 Grant Ingersoll   30/Mar/11 16:50

          People

          • Assignee:
            Robert Muir
            Reporter:
            Pierre Gossé
          • Votes:
            0
            Watchers:
            1
