Lucene - Core
  LUCENE-2874

Highlighting overlapping tokens outputs doubled words

    Details

    • Lucene Fields:
      New, Patch Available

      Description

      If, for the text "the fox did not jump", we generate the following tokens (term, position, startOffset-endOffset):
      (the, 0, 0-3), ({fox}, 0, 0-7), (fox, 1, 4-7), (did, 2, 8-11), (not, 3, 12-15), (jump, 4, 16-20)
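      As a sanity check of the offsets above, here is a minimal Java sketch (Java 16+ for records) that maps each token back to the characters of the original text it covers. `Tok` is a hypothetical stand-in, not Lucene's `Token` class, and the sketch assumes `jump` ends at offset 20, the end of the 20-character text. It makes the overlap visible: `{fox}` covers "the fox", spanning both `the` and `fox`.

```java
import java.util.List;

public class TokenOffsets {
    // Hypothetical stand-in for a Lucene token: term text, position,
    // and [start, end) character offsets into the original field value.
    record Tok(String term, int pos, int start, int end) {}

    static final String TEXT = "the fox did not jump";

    // The token stream from the report; "{fox}" shares position 0 with "the"
    // and spans offsets 0-7, overlapping both "the" and "fox".
    static final List<Tok> TOKENS = List.of(
        new Tok("the", 0, 0, 3),
        new Tok("{fox}", 0, 0, 7),
        new Tok("fox", 1, 4, 7),
        new Tok("did", 2, 8, 11),
        new Tok("not", 3, 12, 15),
        new Tok("jump", 4, 16, 20));

    // Returns the slice of the original text covered by a token's offsets.
    static String covered(Tok t) {
        return TEXT.substring(t.start(), t.end());
    }

    public static void main(String[] args) {
        for (Tok t : TOKENS) {
            System.out.println(t.term() + " @" + t.pos() + " -> \"" + covered(t) + "\"");
        }
    }
}
```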

      If the field's term vector is stored WITH_OFFSETS rather than WITH_POSITIONS_OFFSETS, highlighting outputs
      "the<em>the fox</em> did not jump"

      I attached a patch with two additional JUnit tests and a fix to the TokenSources class, where sorting tokens by offset did not handle overlapping tokens correctly.
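      The fix concerns ordering tokens by offset when they overlap. Below is a minimal sketch of one consistent ordering: start offset first, with the end offset as a tie-breaker, so that `the` (0-3) and `{fox}` (0-7), which share a start offset, still get a well-defined order. It uses a hypothetical `Tok` record rather than Lucene's `Token`, and it is an illustration of the idea, not the committed patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OffsetOrder {
    // Hypothetical stand-in for a token: term text and [start, end) offsets.
    record Tok(String term, int start, int end) {}

    // Compare start offsets first; when two tokens start at the same offset,
    // the shorter one (smaller end offset) sorts first.
    static final Comparator<Tok> BY_OFFSET =
        Comparator.comparingInt(Tok::start).thenComparingInt(Tok::end);

    // Sorts a copy of the tokens and returns their terms in offset order.
    static List<String> sortedTerms(List<Tok> tokens) {
        List<Tok> copy = new ArrayList<>(tokens);
        copy.sort(BY_OFFSET);
        List<String> terms = new ArrayList<>();
        for (Tok t : copy) {
            terms.add(t.term());
        }
        return terms;
    }

    public static void main(String[] args) {
        List<Tok> shuffled = List.of(
            new Tok("jump", 16, 20), new Tok("fox", 4, 7), new Tok("{fox}", 0, 7),
            new Tok("did", 8, 11), new Tok("the", 0, 3), new Tok("not", 12, 15));
        System.out.println(sortedTerms(shuffled));
    }
}
```

      Running `main` prints `[the, {fox}, fox, did, not, jump]`: the overlapping `{fox}` lands between `the` and `fox` because it starts with `the` but ends later.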

      Attachment: LUCENE-2874.patch (8 kB, Pierre Gossé)

        Activity

        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Robert Muir added a comment -

        Committed revisions:

        trunk: 1060779
        3.x: 1060782
        3.0.x: 1060786
        2.9.x: 1060791

        Thanks Pierre, especially for writing up the unit tests.

        Robert Muir added a comment -

        Hi Pierre, thanks for the patch!

        This sure looks like a bug to me (startoffset - endoffset does not make sense at all).
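        To illustrate why mixing start and end offsets in an ordering is a problem, here is a small Java sketch. The `mixed` comparator below is hypothetical (it is not the actual TokenSources code): it compares one token's start offset with the other token's end offset. For the overlapping tokens `the` (0-3) and `{fox}` (0-7), both `mixed(the, {fox})` and `mixed({fox}, the)` come out negative, which breaks the `Comparator` contract that `sgn(compare(x, y)) == -sgn(compare(y, x))`; sorting with such a comparator gives no reliable order for overlapping tokens. Comparing like with like, as `byStartThenEnd` does, keeps the results antisymmetric.

```java
public class MixedOffsetComparator {
    // Hypothetical stand-in for a token: term text and [start, end) offsets.
    record Tok(String term, int start, int end) {}

    // Hypothetical comparator that compares a's START offset with b's END offset.
    // Shown only to illustrate the inconsistency; do not use it for sorting.
    static int mixed(Tok a, Tok b) {
        return Integer.compare(a.start(), b.end());
    }

    // Consistent alternative: start offsets first, end offsets as tie-breaker.
    static int byStartThenEnd(Tok a, Tok b) {
        int c = Integer.compare(a.start(), b.start());
        return c != 0 ? c : Integer.compare(a.end(), b.end());
    }

    public static void main(String[] args) {
        Tok the = new Tok("the", 0, 3);
        Tok fox = new Tok("{fox}", 0, 7);
        // Both calls are negative: each token claims to sort before the other.
        System.out.println("mixed(the, {fox}) = " + mixed(the, fox));
        System.out.println("mixed({fox}, the) = " + mixed(fox, the));
        // Opposite signs, as the Comparator contract requires.
        System.out.println("byStartThenEnd(the, {fox}) = " + byStartThenEnd(the, fox));
        System.out.println("byStartThenEnd({fox}, the) = " + byStartThenEnd(fox, the));
    }
}
```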

        Pierre Gossé added a comment -

        Fixed the coding format issue.

        Pierre Gossé added a comment -

        I couldn't get the Eclipse coding conventions from the wiki; the link seems to lead to an error:
        "You are not allowed to do AttachFile on this page. Login and try again."

        Sorry for the many unrelated differences in the diff; the actual change is on lines 251 and 152 of the new file.


          People

          • Assignee:
            Robert Muir
            Reporter:
            Pierre Gossé
          • Votes:
            0
            Watchers:
            0
