SOLR-575

Highlighting spans should merge across phrase query

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.2
    • Fix Version/s: 4.9, 5.0
    • Component/s: highlighter
    • Labels: None

      Description

      Somewhat related to, but separate from, SOLR-553:

      It would be nice if the highlighter component "joined" the formatter tags across an entire PhraseQuery.

      e.g.

      Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :

      should really be

      Lights (Live) : <span>I Love You But I've Chosen Darkness</span> :

      assuming the query that generated these fragments was "I Love You But I've Chosen Darkness"

      I assume there are issues with stopwords (the "But" in the name was not formatted).

        Activity

        Brian Whitman created issue -
        Mark Miller added a comment -

        With the current API, I just don't see this happening. Tokens are handed over one at a time to be highlighted and returned, and these formatted pieces are used to build up the fragments. Even trying to play tricks, I just don't think this issue is cleanly doable.

        With an alternate approach (one that didn't hand off individual tokens for highlighting) it's easy enough, but I don't see the approach changing soon.

        It would almost be less of a hassle, if for some reason you really needed this, to just post-process and merge contiguous spans with a regex or something. You still have the issue of stopwords that are not highlighted, etc., but they are a lot easier to overcome than the API limitations of the Highlighter framework.

        Otis Gospodnetic added a comment -

        Brian, perhaps something as simple as the java equivalent of s/"</span> <span>"//g might work?
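
        A minimal sketch of what that Java equivalent might look like, assuming the formatter tags are exactly <span> and </span> as in the example above. The class and method names are made up for illustration and are not part of Solr. Note that the whitespace between the two terms is kept, otherwise the merged words would run together.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Solr or Lucene class.
final class SpanMerger {
    // A closing tag, the whitespace between two highlighted terms, and the next opening tag.
    private static final Pattern ADJACENT = Pattern.compile("</span>(\\s+)<span>");

    /** Collapses runs of adjacent spans into one span, preserving the whitespace between terms. */
    static String mergeAdjacentSpans(String fragment) {
        return ADJACENT.matcher(fragment).replaceAll("$1");
    }

    public static void main(String[] args) {
        String fragment = "Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But "
                + "<span>I've</span> <span>Chosen</span> <span>Darkness</span> :";
        System.out.println(mergeAdjacentSpans(fragment));
    }
}
```

        On the example fragment this yields two spans, "I Love You" and "I've Chosen Darkness", with the un-highlighted "But" still sitting between them.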

        Otis Gospodnetic made changes -
        Field Original Value New Value
        Priority Major [ 3 ] Minor [ 4 ]
        Brian Whitman added a comment -

        Sure, post-processing is somewhat easy, except for stopwords (note the "But" in the example). It's just one of those quality-of-life concerns.
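
        One way the stopword case could be handled in post-processing is to bridge a gap between two spans when it contains only whitespace and known stopwords. The sketch below assumes <span> tags as in the example and a caller-supplied stopword set that matches the analyzer's. The class is hypothetical, not an existing Solr or Lucene API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Solr or Lucene class.
final class StopwordAwareSpanMerger {
    // Captures the tag-free text between the end of one span and the start of the next.
    private static final Pattern GAP = Pattern.compile("</span>([^<>]*)<span>");

    private final Set<String> stopwords;

    StopwordAwareSpanMerger(Set<String> stopwords) {
        this.stopwords = stopwords;
    }

    /** Merges neighbouring spans when only whitespace and stopwords separate them. */
    String merge(String fragment) {
        Matcher m = GAP.matcher(fragment);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String gap = m.group(1);
            // Drop the tags around a bridgeable gap; otherwise leave the match unchanged.
            String replacement = isBridgeable(gap) ? gap : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // True when every whitespace-separated word in the gap is a stopword (or the gap has no words).
    private boolean isBridgeable(String gap) {
        String trimmed = gap.trim();
        if (trimmed.isEmpty()) {
            return true;
        }
        for (String word : trimmed.split("\\s+")) {
            if (!stopwords.contains(word.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> stopwords = new HashSet<>(Arrays.asList("but"));
        String fragment = "Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But "
                + "<span>I've</span> <span>Chosen</span> <span>Darkness</span> :";
        System.out.println(new StopwordAwareSpanMerger(stopwords).merge(fragment));
    }
}
```

        This merges any neighbouring highlighted terms, whether or not they came from the same PhraseQuery, so it only approximates the behaviour requested in this issue.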

        Shalin Shekhar Mangar made changes -
        Fix Version/s 1.4 [ 12313351 ]
        Shalin Shekhar Mangar added a comment -

        Marked for 1.5

        Shalin Shekhar Mangar made changes -
        Fix Version/s 1.4 [ 12313351 ]
        Fix Version/s 1.5 [ 12313566 ]
        Hoss Man added a comment -

        Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

        http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

        Selection criteria were "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. Email notifications were suppressed.

        A unique token for finding these 240 issues in the future: hossversioncleanup20100527

        Hoss Man made changes -
        Fix Version/s Next [ 12315093 ]
        Fix Version/s 1.5 [ 12313566 ]
        Hoss Man made changes -
        Fix Version/s 3.2 [ 12316172 ]
        Fix Version/s Next [ 12315093 ]
        Robert Muir added a comment -

        Bulk move 3.2 -> 3.3

        Robert Muir made changes -
        Fix Version/s 3.3 [ 12316471 ]
        Fix Version/s 3.2 [ 12316172 ]
        Robert Muir made changes -
        Fix Version/s 3.4 [ 12316683 ]
        Fix Version/s 4.0 [ 12314992 ]
        Fix Version/s 3.3 [ 12316471 ]
        Robert Muir added a comment -

        3.4 -> 3.5

        Robert Muir made changes -
        Fix Version/s 3.5 [ 12317876 ]
        Fix Version/s 3.4 [ 12316683 ]
        Simon Willnauer made changes -
        Fix Version/s 3.6 [ 12319065 ]
        Fix Version/s 3.5 [ 12317876 ]
        Hoss Man added a comment -

        Bulk move of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

        Email notification suppressed to prevent mass-spam.
        Pseudo-unique token identifying these issues: hoss20120321nofix36

        Hoss Man made changes -
        Fix Version/s 3.6 [ 12319065 ]
        Robert Muir made changes -
        Fix Version/s 4.1 [ 12321141 ]
        Fix Version/s 4.0 [ 12314992 ]
        Mark Miller made changes -
        Fix Version/s 4.2 [ 12323893 ]
        Fix Version/s 5.0 [ 12321664 ]
        Fix Version/s 4.1 [ 12321141 ]
        Robert Muir made changes -
        Fix Version/s 4.3 [ 12324128 ]
        Fix Version/s 5.0 [ 12321664 ]
        Fix Version/s 4.2 [ 12323893 ]
        Uwe Schindler made changes -
        Fix Version/s 4.4 [ 12324324 ]
        Fix Version/s 4.3 [ 12324128 ]
        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Steve Rowe made changes -
        Fix Version/s 5.0 [ 12321664 ]
        Fix Version/s 4.5 [ 12324743 ]
        Fix Version/s 4.4 [ 12324324 ]
        Adrien Grand made changes -
        Fix Version/s 4.6 [ 12325000 ]
        Fix Version/s 5.0 [ 12321664 ]
        Fix Version/s 4.5 [ 12324743 ]
        Uwe Schindler made changes -
        Fix Version/s 4.7 [ 12325573 ]
        Fix Version/s 4.6 [ 12325000 ]
        David Smiley made changes -
        Fix Version/s 4.8 [ 12326254 ]
        Fix Version/s 4.7 [ 12325573 ]
        Uwe Schindler added a comment -

        Move issue to Solr 4.9.

        Uwe Schindler made changes -
        Fix Version/s 4.9 [ 12326731 ]
        Fix Version/s 5.0 [ 12321664 ]
        Fix Version/s 4.8 [ 12326254 ]

          People

          • Assignee: Unassigned
          • Reporter: Brian Whitman
          • Votes: 6
          • Watchers: 2
