SOLR-1268

Incorporate Lucene's FastVectorHighlighter

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: highlighter
    • Labels:
      None

      Description

      Correcting Fix Version based on CHANGES.txt, see this thread for more details...

      http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

      1. SOLR-1268.patch
        35 kB
        Koji Sekiguchi
      2. SOLR-1268.patch
        48 kB
        Koji Sekiguchi
      3. SOLR-1268-0_fragsize.patch
        1 kB
        Koji Sekiguchi
      4. SOLR-1268-0_fragsize.patch
        2 kB
        Koji Sekiguchi
      5. SOLR-1268.patch
        4 kB
        Koji Sekiguchi

          Activity

          Gabriel Farrell added a comment (edited)

          Voting for this issue because of the following email from Bess Sadler, which sums up the need quite well:

          One of the feature requests we get pretty often with Blacklight is
          search term highlighting. The main reason we don't have it yet is
          because it's a performance drag. We have attempted to add it a couple
          of times, but it kills performance so much for large collections or
          large text fields that we had to remove it again.

          I just had an interesting chat with Erik Hatcher, and he pointed me at this:
          http://www.lucidimagination.com/search/document/a4deefd915f706d4/highlighting_performance

          It seems Lucene 2.9 has a faster highlighting solution available now,
          FastVectorHighlighter. However, it hasn't yet worked its way into
          solr. If you are one of the people who would like to see search term
          highlighting (or, maybe just faster search term highlighting) in
          blacklight, vufind, Fac-Back-OPAC, helios, or any of the many other
          library apps that use solr, you might want to go vote for the jira
          issue at:

          https://issues.apache.org/jira/browse/SOLR-1268

          Nicolas Dessaigne added a comment -

          In the meantime, you may be interested in the maxChars attribute of copy fields (https://issues.apache.org/jira/browse/SOLR-538). It permits fast highlighting of only the beginning of documents.

          Shalin Shekhar Mangar added a comment -

          Is there a reason why this is not marked for 1.4?

          Koji Sekiguchi added a comment -

          Marked it for 1.5 because there are no patches yet.

          Koji Sekiguchi added a comment -

          First draft, untested patch attached.

          Koji Sekiguchi added a comment -

          Added a few SolrFragmentsBuilders and test cases.

          Koji Sekiguchi added a comment -

          In this patch I'm introducing <fragListBuilder/> and <fragmentsBuilder/> as new sub-tags of <highlighting/> in solrconfig.xml, rather than using <searchComponent/>. I think we can open a separate ticket for moving the <highlighting/> settings to <searchComponent/>, if needed.

          FYI:
          http://old.nabble.com/highlighting-setting-in-solrconfig.xml-td26984003.html

          Koji Sekiguchi added a comment -

          I'll commit in a few days if nobody objects.

          Koji Sekiguchi added a comment -

          Committed revision 897383.

          Marc Sturlese added a comment -

          I have noticed an exception is thrown when using fragSize = 0 (which should return the whole field highlighted):
          "fragCharSize(0) is too small. It must be 18 or higher. java.lang.IllegalArgumentException: fragCharSize(0) is too small. It must be 18 or higher"

          Koji Sekiguchi added a comment -

          I have noticed an exception is thrown when using fragSize = 0 (which should return the whole field highlighted):
          "fragCharSize(0) is too small. It must be 18 or higher. java.lang.IllegalArgumentException: fragCharSize(0) is too small. It must be 18 or higher"

          Thanks, Marc.
          Solr 1.4 uses NullFragmenter, which highlights the whole content when you set fragsize to 0. FVH has no such feature because it uses a different algorithm.
          In the attached patch, Solr sets fragsize to Integer.MAX_VALUE if the user tries to set it to 0 when FVH is used. This prevents the runtime error.
          I think this is necessary at the Solr level because Solr automatically switches to FVH when the highlighted field has termVectors, termPositions, and termOffsets all set to true, unless hl.useHighlighter is set to true.
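
The workaround described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: the class name FragSizeSketch and both helper methods are hypothetical, and the only values taken from the thread are the minimum of 18 from the exception message and the substitution of Integer.MAX_VALUE for 0. Note that the next comment in this thread reports that FVH did not produce a whole-field snippet with this substitution.

```java
public class FragSizeSketch {
    // Minimum fragment size quoted in the IllegalArgumentException message above.
    static final int FVH_MIN_FRAG_CHAR_SIZE = 18;

    /** Hypothetical check mirroring the reported error: sizes below 18 are rejected. */
    static boolean rejectedByFvh(int fragCharSize) {
        return fragCharSize < FVH_MIN_FRAG_CHAR_SIZE;
    }

    /**
     * Hypothetical helper: maps the user-supplied fragsize to the value that
     * would be handed to the highlighter. A fragsize of 0 is replaced with
     * Integer.MAX_VALUE only when FVH is in use; all other values pass through.
     */
    static int effectiveFragSize(int requested, boolean usingFvh) {
        if (usingFvh && requested == 0) {
            return Integer.MAX_VALUE;
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(rejectedByFvh(0));                          // raw 0 would be rejected
        System.out.println(rejectedByFvh(effectiveFragSize(0, true))); // substituted value passes the size check
        System.out.println(effectiveFragSize(0, false));               // traditional Highlighter keeps 0
    }
}
```

The substitution only avoids the size check; it makes no claim about the quality of the snippet FVH then returns.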

          Koji Sekiguchi added a comment -

          Hmm, FVH doesn't work properly when fragsize=Integer.MAX_VALUE (see test0FragSize() in the attached patch, which shows that FVH cannot produce a whole-field snippet when fragsize=Integer.MAX_VALUE).

          Now I think I should make the (traditional) Highlighter the default even when the highlighted field's termVectors/termPositions/termOffsets are all true, so that FVH is used only when hl.useFastVectorHighlighter is set to true. The hl.useFastVectorHighlighter parameter accepts per-field overrides. Also, FVH doesn't support a fragsize of 0.

          Koji Sekiguchi added a comment -

          The patch includes:

          1. Eliminate the hl.useHighlighter parameter.
          2. Introduce the hl.useFastVectorHighlighter parameter. The default is false.

          Therefore, Highlighter will be used unless hl.useFastVectorHighlighter is set to true. I'll commit in a few days.
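
The selection rule described above (default false, with per-field overrides) might be resolved along these lines. This is a sketch under stated assumptions, not the patch's code: the class HighlighterChoiceSketch and the method useFvh are hypothetical, request parameters are modeled as a plain Map, and the "f.<field>." prefix is assumed to follow Solr's general convention for per-field parameter overrides.

```java
import java.util.HashMap;
import java.util.Map;

public class HighlighterChoiceSketch {
    static final String PARAM = "hl.useFastVectorHighlighter";

    /**
     * Hypothetical resolver: a per-field value ("f.<field>." + PARAM, assumed
     * naming) wins over the global value; if neither is present the result is
     * false, so the traditional Highlighter is used.
     */
    static boolean useFvh(Map<String, String> params, String field) {
        String perField = params.get("f." + field + "." + PARAM);
        String value = (perField != null) ? perField : params.get(PARAM);
        return Boolean.parseBoolean(value); // parseBoolean(null) is false
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(PARAM, "true");                    // global: use FVH
        params.put("f.title." + PARAM, "false");      // override: not for "title"

        System.out.println(useFvh(params, "body"));   // falls back to the global value
        System.out.println(useFvh(params, "title"));  // per-field override applies
        System.out.println(useFvh(new HashMap<>(), "body")); // nothing set: default
    }
}
```

Making FVH opt-in this way keeps existing requests on the traditional Highlighter even when a field happens to store term vectors, positions, and offsets.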

          Kent William added a comment -

          When using Dismax, the fast vector highlighter fails to return any highlighting when there is more than one column in qf (e.g. "qf=Name Company")...

          Koji Sekiguchi added a comment -

          When using Dismax, the fast vector highlighter fails to return any highlighting when there is more than one column in qf (e.g. "qf=Name Company")...

          Right. See https://issues.apache.org/jira/browse/LUCENE-2243 .

          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release

          Antony Stubbs added a comment -

          Koji, with multi-term fields, Highlighter would return the single value that matched. FVH, however, merges values in the fragment it returns. Is there a way to get the same behavior as Highlighter in this respect? (In my use case, I only want the value that matched to be highlighted.)


            People

            • Assignee:
              Koji Sekiguchi
              Reporter:
              Koji Sekiguchi
            • Votes:
              15
              Watchers:
              6
