Solr
  1. Solr
  2. SOLR-386

Configurable SolrHighlighter implementation

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.3
    • Component/s: highlighter
    • Labels:
      None

      Description

      It would be great if SolrCore allowed the highlighter class to be configurable. A good way would be to add a class attribute to the <highlighting> element in solrconfig.xml, similar to how the RequestHandler instance is initialized in SorCore.

      1. SOLR-386-SolrHighlighter.patch
        56 kB
        Tricia Jenkins
      2. SOLR-386-SolrHighlighter.patch
        55 kB
        Tricia Jenkins
      3. SOLR-386-SolrHighlighter.patch
        39 kB
        Tricia Jenkins
      4. SOLR-386-SolrHighlighter.patch
        96 kB
        Tricia Jenkins
      5. SOLR-386-SolrHighlighter.patch
        96 kB
        Tricia Jenkins

        Issue Links

          Activity

          Hide
          Ryan McKinley added a comment -

          How is this different from SOLR-225?

          The example solrconfig.xml in trunk includes:

          <highlighting>
          <!-- Configure the standard fragmenter -->
          <!-- This could most likely be commented out in the "default" case -->
          <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
          <lst name="defaults">
          <int name="hl.fragsize">100</int>
          </lst>
          </fragmenter>

          <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
          <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
          <lst name="defaults">
          <!-- slightly smaller fragsizes work better because of slop -->
          <int name="hl.fragsize">70</int>
          <!-- allow 50% slop on fragment sizes -->
          <float name="hl.regex.slop">0.5</float>
          <!-- a basic sentence pattern -->
          <str name="hl.regex.pattern">[-\w ,/\n\"']

          {20,200}

          </str>
          </lst>
          </fragmenter>

          <!-- Configure the standard formatter -->
          <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
          <lst name="defaults">
          <str name="hl.simple.pre"><![CDATA[<em>]]></str>
          <str name="hl.simple.post"><![CDATA[</em>]]></str>
          </lst>
          </formatter>
          </highlighting>

          Show
          Ryan McKinley added a comment - How is this different from SOLR-225 ? The example solrconfig.xml in trunk includes: <highlighting> <!-- Configure the standard fragmenter --> <!-- This could most likely be commented out in the "default" case --> <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true"> <lst name="defaults"> <int name="hl.fragsize">100</int> </lst> </fragmenter> <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) --> <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter"> <lst name="defaults"> <!-- slightly smaller fragsizes work better because of slop --> <int name="hl.fragsize">70</int> <!-- allow 50% slop on fragment sizes --> <float name="hl.regex.slop">0.5</float> <!-- a basic sentence pattern --> <str name="hl.regex.pattern"> [-\w ,/\n\"'] {20,200} </str> </lst> </fragmenter> <!-- Configure the standard formatter --> <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true"> <lst name="defaults"> <str name="hl.simple.pre"><![CDATA [<em>] ]></str> <str name="hl.simple.post"><![CDATA [</em>] ]></str> </lst> </formatter> </highlighting>
          Hide
          Eli Levine added a comment -

          This is different in that this request asks for the SolrHighlighter implementation itself to be pluggable, not just fragmenters and formatters.

          Show
          Eli Levine added a comment - This is different in that this request asks for the SolrHighlighter implementation itself to be pluggable, not just fragmenters and formatters.
          Hide
          Mike Klaas added a comment -

          I suppose the real question is What behaviour are you trying to achieve by ripping out the whole highlighter implementation? If you have custom code that does something completely different, it is probably easier to just call that code manually from your own request handler.

          Show
          Mike Klaas added a comment - I suppose the real question is What behaviour are you trying to achieve by ripping out the whole highlighter implementation? If you have custom code that does something completely different, it is probably easier to just call that code manually from your own request handler.
          Hide
          Eli Levine added a comment -

          This would be useful for projects with non-standard highlighting needs. In our case, under certain circumstances we need to run a field through the highlighter multiple times using different fragmenters if previous runs did not produce any highlighted fragments. Also, this would allow us to use SpanScorer for phrase highlighting, as it is instantiated differently from other scorers (needs CachingTokenFilter, etc.)

          Since the Solr highlighter is a plug-in, it seems natural to allow different implementations to be used, similar to what's allowed for request and update handlers and other configurable parts of SolrCore.

          Thanks,

          Eli

          Show
          Eli Levine added a comment - This would be useful for projects with non-standard highlighting needs. In our case, under certain circumstances we need to run a field through the highlighter multiple times using different fragmenters if previous runs did not produce any highlighted fragments. Also, this would allow us to use SpanScorer for phrase highlighting, as it is instantiated differently from other scorers (needs CachingTokenFilter, etc.) Since the Solr highlighter is a plug-in, it seems natural to allow different implementations to be used, similar to what's allowed for request and update handlers and other configurable parts of SolrCore. Thanks, Eli
          Hide
          Tricia Jenkins added a comment -

          This patch allows highlighting to be plugged in.

          What I did:

          • Made SolrHighlighter an interface
          • The old SolrHighlighter became DefaultSolrHighlighter
          • Instantiate the highlighter in SolrCore based on what is in the solrconfig.xml

          So to roll your own

          • Implement SolrHighlighter (ie org.apache.solr.highlight.MySolrHighlighter)
          • find <highlighting> in solrconfig.xml and modify to <highlighting class="org.apache.solr.highlight.MySolrHighlighter">

          This patch builds on changes made to trunk by SOLR-281. This patch also contains these changes (meaning you should apply this patch to the trunk). I get the feeling that this is probably not the right way to build a dependent patch, but I don't know any better. Let me know if I should change how I built this patch.

          Show
          Tricia Jenkins added a comment - This patch allows highlighting to be plugged in. What I did: Made SolrHighlighter an interface The old SolrHighlighter became DefaultSolrHighlighter Instantiate the highlighter in SolrCore based on what is in the solrconfig.xml So to roll your own Implement SolrHighlighter (ie org.apache.solr.highlight.MySolrHighlighter) find <highlighting> in solrconfig.xml and modify to <highlighting class="org.apache.solr.highlight.MySolrHighlighter"> This patch builds on changes made to trunk by SOLR-281 . This patch also contains these changes (meaning you should apply this patch to the trunk). I get the feeling that this is probably not the right way to build a dependent patch, but I don't know any better. Let me know if I should change how I built this patch.
          Hide
          Tricia Jenkins added a comment -

          Updated patch to work with recent changes made to SolrCore. Should apply against a clean trunk again. No further changes.

          Show
          Tricia Jenkins added a comment - Updated patch to work with recent changes made to SolrCore. Should apply against a clean trunk again. No further changes.
          Hide
          Tricia Jenkins added a comment -

          SOLR-281 was recently commited. Formerly those changes were also included in this patch. The changes made by that patch have been removed from this patch so SOLR-386-SolrHighlighter.patch applies cleanly to trunk again. Unless someone has comments, or an alternate implementation I don't foresee this patch changing again.

          Show
          Tricia Jenkins added a comment - SOLR-281 was recently commited. Formerly those changes were also included in this patch. The changes made by that patch have been removed from this patch so SOLR-386 -SolrHighlighter.patch applies cleanly to trunk again. Unless someone has comments, or an alternate implementation I don't foresee this patch changing again.
          Hide
          Tricia Jenkins added a comment -

          I'd really like some feedback on this patch. I've just updated the patch based on changes that have been made to SolrHighlighter.java since revision 594314).

          Eli, does this meet your needs? This is all I need in SOLR-380 to plug in a custom highlighter. I would really appreciate if this could be committed by someone so that I can stop worrying about keeping up with revisions. It has been assigned to Mike Klass so his feedback in particular would be valuable to me.

          Thanks,
          Tricia

          Show
          Tricia Jenkins added a comment - I'd really like some feedback on this patch. I've just updated the patch based on changes that have been made to SolrHighlighter.java since revision 594314). Eli, does this meet your needs? This is all I need in SOLR-380 to plug in a custom highlighter. I would really appreciate if this could be committed by someone so that I can stop worrying about keeping up with revisions. It has been assigned to Mike Klass so his feedback in particular would be valuable to me. Thanks, Tricia
          Hide
          Eli Levine added a comment -

          I looked at the patch and it seems it does what I need, which is simply to allow the class of SolrHighlighter to be configurable. Thanks, Tricia.

          Eli

          Show
          Eli Levine added a comment - I looked at the patch and it seems it does what I need, which is simply to allow the class of SolrHighlighter to be configurable. Thanks, Tricia. Eli
          Hide
          Grant Ingersoll added a comment -

          Made SolrHighlighter an interface

          Why not just extend and override? The general problem w/ interfaces is it gets hard to maintain back-compatibility in this kind of environment. Another option would be to abstract SolrHighlighter to extend AbstractSolrHighlighter and then your new highlighter could just extend AbstractSolrHightlighter

          Show
          Grant Ingersoll added a comment - Made SolrHighlighter an interface Why not just extend and override? The general problem w/ interfaces is it gets hard to maintain back-compatibility in this kind of environment. Another option would be to abstract SolrHighlighter to extend AbstractSolrHighlighter and then your new highlighter could just extend AbstractSolrHightlighter
          Hide
          Mike Klaas added a comment -

          Hi Tricia,

          I'm not sure that I would ever use SolrHighlighter overriding (if I had the need, I probably would just make a separate search component). However, several people want this functionality and it has relatively low implementation/maintenance cost.

          There are a few things that need to be done to get the patch committed. First, the unnecessary whitespace changes in SolrCore shouldn't be in the diff (it makes it really hard to see the changes, and might make it hard to apply/revert). Also, I'm skeptical of using interfaces for this type of thing, for maintenance reasons. Perhaps we could move to one of the approaches that Grant suggested?

          Thanks again for the contribution, and sorry it took so long

          Show
          Mike Klaas added a comment - Hi Tricia, I'm not sure that I would ever use SolrHighlighter overriding (if I had the need, I probably would just make a separate search component). However, several people want this functionality and it has relatively low implementation/maintenance cost. There are a few things that need to be done to get the patch committed. First, the unnecessary whitespace changes in SolrCore shouldn't be in the diff (it makes it really hard to see the changes, and might make it hard to apply/revert). Also, I'm skeptical of using interfaces for this type of thing, for maintenance reasons. Perhaps we could move to one of the approaches that Grant suggested? Thanks again for the contribution, and sorry it took so long
          Hide
          Tricia Jenkins added a comment -

          OK. So I think I fixed the whitespace problem.

          Thanks for explaining the problem with interfaces – that makes sense now. I assume that EventListener and RequestHandler are interfaces because they've been thought long and hard about and are not going to change?

          My first try at the patch was just to include the public methods, which are the ones you (MIke Klaas) list:
          > .initialize(Config)
          > .isHighlightEnabled(SolrParams)
          > .doHighlighting(...)
          > .getHighlightFields(...)

          I discovered that the unit tests call the formatters and fragmenters directly so in the interface version I had made public get methods for these. Now that we're using an abstract class I am able to just include these as is - so no changes to HighlighterTest this time. But speaking of unit tests... Anecdotally I know that the SolrCore changes allow the highlighter to be configured (my custom highlighter). I wrote HighlighterConfigTest as a unit test for this functionality.

          I decided to leave the default implementation of isHighlightingEnabled(SolrParams), and getHighlightFields(...) in the abstract class because both methods deal with reading parameters. I can't think of a use case of a highlighter that wouldn't use this or at worst ignore/override this method. Is this a reasonable decision?

          I wasn't sure what to do with the logger, so I left it in the AbstractSolrHighlighter. This decision is based on the example of UpdateHandler.

          Hmm... I just realized that naming the abstraction of SolrHighlighter AbstractSolrHighlighter causes problems all over the code when you want your custom highlighter to plugin. I think the path of least resistance is to call the abstract class SolrHighlighter and the existing implementation DefaultSolrHighlighter.

          Thoughts?

          Show
          Tricia Jenkins added a comment - OK. So I think I fixed the whitespace problem. Thanks for explaining the problem with interfaces – that makes sense now. I assume that EventListener and RequestHandler are interfaces because they've been thought long and hard about and are not going to change? My first try at the patch was just to include the public methods, which are the ones you (MIke Klaas) list: > .initialize(Config) > .isHighlightEnabled(SolrParams) > .doHighlighting(...) > .getHighlightFields(...) I discovered that the unit tests call the formatters and fragmenters directly so in the interface version I had made public get methods for these. Now that we're using an abstract class I am able to just include these as is - so no changes to HighlighterTest this time. But speaking of unit tests... Anecdotally I know that the SolrCore changes allow the highlighter to be configured (my custom highlighter). I wrote HighlighterConfigTest as a unit test for this functionality. I decided to leave the default implementation of isHighlightingEnabled(SolrParams), and getHighlightFields(...) in the abstract class because both methods deal with reading parameters. I can't think of a use case of a highlighter that wouldn't use this or at worst ignore/override this method. Is this a reasonable decision? I wasn't sure what to do with the logger, so I left it in the AbstractSolrHighlighter. This decision is based on the example of UpdateHandler. Hmm... I just realized that naming the abstraction of SolrHighlighter AbstractSolrHighlighter causes problems all over the code when you want your custom highlighter to plugin. I think the path of least resistance is to call the abstract class SolrHighlighter and the existing implementation DefaultSolrHighlighter. Thoughts?
          Hide
          Mike Klaas added a comment -

          Tricia: thanks for the changes. I think that your decisions make sense--I'll be sure to followup by the end of the week.

          Show
          Mike Klaas added a comment - Tricia: thanks for the changes. I think that your decisions make sense--I'll be sure to followup by the end of the week.
          Hide
          Mike Klaas added a comment -

          Commited in r639490. Thanks Tricia!

          Show
          Mike Klaas added a comment - Commited in r639490. Thanks Tricia!
          Hide
          Hoss Man added a comment -

          This bug was modified as part of a bulk update using the criteria...

          • Currently marked ("Resolved" or "Closed") and "Fixed"
          • Had no "Fix Version" versions
          • "Affects Versions" included 1.3

          The Fix Version for all 8 issues found was set to 1.3 (1.3 has not yet been released, if an issue is already fixed, and it affected 1.3 then the fix will be in 1.3)

          Email notification was suppressed to prevent excessive email.

          For a list of all the issues modified, search jira comments for this
          (hopefully) unique string: 20080415hossman1

          Show
          Hoss Man added a comment - This bug was modified as part of a bulk update using the criteria... Currently marked ("Resolved" or "Closed") and "Fixed" Had no "Fix Version" versions "Affects Versions" included 1.3 The Fix Version for all 8 issues found was set to 1.3 (1.3 has not yet been released, if an issue is already fixed, and it affected 1.3 then the fix will be in 1.3) Email notification was suppressed to prevent excessive email. For a list of all the issues modified, search jira comments for this (hopefully) unique string: 20080415hossman1

            People

            • Assignee:
              Mike Klaas
              Reporter:
              Eli Levine
            • Votes:
              1 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development