Solr
  1. Solr
  2. SOLR-225

Allow pluggable Highlighting classes -- Formatters and Fragmenters

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3
    • Component/s: None
    • Labels:
      None

      Description

      Highlighting should support a pluggable architecture similar to what is seen with RequestHandlers, Fields, FieldTypes, etc
      '
      For more background:
      http://www.nabble.com/Custom-fragmenter-tf3681588.html#a10289335

      1. SOLR-225-HighlightingConfig.patch
        98 kB
        Ryan McKinley
      2. SOLR-225-HighlightingConfig.patch
        97 kB
        Ryan McKinley
      3. SOLR-225-HighlightingConfig.patch
        97 kB
        Ryan McKinley
      4. SOLR-225-HighlightingConfig.patch
        97 kB
        Ryan McKinley
      5. SOLR-225-HighlightingConfig.patch
        100 kB
        Ryan McKinley
      6. SOLR-225+260-HighlightPlugins.patch
        89 kB
        Ryan McKinley
      7. SOLR-225+260-HighlightPlugins.patch
        84 kB
        Ryan McKinley

        Issue Links

          Activity

          Hide
          Ryan McKinley added a comment -

          This is a major rework of the highlighting architecture. It is now configured similar to other classes.

          Solrconfig.xml can define:

          <highlighting>
          <!-- Configure the standard fragmenter -->
          <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
          <lst name="defaults">
          <int name="hl.fragsize">100</int>
          </lst>
          </fragmenter>

          <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
          <lst name="defaults">
          <int name="hl.fragsize">70</int>
          </lst>
          </fragmenter>

          <!-- Configure the standard formatter -->
          <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
          <lst name="defaults">
          <str name="hl.simple.pre"><![CDATA[<em>]]></str>
          <str name="hl.simple.post"><![CDATA[</em>]]></str>
          </lst>
          </formatter>
          </highlighting>

          This incorporates SOLR-102 as an optional fragmenter.

          This is still quite rough...

          thoughts?

          Show
          Ryan McKinley added a comment - This is a major rework of the highlighting architecture. It is now configured similar to other classes. Solrconfig.xml can define: <highlighting> <!-- Configure the standard fragmenter --> <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true"> <lst name="defaults"> <int name="hl.fragsize">100</int> </lst> </fragmenter> <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter"> <lst name="defaults"> <int name="hl.fragsize">70</int> </lst> </fragmenter> <!-- Configure the standard formatter --> <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true"> <lst name="defaults"> <str name="hl.simple.pre"><![CDATA [<em>] ]></str> <str name="hl.simple.post"><![CDATA [</em>] ]></str> </lst> </formatter> </highlighting> This incorporates SOLR-102 as an optional fragmenter. This is still quite rough... thoughts?
          Hide
          Mike Klaas added a comment -

          Wow, that is some honking configurability! I've looked at the patch, but haven't tested it.

          The plugin architecture seems like something tha could be made more general than just plugins. I imagine that with very little effort, this could be used to entirely replace the *Factory.java classes that are used for Analyzer/TokenStream configuration (of course, that would relegate our nice configuration parameter names to a<lst name="defaults"> approach... perhaps the plugin archtecture could convert all unknown xml node parameters to SolrParams too?). This is a weighty-enough topic that it should perhaps be broken out into a separate issue.

          The highlighting configurability itself is fine with me. I think that we'll be using the lucene highlighter (or reasonable facsimile) for long enough to make it worth additional configurability.

          Show
          Mike Klaas added a comment - Wow, that is some honking configurability! I've looked at the patch, but haven't tested it. The plugin architecture seems like something tha could be made more general than just plugins. I imagine that with very little effort, this could be used to entirely replace the *Factory.java classes that are used for Analyzer/TokenStream configuration (of course, that would relegate our nice configuration parameter names to a<lst name="defaults"> approach... perhaps the plugin archtecture could convert all unknown xml node parameters to SolrParams too?). This is a weighty-enough topic that it should perhaps be broken out into a separate issue. The highlighting configurability itself is fine with me. I think that we'll be using the lucene highlighter (or reasonable facsimile) for long enough to make it worth additional configurability.
          Hide
          Ryan McKinley added a comment -

          no real changes... it applies cleanly with trunk.

          > The plugin architecture seems like something tha could be made more general than just plugins.

          It is a 95% duplicate of the RequestHandler plugin architecture. The only reason it could not be identical was the lazy loading request handlers...

          Currently solr has two plugin initialization types:
          1. init( NamedList args )
          2. init( Map<String,String> args)

          If we added an interface for each initalization type, we could probably do all plugin initalization with something like the PluginLoader class in this patch

          class PluginLoader<T>
          {
          public Map<String,T> load( NodeList nodes )

          { ... }

          }

          Show
          Ryan McKinley added a comment - no real changes... it applies cleanly with trunk. > The plugin architecture seems like something tha could be made more general than just plugins. It is a 95% duplicate of the RequestHandler plugin architecture. The only reason it could not be identical was the lazy loading request handlers... Currently solr has two plugin initialization types: 1. init( NamedList args ) 2. init( Map<String,String> args) If we added an interface for each initalization type, we could probably do all plugin initalization with something like the PluginLoader class in this patch class PluginLoader<T> { public Map<String,T> load( NodeList nodes ) { ... } }
          Hide
          Oz Solomon added a comment -

          Ryan - this is a great feature. Any chance to see it applied to the nightly builds?

          Show
          Oz Solomon added a comment - Ryan - this is a great feature. Any chance to see it applied to the nightly builds?
          Hide
          Ryan McKinley added a comment -

          After solr 1.2 is out, I hope this will get in.

          This patch still needs a serious code review before considering committing it.

          Show
          Ryan McKinley added a comment - After solr 1.2 is out, I hope this will get in. This patch still needs a serious code review before considering committing it.
          Hide
          Mike Klaas added a comment -

          Yeah. I'm happy to review the highlighting stuff (there are some parameter changes and such that would be worthwhile), but it woudl be nice to hear from yonik or hoss on the classloading/configuration stuff, and any overlap with similar-but-different existing systems in Solr. I'm afraid I haven't had time to follow all the developments on that front.

          I agree that this is a major change and there is no need to rush it for 1.2

          Show
          Mike Klaas added a comment - Yeah. I'm happy to review the highlighting stuff (there are some parameter changes and such that would be worthwhile), but it woudl be nice to hear from yonik or hoss on the classloading/configuration stuff, and any overlap with similar-but-different existing systems in Solr. I'm afraid I haven't had time to follow all the developments on that front. I agree that this is a major change and there is no need to rush it for 1.2
          Hide
          Ryan McKinley added a comment -

          applies with trunk

          Show
          Ryan McKinley added a comment - applies with trunk
          Hide
          Ryan McKinley added a comment -

          implemented with the (soon to upload) SOLR-260 PluginLoader<T>. I'll wait till SOLR-260 is solid before updating again...

          Show
          Ryan McKinley added a comment - implemented with the (soon to upload) SOLR-260 PluginLoader<T>. I'll wait till SOLR-260 is solid before updating again...
          Hide
          Ryan McKinley added a comment -

          This uses SOLR-260 general plugin loader to load highlighting plugins.

          Since these patches touch a few of the same files, I will maintain them together.

          Show
          Ryan McKinley added a comment - This uses SOLR-260 general plugin loader to load highlighting plugins. Since these patches touch a few of the same files, I will maintain them together.
          Hide
          Ryan McKinley added a comment -

          no real changes. added some more javadocs.

          I think this should be added to trunk soon...

          Show
          Ryan McKinley added a comment - no real changes. added some more javadocs. I think this should be added to trunk soon...
          Hide
          Mike Klaas added a comment - - edited

          Looking great Ryan (again, only commenting on the Highlighting configurability parts)

          should:

          protected boolean emptyArray(String[] arr)

          { return (arr == null || arr.length == 0 || arr[0] == null || arr[0].trim().length() == 0); }

          perhaps be defined as

          protected boolean emptyArray(String[] arr)

          { return (arr == null || arr.length == 0 || (arr.length == 1 && (arr[0] == null || arr[0].trim().length() == 0))); }

          Params:

          + public static final String HIGHLIGHT = "hl";
          + public static final String PREFIX = "hl.";
          + public static final String FIELDS = PREFIX+"fl";
          + public static final String SNIPPETS = PREFIX+"snippets";
          + public static final String FRAGSIZE = PREFIX+"fragsize";
          + public static final String INCREMENT = PREFIX+"increment";
          + public static final String SLOP = PREFIX+"slop";

          perhaps this should be PREFIX + 'regex.slop'?

          + public static final String MAX_CHARS = PREFIX+"maxAnalyzedChars";

          similarly. Seems somewhat inelegant to define/hardcode the plugin-specific parameters here, though it is nice ot have them all in one place...

          Show
          Mike Klaas added a comment - - edited Looking great Ryan (again, only commenting on the Highlighting configurability parts) should: protected boolean emptyArray(String[] arr) { return (arr == null || arr.length == 0 || arr[0] == null || arr[0].trim().length() == 0); } perhaps be defined as protected boolean emptyArray(String[] arr) { return (arr == null || arr.length == 0 || (arr.length == 1 && (arr[0] == null || arr[0].trim().length() == 0))); } Params: + public static final String HIGHLIGHT = "hl"; + public static final String PREFIX = "hl."; + public static final String FIELDS = PREFIX+"fl"; + public static final String SNIPPETS = PREFIX+"snippets"; + public static final String FRAGSIZE = PREFIX+"fragsize"; + public static final String INCREMENT = PREFIX+"increment"; + public static final String SLOP = PREFIX+"slop"; perhaps this should be PREFIX + 'regex.slop'? + public static final String MAX_CHARS = PREFIX+"maxAnalyzedChars"; similarly. Seems somewhat inelegant to define/hardcode the plugin-specific parameters here, though it is nice ot have them all in one place...
          Hide
          Ryan McKinley added a comment -

          > perhaps be defined as
          >
          > protected boolean emptyArray(String[] arr)

          { > return (arr == null || arr.length == 0 || > (arr.length == 1 && (arr[0] == null || arr[0].trim().length() == 0))); > }

          seems good. This patch tried not to change any highlighting logic, it is just moved it from the existing HighlightingUtils.java

          I will add this change.

          >
          > + public static final String MAX_CHARS = PREFIX+"maxAnalyzedChars";
          >
          > similarly. Seems somewhat inelegant to define/hardcode the plugin-specific parameters here, though it is nice ot have them all in one place...
          >

          I'm torn on what is more/less elegant.

          Should we have a new class in o.a.s.common.params for each plugin?

          Since the number of 'core' plugins will be relatively small, having a single HighlightParams class with sections for the core plugin options seems ok. But I can easily be talked out of this...

          Show
          Ryan McKinley added a comment - > perhaps be defined as > > protected boolean emptyArray(String[] arr) { > return (arr == null || arr.length == 0 || > (arr.length == 1 && (arr[0] == null || arr[0].trim().length() == 0))); > } seems good. This patch tried not to change any highlighting logic, it is just moved it from the existing HighlightingUtils.java I will add this change. > > + public static final String MAX_CHARS = PREFIX+"maxAnalyzedChars"; > > similarly. Seems somewhat inelegant to define/hardcode the plugin-specific parameters here, though it is nice ot have them all in one place... > I'm torn on what is more/less elegant. Should we have a new class in o.a.s.common.params for each plugin? Since the number of 'core' plugins will be relatively small, having a single HighlightParams class with sections for the core plugin options seems ok. But I can easily be talked out of this...
          Hide
          Mike Klaas added a comment -

          > I'm torn on what is more/less elegant.

          > Should we have a new class in o.a.s.common.params for each plugin?

          > Since the number of 'core' plugins will be relatively small, having a single HighlightParams class with sections > for the core plugin options seems ok. But I can easily be talked out of this...

          Seems ok to me too. Spreading everything into a jumble of classes won't exactly help coherence.

          Show
          Mike Klaas added a comment - > I'm torn on what is more/less elegant. > Should we have a new class in o.a.s.common.params for each plugin? > Since the number of 'core' plugins will be relatively small, having a single HighlightParams class with sections > for the core plugin options seems ok. But I can easily be talked out of this... Seems ok to me too. Spreading everything into a jumble of classes won't exactly help coherence.
          Hide
          Ryan McKinley added a comment -

          added a while ago

          Show
          Ryan McKinley added a comment - added a while ago
          Hide
          Hoss Man added a comment -

          This bug was modified as part of a bulk update using the criteria...

          • Marked "Resolved" and "Fixed"
          • Had no "Fix Version" versions
          • Was listed in the CHANGES.txt for 1.3 as of today 2008-03-15

          The Fix Version for all 29 issues found was set to 1.3, email notification was suppressed to prevent excessive email.

          For a list of all the issues modified, search jira comments for this (hopefully) unique string: batch20070315hossman1

          Show
          Hoss Man added a comment - This bug was modified as part of a bulk update using the criteria... Marked "Resolved" and "Fixed" Had no "Fix Version" versions Was listed in the CHANGES.txt for 1.3 as of today 2008-03-15 The Fix Version for all 29 issues found was set to 1.3, email notification was suppressed to prevent excessive email. For a list of all the issues modified, search jira comments for this (hopefully) unique string: batch20070315hossman1

            People

            • Assignee:
              Ryan McKinley
              Reporter:
              Brian Whitman
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development