Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Component/s: highlighter
    • Labels:
      None
    • Environment:

      Tomcat 5.5

      Description

      Adds support for the star value for the hl.fl parameter, i.e. highlighting will be done on all fields (static and dynamic). Particularly useful in conjunction with hl.requireFieldMatch=true, this way one can specify "generic" highlighting parameters independent of the query/searched fields.

      1. SOLR-540.patch
        13 kB
        Lars Kotthoff
      2. SOLR-540.patch
        12 kB
        Lars Kotthoff
      3. SOLR-540-highlight-all.patch
        7 kB
        Lars Kotthoff
      4. SOLR-540-highlight-all.patch
        7 kB
        Lars Kotthoff
      5. SOLR-540-highlight-all.patch
        7 kB
        Lars Kotthoff
      6. SOLR-540-highlight-all.patch
        6 kB
        Lars Kotthoff
      7. SOLR-540-highlight-all.patch
        6 kB
        Lars Kotthoff
      8. SOLR-540-highlight-all.patch
        6 kB
        Lars Kotthoff

        Issue Links

          Activity

          Hide
          Lars Kotthoff added a comment -

          Patch as described against svn revision 648118.

          Show
          Lars Kotthoff added a comment - Patch as described against svn revision 648118.
          Hide
          Erik Hatcher added a comment -

          While this is a useful addition, should we tackle this as part of the a more general field aliasing/globbing feature? See http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams for more details.

          Show
          Erik Hatcher added a comment - While this is a useful addition, should we tackle this as part of the a more general field aliasing/globbing feature? See http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams for more details.
          Hide
          Hoss Man added a comment -

          a quick glance at this patch indicates that it has the same characteristics that concerned me about SOLR-247 ...

          with somehting like fl=*, we're only talking about stored fields ... storing a field makes no sense unless you plan on returning it in the field list some of the time, so fl=* makes sense as a "return all of hte fields that are possible to return" option.

          ... this patch attempts to highlight every field in the index, stored or otherwise, termVectors or otherwise.

          Show
          Hoss Man added a comment - a quick glance at this patch indicates that it has the same characteristics that concerned me about SOLR-247 ... with somehting like fl=* , we're only talking about stored fields ... storing a field makes no sense unless you plan on returning it in the field list some of the time, so fl=* makes sense as a "return all of hte fields that are possible to return" option. ... this patch attempts to highlight every field in the index, stored or otherwise, termVectors or otherwise.
          Hide
          Lars Kotthoff added a comment -

          Erik, I agree. This issue has been around for almost a year now though. The general case is much harder to implement and get right than such a apecific one; therefore it might be better to start by implementing it for something not too difficult (like highlighting) and see what people think of it. First to see which features are really needed, but also to see whether it's the right approach at all.

          Chris, good point. I didn't think about this because in the schema I'm using with the patched Solr instance all fields happen to be stored. I've changed the patch to return only stored fields. Now, this is arguably not the right place for generic give-me-all-stored-fields code and the implementation is inefficient. My knowledge of the Solr code base isn't good enough though to figure out the right place for this code – it needs to talk to both Solr and Lucene code to get all static and dynamic fields. Should this maybe go into the SolrCore code? It could maintain a list of stored fields which is assembled on startup and updated on <commit/> instead of reassembling it for every query.

          Note that the latest version of the patch could also easily be tweaked to allow arbitrary wildcard specifications of the fields to highlight (e.g. hl.fl=text_*).

          Show
          Lars Kotthoff added a comment - Erik, I agree. This issue has been around for almost a year now though. The general case is much harder to implement and get right than such a apecific one; therefore it might be better to start by implementing it for something not too difficult (like highlighting) and see what people think of it. First to see which features are really needed, but also to see whether it's the right approach at all. Chris, good point. I didn't think about this because in the schema I'm using with the patched Solr instance all fields happen to be stored. I've changed the patch to return only stored fields. Now, this is arguably not the right place for generic give-me-all-stored-fields code and the implementation is inefficient. My knowledge of the Solr code base isn't good enough though to figure out the right place for this code – it needs to talk to both Solr and Lucene code to get all static and dynamic fields. Should this maybe go into the SolrCore code? It could maintain a list of stored fields which is assembled on startup and updated on <commit/> instead of reassembling it for every query. Note that the latest version of the patch could also easily be tweaked to allow arbitrary wildcard specifications of the fields to highlight (e.g. hl.fl=text_*).
          Hide
          Lars Kotthoff added a comment -

          New patch returning only the stored fields.

          Show
          Lars Kotthoff added a comment - New patch returning only the stored fields.
          Hide
          Lars Kotthoff added a comment -

          I've changed a couple of things and uploaded the new patch.

          • The stored fields are no longer determined in SolrHighlighter for every request, but in SolrIndexSearcher when creating a new object. The implementation now simply queries the schema for all field names the index reader knows about and checks for each whether it's stored. The collections of field names and stored field names are exposed through getters. SolrHighlighter simply matches against the items of the storedFieldNames collection.
          • Added generic support for wildcards; in addition to "" things like "foo_" also work now. Now for example the prototype of a dynamic field may be given as field name to highlight on and all instances of that prototype will be highlighted.
          • Added a unit test to verify that highlighting with wildcard fields works as expected.
          Show
          Lars Kotthoff added a comment - I've changed a couple of things and uploaded the new patch. The stored fields are no longer determined in SolrHighlighter for every request, but in SolrIndexSearcher when creating a new object. The implementation now simply queries the schema for all field names the index reader knows about and checks for each whether it's stored. The collections of field names and stored field names are exposed through getters. SolrHighlighter simply matches against the items of the storedFieldNames collection. Added generic support for wildcards; in addition to " " things like "foo_ " also work now. Now for example the prototype of a dynamic field may be given as field name to highlight on and all instances of that prototype will be highlighted. Added a unit test to verify that highlighting with wildcard fields works as expected.
          Hide
          Lars Kotthoff added a comment -

          The old patch didn't verify properly that only stored static fields are returned. The new patch fixes this.

          Show
          Lars Kotthoff added a comment - The old patch didn't verify properly that only stored static fields are returned. The new patch fixes this.
          Hide
          Lars Kotthoff added a comment -

          Syncing patch with trunk.

          Show
          Lars Kotthoff added a comment - Syncing patch with trunk.
          Hide
          David Smiley added a comment -

          I'd like this incorporated into Solr. I stupidly didn't search for this feature and I went and did it myself... was just about to submit my own patch. Doh!

          Show
          David Smiley added a comment - I'd like this incorporated into Solr. I stupidly didn't search for this feature and I went and did it myself... was just about to submit my own patch. Doh!
          Hide
          David Smiley added a comment -

          Hey SOLR-540 people, please see the comment thread on SOLR-750 which is apparently a bug with this patch.

          Show
          David Smiley added a comment - Hey SOLR-540 people, please see the comment thread on SOLR-750 which is apparently a bug with this patch.
          Hide
          Lars Kotthoff added a comment -

          Attaching new patch which only highlights on stored text/string fields. Added test case to verify that.

          Show
          Lars Kotthoff added a comment - Attaching new patch which only highlights on stored text/string fields. Added test case to verify that.
          Hide
          Lars Kotthoff added a comment -

          Syncing patch with trunk.

          Show
          Lars Kotthoff added a comment - Syncing patch with trunk.
          Hide
          David Smiley added a comment -

          I've seen cases where I'd get an error because the previous index contained a field that is no longer in the current schema. I've corrected this by Changing line 171 of SolrIndexSearcher to go from a simple lookup of the field to one that catches this exception like so:

                SchemaField field;
                try {
                  field = schema.getField(fieldName);
                } catch (SolrException e) {
                  log.fine("Skipping indexed field not found in schema e:"+e);//no reason to log stacktrace
                  continue;
                }
          
          Show
          David Smiley added a comment - I've seen cases where I'd get an error because the previous index contained a field that is no longer in the current schema. I've corrected this by Changing line 171 of SolrIndexSearcher to go from a simple lookup of the field to one that catches this exception like so: SchemaField field; try { field = schema.getField(fieldName); } catch (SolrException e) { log.fine( "Skipping indexed field not found in schema e:" +e); //no reason to log stacktrace continue ; }
          Hide
          Lars Kotthoff added a comment -

          Hmm, actually the list of fields to highlight on should be rebuilt in cases like that. Are you opening a new searcher for the new index?

          Show
          Lars Kotthoff added a comment - Hmm, actually the list of fields to highlight on should be rebuilt in cases like that. Are you opening a new searcher for the new index?
          Hide
          David Smiley added a comment -

          This happens to me when I run the ant unit tests on a few of the tests ( I don't have an example off hand ). It may be related to re-running tests without cleaning; I'm not sure. Perhaps the state of past-runs are affected.

          Show
          David Smiley added a comment - This happens to me when I run the ant unit tests on a few of the tests ( I don't have an example off hand ). It may be related to re-running tests without cleaning; I'm not sure. Perhaps the state of past-runs are affected.
          Hide
          Lars Kotthoff added a comment -

          I've run the tests repeatedly and wasn't able to reproduce the problem. Can you perhaps nail it down to a particular test?

          Show
          Lars Kotthoff added a comment - I've run the tests repeatedly and wasn't able to reproduce the problem. Can you perhaps nail it down to a particular test?
          Hide
          Lars Kotthoff added a comment -

          Attaching patch which handles the case where a field is present in the index but not defined in the field by logging a warning instead of throwing a RuntimeException.

          Show
          Lars Kotthoff added a comment - Attaching patch which handles the case where a field is present in the index but not defined in the field by logging a warning instead of throwing a RuntimeException.
          Hide
          Lars Kotthoff added a comment -

          Syncing patch with trunk.

          Show
          Lars Kotthoff added a comment - Syncing patch with trunk.
          Hide
          Yonik Seeley added a comment -

          Looks useful.

          It's questionable if we always want to generate the fields collection, or if we want to defer until first requested, but it's an easy change to do it later if we decide to (i.e. not an interface change).

          Perhaps the test for text fields should use istanceof instead of class comparison so subclasses of TextField will work?

          Show
          Yonik Seeley added a comment - Looks useful. It's questionable if we always want to generate the fields collection, or if we want to defer until first requested, but it's an easy change to do it later if we decide to (i.e. not an interface change). Perhaps the test for text fields should use istanceof instead of class comparison so subclasses of TextField will work?
          Hide
          Lars Kotthoff added a comment -

          Attaching new patch which uses instanceof instead of comparing the classes directly and defers the generation of the list of fields to highlight on until they're requested. I left the assignment of fields the index reader knows about in the constructor because this shouldn't be too expensive. It could easily be moved into the getter as well though.

          Show
          Lars Kotthoff added a comment - Attaching new patch which uses instanceof instead of comparing the classes directly and defers the generation of the list of fields to highlight on until they're requested. I left the assignment of fields the index reader knows about in the constructor because this shouldn't be too expensive. It could easily be moved into the getter as well though.
          Hide
          Yonik Seeley added a comment -

          Committed, thanks Lars!

          Show
          Yonik Seeley added a comment - Committed, thanks Lars!
          Hide
          Yonik Seeley added a comment -

          I imagine it would be useful to be able to use globbing in conjunction with normal field specification, or multiple globs?

          example: hl.fl=body,_text,_misc

          I didn't hold up committing this issue for that since it would be a strict API superset.
          Thoughts?

          Show
          Yonik Seeley added a comment - I imagine it would be useful to be able to use globbing in conjunction with normal field specification, or multiple globs? example: hl.fl=body, _text, _misc I didn't hold up committing this issue for that since it would be a strict API superset. Thoughts?
          Hide
          Lars Kotthoff added a comment -

          Muchos gracias.

          Re globbing, I'm sure I saw an issue about doing that in general somewhere. Can't find it though, does anybody remember? Anyway, I think this should be tackled more generally, i.e. have a utility function that takes a list of parameters (possibly containing globs) and a list of possible values and returns the matching values. Most likely this function would be declared in an interface somewhere and implemented differently for highlighting, faceting, etc. This would be quite a major change though and should definitely be a separate issue.

          (Yonik, next on my list is SOLR-634 btw. HINT HINT)

          Show
          Lars Kotthoff added a comment - Muchos gracias. Re globbing, I'm sure I saw an issue about doing that in general somewhere. Can't find it though, does anybody remember? Anyway, I think this should be tackled more generally, i.e. have a utility function that takes a list of parameters (possibly containing globs) and a list of possible values and returns the matching values. Most likely this function would be declared in an interface somewhere and implemented differently for highlighting, faceting, etc. This would be quite a major change though and should definitely be a separate issue. (Yonik, next on my list is SOLR-634 btw. HINT HINT)
          Hide
          Lars Kotthoff added a comment -

          Ah, just remembered, see SOLR-650.

          Show
          Lars Kotthoff added a comment - Ah, just remembered, see SOLR-650 .
          Hide
          Grant Ingersoll added a comment -

          Bulk close for Solr 1.4

          Show
          Grant Ingersoll added a comment - Bulk close for Solr 1.4

            People

            • Assignee:
              Unassigned
              Reporter:
              Lars Kotthoff
            • Votes:
              1 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development