Solr / SOLR-3897

Preserve multi-value fields during hit highlighting

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1
    • Component/s: highlighter
    • Labels:
      None

      Description

      By default, the Solr hit highlighter returns only those values of a multi-valued field that contain a hit, sorted by score.

      This ticket supplies a patch that adds a new highlighting parameter called "preserveMulti", which can be set on a field-by-field basis to return all of the values in their original order. When this parameter is used, values that contain a hit are highlighted and values that do not contain a hit are returned un-highlighted.

      The "preserveMulti" parameter works with the default standard highlighter and follows the standard highlighting conventions.
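      The difference between the two behaviours described above can be illustrated with a toy model. This is a minimal sketch, not Solr's highlighter code: the class and method names are hypothetical, a value's "score" is assumed to be simply the number of matching tokens, and hits are assumed to be marked with <em> tags.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy model of multi-valued highlighting; not Solr's implementation.
public class PreserveMultiSketch {

    // Wraps every whitespace-delimited token equal to `term` in <em> tags.
    static String highlight(String value, String term) {
        String[] toks = value.split(" ");
        for (int i = 0; i < toks.length; i++) {
            if (toks[i].equalsIgnoreCase(term)) toks[i] = "<em>" + toks[i] + "</em>";
        }
        return String.join(" ", toks);
    }

    // Assumed toy score: number of occurrences of `term` in the value.
    static int score(String value, String term) {
        int n = 0;
        for (String tok : value.split(" ")) if (tok.equalsIgnoreCase(term)) n++;
        return n;
    }

    // Default behaviour: keep only values with a hit, highest score first.
    static List<String> defaultMode(List<String> values, String term) {
        List<String> hits = new ArrayList<>();
        for (String v : values) if (score(v, term) > 0) hits.add(v);
        hits.sort(Comparator.comparingInt((String v) -> score(v, term)).reversed());
        List<String> out = new ArrayList<>();
        for (String v : hits) out.add(highlight(v, term));
        return out;
    }

    // preserveMulti=true: every value in original order; values without a hit come back unchanged.
    static List<String> preserveMulti(List<String> values, String term) {
        List<String> out = new ArrayList<>();
        for (String v : values) out.add(highlight(v, term));
        return out;
    }

    public static void main(String[] args) {
        List<String> cat = List.of("electronics", "memory card", "memory memory sale");
        // [<em>memory</em> <em>memory</em> sale, <em>memory</em> card]
        System.out.println(defaultMode(cat, "memory"));
        // [electronics, <em>memory</em> card, <em>memory</em> <em>memory</em> sale]
        System.out.println(preserveMulti(cat, "memory"));
    }
}
```

      In the default mode the value "electronics" disappears and the remaining values are reordered; with preserveMulti the client sees the field exactly as stored, with hits marked.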

      Sample usage for a field called "cat":

      f.cat.hl.preserveMulti=true
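      As a sketch of how that per-field parameter might sit alongside the usual highlighting parameters in a request, the snippet below builds a select URL. The base URL (http://localhost:8983/solr/collection1), the query cat:memory and the class name are assumptions for illustration; hl and hl.fl are standard Solr highlighting parameters, and the f.<field>.<param> prefix is Solr's usual per-field override convention.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds a highlighting request URL; host and core are assumptions.
public class HighlightRequestSketch {

    // URL-encodes each key and value and joins them with '&', keeping insertion order.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "cat:memory");                 // hypothetical query
        params.put("hl", "true");                      // enable highlighting
        params.put("hl.fl", "cat");                    // highlight the multi-valued "cat" field
        params.put("f.cat.hl.preserveMulti", "true");  // per-field parameter from this issue
        // Hypothetical base URL; adjust host, port and core name for a real install.
        String url = "http://localhost:8983/solr/collection1/select?" + toQueryString(params);
        System.out.println(url);
    }
}
```

      Because the parameter is prefixed with f.cat., other fields listed in hl.fl keep the default highlighting behaviour.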

      1. SOLR-3897_snippets.patch
        3 kB
        Yonik Seeley
      2. SOLR-3897.patch
        5 kB
        Joel Bernstein
      3. SOLR-3897.patch
        4 kB
        Joel Bernstein

        Activity

        Joel Bernstein added a comment -

        Added test case.

        Yonik Seeley added a comment -

        This looks like a reasonable feature. I plan on committing as-is unless someone comes up with a better name than "preserveMulti".

        Yonik Seeley added a comment -

        Committed to trunk: http://svn.apache.org/viewvc?rev=1393171&view=rev and 4x: http://svn.apache.org/viewvc?rev=1393173&view=rev
        Yonik Seeley added a comment -

        Erik asked the question "suppose a multiValued field has two values, both get highlighted with multiple snippets"...

        I had assumed that one would not request multiple snippets when using preserveMulti (or that one would use hl.fragsize=0 to get complete field values).
        But I just checked the test and it requires snippets=2 to get both field values.

        Seems like for preserveMulti=true, one should get all field values regardless of the setting of snippets.

        Should the number of snippets per field value be capped at 1 when preserveMulti==true, or should we consider snippets to be per value rather than per field?

        Joel Bernstein added a comment -

        I can change this so things are more automatic, by setting fragsize=0 and snippets equal to the number of field values.

        Joel Bernstein added a comment -

        Since this is already committed, shall I create a new patch based on a fresh 4x pull? This would be an incremental patch for this issue.

        Yonik Seeley added a comment -

        Quoting Joel: "I can change this so things are more automatic, by setting fragsize=0 and snippets equal to the number of field values."

        That would work for small fields... although there may be use-cases where one does want a fragment of each field value instead of the whole field value.

        Quoting Joel: "Since this is already committed, shall I create a new patch based on a fresh 4x pull?"

        A fresh pull of trunk, yes. It will also be merged back to 4x when committed.

        Erik Hatcher added a comment -

        Quoting Yonik: "Seems like for preserveMulti=true, one should get all field values regardless of the setting of snippets. Should the number of snippets per field value be capped at 1 when preserveMulti==true, or should we consider snippets to be per value rather than per field?"

        I think each field value, when preserveMulti=true, should be considered separately, and all highlighting parameters for that field should apply to each field value. The number of fragments, for example, should be per field value instance. And I suppose this necessitates another array level in the response?

        Sorry if this got a fair bit more complicated than it started. After thinking it over once the original patch was committed, I realized there would be some confusion between field values and fragments.

        Yonik Seeley added a comment -

        Quoting Erik: "And I suppose this necessitates another array level in the response?"

        I guess it depends on how widely applicable this feature is.
        If very few will have a need for it (which I think is the case?) we should try and keep it as unobtrusive as possible.

        Joel Bernstein added a comment -

        fragsize works only with the FastVectorHighlighter, so if we want to ensure one fragment per field value we will have to work with the fragmenter. The default fragmenter works in chunks of 100 characters.

        At this point I'm thinking it might be best to leave it as it is. The user will have to set the snippets count, but this can be documented. Each field value will be a single fragment unless it has more than 100 characters; if this is an issue, a new fragmenter can be provided.

        Joel Bernstein added a comment -

        I was incorrect, fragsize works with the default highlighter. I still vote for leaving the original patch as it is and documenting the use of snippets and fragsize with it. This will provide flexibility to work with different use cases.

        Another test case can be added to test field values with the fragsize param.
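        To see why the 100-character default matters for one-snippet-per-value, here is a deliberately simplified fragmenter: it greedily packs whitespace-separated tokens into fragments of at most fragsize characters, and treats fragsize <= 0 as "no fragmenting" (the hl.fragsize=0 case mentioned earlier in this thread). This is an approximation for illustration only; Lucene's actual fragmenters work on token offsets and differ in detail, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified fragmenter; not Lucene's SimpleFragmenter.
public class FragmentSketch {

    // Greedily packs tokens into fragments of at most `fragsize` characters
    // (a single token longer than fragsize gets a fragment of its own).
    // fragsize <= 0 means no fragmenting: the whole value is one fragment.
    static List<String> fragment(String value, int fragsize) {
        List<String> frags = new ArrayList<>();
        if (fragsize <= 0) {
            frags.add(value);
            return frags;
        }
        StringBuilder cur = new StringBuilder();
        for (String tok : value.trim().split("\\s+")) {
            int needed = cur.length() == 0 ? tok.length() : cur.length() + 1 + tok.length();
            if (cur.length() > 0 && needed > fragsize) {
                frags.add(cur.toString());  // current fragment is full; start a new one
                cur.setLength(0);
            }
            if (cur.length() > 0) cur.append(' ');
            cur.append(tok);
        }
        if (cur.length() > 0) frags.add(cur.toString());
        return frags;
    }

    public static void main(String[] args) {
        String shortValue = "memory card";
        String longValue = "word ".repeat(40).trim(); // 40 words, 199 characters
        System.out.println(fragment(shortValue, 100).size()); // 1
        System.out.println(fragment(longValue, 100).size());  // 2
        System.out.println(fragment(longValue, 0).size());    // 1
    }
}
```

        Under this model a short value always yields one fragment, a value longer than the fragment size yields several, and fragsize=0 returns each value whole.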

        Yonik Seeley added a comment -

        preserveMulti implies getting back every field value, so we should always return them regardless of what snippets is set to.
        If we aren't changing the request format, it still seems like the sanest thing is to show all field values and have exactly one snippet per field value.

        Show
        Yonik Seeley added a comment - preserveMulti implies getting values back for every field value, hence we always should regardless of what snippets is set to. If we aren't changing the request format, it still seems like the most sane thing is to show all field values and have exactly one snippet per field value.
        Hide
        Yonik Seeley added a comment -

        Here's a patch that always lists all field values, regardless of the value of snippets.

        Given that the default value of snippets is 1, this should work fine - there will be exactly one snippet generated per field value.

        Note that we currently show a snippet for every field value even if nothing in the field matched. I'm not sure if this is the desirable behavior or not. For short fields, the client could just return the field values in the document and use that if the highlighter section was missing. But perhaps snippet generation might still be useful for long fields.
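        The client-side fallback mentioned above can be sketched as follows. The nested map mirrors the shape of Solr's highlighting section (document id, then field name, then list of snippets), which is also the type SolrJ's QueryResponse.getHighlighting() returns; the class, method and document ids are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper: prefer highlighted values, else fall back to stored values.
public class HighlightFallbackSketch {

    // Returns the highlighted values for one document and field if the highlighting
    // section contains them; otherwise returns the stored (un-highlighted) values.
    static List<String> displayValues(String docId, String field,
                                      Map<String, Map<String, List<String>>> highlighting,
                                      List<String> storedValues) {
        Map<String, List<String>> perDoc = highlighting.get(docId);
        if (perDoc != null) {
            List<String> snippets = perDoc.get(field);
            if (snippets != null && !snippets.isEmpty()) return snippets;
        }
        return storedValues;
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> hl = Map.of(
                "doc1", Map.of("cat", List.of("electronics", "<em>memory</em> card")));
        // doc1 has highlighting: [electronics, <em>memory</em> card]
        System.out.println(displayValues("doc1", "cat", hl, List.of("electronics", "memory card")));
        // doc2 has no highlighting entry, so the stored values are used: [hard drive]
        System.out.println(displayValues("doc2", "cat", hl, List.of("hard drive")));
    }
}
```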

        Joel Bernstein added a comment -

        This patch looks good to me. For consistency, I think still returning a snippet for every field value, even if nothing matches, is the desired behavior.

        Yonik Seeley added a comment -

        Committed. trunk: http://svn.apache.org/viewvc?rev=1396317&view=rev 4x: http://svn.apache.org/viewvc?rev=1396320&view=rev
        Commit Tag Bot added a comment -

        [branch_4x commit] Yonik Seeley
        http://svn.apache.org/viewvc?view=revision&revision=1396320

        SOLR-3897: return a snippet for every value when preserveMulti=true

        Commit Tag Bot added a comment -

        [branch_4x commit] Yonik Seeley
        http://svn.apache.org/viewvc?view=revision&revision=1393173

        SOLR-3897: hl.preserveMulti to preserve all multiValued field values when highlighting


          People

          • Assignee: Yonik Seeley
          • Reporter: Joel Bernstein
          • Votes: 0
          • Watchers: 3
