Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-8583

Apply highlighting to hl.alternateField

    Details

      Description

      Today, you can configure hl.alternateField for highlighter to display if no snippets were produced from original field. But the contents of the fallback field is output without highlighting the original query terms.

      This issue will cause alternate field to be highlighted with no snippet generation, and still respect max length. You can turn it off using new param hl.highlightAlternate=false. Supported highlighters: Simple, FVH

      1. SOLR-8583.patch
        18 kB
        Jan Høydahl
      2. SOLR-8583.patch
        17 kB
        Jan Høydahl
      3. SOLR-8583.patch
        16 kB
        Jan Høydahl
      4. SOLR-8583.patch
        16 kB
        Jan Høydahl
      5. SOLR-8583.patch
        14 kB
        Jan Høydahl
      6. SOLR-8583.patch
        6 kB
        Jan Høydahl

        Activity

        Hide
        janhoy Jan Høydahl added a comment - - edited

        Attaching a first working patch.

        There is a new param hl.highlightAlternate=true which when enabled will apply the standard highlighter on the alternate field.

        We do this by calling doHighlightingByHighlighter() again on the alternate field, but with some hardcoded parameters: hl.snippets=1&hl.fragsize=Int.MAX. We also set hl.maxAnalyzedChars equal to hl.maxAlternateFieldLength if applicable.

        There is one TODO. To be able to override these params, I needed to clone the SolrParams object on the request and replace it with a ModifialbleSolrParams, and then put back the old. Might not be thread safe... Perhaps a better way is to add a new params argument to doHighlightingByHighlighter?

        Also, I have not tested what happens if people use some other highlighter like FVH. I suspect that other highlighters may not respect hl.pre?

        Also, PostingsHighlighter does not support hl.alternateField at all, and FVH users specify pre and post using hl.tag.pre while Simple highlighter uses hl.simple.pre, so hardcoding which highlighter to use for the fallback field would cause problems with these settings not being in sync.

        Why cannot all highlighters use the same hl.tag.pre/post params?

        Show
        janhoy Jan Høydahl added a comment - - edited Attaching a first working patch. There is a new param hl.highlightAlternate=true which when enabled will apply the standard highlighter on the alternate field. We do this by calling doHighlightingByHighlighter() again on the alternate field, but with some hardcoded parameters: hl.snippets=1&hl.fragsize=Int.MAX . We also set hl.maxAnalyzedChars equal to hl.maxAlternateFieldLength if applicable. There is one TODO. To be able to override these params, I needed to clone the SolrParams object on the request and replace it with a ModifialbleSolrParams , and then put back the old. Might not be thread safe... Perhaps a better way is to add a new params argument to doHighlightingByHighlighter? Also, I have not tested what happens if people use some other highlighter like FVH. I suspect that other highlighters may not respect hl.pre ? Also, PostingsHighlighter does not support hl.alternateField at all, and FVH users specify pre and post using hl.tag.pre while Simple highlighter uses hl.simple.pre , so hardcoding which highlighter to use for the fallback field would cause problems with these settings not being in sync. Why cannot all highlighters use the same hl.tag.pre/post params?
        Hide
        janhoy Jan Høydahl added a comment -

        Perhaps SolrHighlighter abstract class should provide a new method String alternateField(String fieldName) and another abstract String[] alternateText(doc, fieldName, req), so all implementations must relate to the same alternate fallback API?

        Show
        janhoy Jan Høydahl added a comment - Perhaps SolrHighlighter abstract class should provide a new method String alternateField(String fieldName) and another abstract String[] alternateText(doc, fieldName, req) , so all implementations must relate to the same alternate fallback API?
        Hide
        janhoy Jan Høydahl added a comment -

        David Smiley you know the code, any guidance on this?

        Show
        janhoy Jan Høydahl added a comment - David Smiley you know the code, any guidance on this?
        Hide
        dsmiley David Smiley added a comment -

        Hi Jan Høydahl, thanks for requesting my input.

        Overall this looks good. I do wonder if we want to introduce new options to the most overly configurable component there is when we could change the existing behavior, and perhaps nobody would notice. In particular only if hl.requireFieldMatch=true (which is normally false) would this make sense, and then we could just simply do it without some new option.

        I think swapping in and out the SolrParams on the request is fine, albeit unfortunate. Maybe do in try-finally. And maybe instead of loading up a new ModifiableSolrParams with all of the existing ones, this could be done with SolrParams.wrapDefault? Or just leave it; it works.

        Instead of schema.getFieldOrNull for the alternate field, I think it should just call getField since it should be erroneous to refer to a non-existent field.

        Why cannot all highlighters use the same hl.tag.pre/post params?

        I feel your pain! And now we are stuck with it forever (sarcasm). Expect to see a new highlighter announced this year, and with it I will propose a bit of unification on the Solr side.

        Show
        dsmiley David Smiley added a comment - Hi Jan Høydahl , thanks for requesting my input. Overall this looks good. I do wonder if we want to introduce new options to the most overly configurable component there is when we could change the existing behavior, and perhaps nobody would notice. In particular only if hl.requireFieldMatch=true (which is normally false) would this make sense, and then we could just simply do it without some new option. I think swapping in and out the SolrParams on the request is fine, albeit unfortunate. Maybe do in try-finally. And maybe instead of loading up a new ModifiableSolrParams with all of the existing ones, this could be done with SolrParams.wrapDefault? Or just leave it; it works. Instead of schema.getFieldOrNull for the alternate field, I think it should just call getField since it should be erroneous to refer to a non-existent field. Why cannot all highlighters use the same hl.tag.pre/post params? I feel your pain! And now we are stuck with it forever (sarcasm). Expect to see a new highlighter announced this year, and with it I will propose a bit of unification on the Solr side.
        Hide
        janhoy Jan Høydahl added a comment -

        I do wonder if we want to introduce new options [...]

        Agree. Think I'll keep the param but make it default to true, so people can turn it off to get old behavior

        I think swapping in and out the SolrParams on the request is fine, albeit unfortunate. Maybe do in try-finally. And maybe instead of loading up a new ModifiableSolrParams with all of the existing ones, this could be done with SolrParams.wrapDefault? Or just leave it; it works.

        Found a way to avoid swapping. Instead the doHighlightingByXXX() methods take a new argument invariantParams which is then combined with wrapDefault. it works well and is less hacky.

        Instead of schema.getFieldOrNull for the alternate field, I think it should just call getField since it should be erroneous to refer to a non-existent field.

        This caused test failure in existing tests. So I'm keeping old behavior that if fallbackField does not exist, nothing is output at all.

        Expect to see a new highlighter announced this year, and with it I will propose a bit of unification on the Solr side.

        Looking forward to that!

        Show
        janhoy Jan Høydahl added a comment - I do wonder if we want to introduce new options [...] Agree. Think I'll keep the param but make it default to true, so people can turn it off to get old behavior I think swapping in and out the SolrParams on the request is fine, albeit unfortunate. Maybe do in try-finally. And maybe instead of loading up a new ModifiableSolrParams with all of the existing ones, this could be done with SolrParams.wrapDefault? Or just leave it; it works. Found a way to avoid swapping. Instead the doHighlightingByXXX() methods take a new argument invariantParams which is then combined with wrapDefault . it works well and is less hacky. Instead of schema.getFieldOrNull for the alternate field, I think it should just call getField since it should be erroneous to refer to a non-existent field. This caused test failure in existing tests. So I'm keeping old behavior that if fallbackField does not exist, nothing is output at all. Expect to see a new highlighter announced this year, and with it I will propose a bit of unification on the Solr side. Looking forward to that!
        Hide
        janhoy Jan Høydahl added a comment -

        New patch:

        • Avoid swapping in/out solrParams on the request
        • Use FVH on the alternate if that is what user has configured
        • Support field overrides f.fieldname.hl.* across the board
        • Add tests with/without fieldLength
        • Add tests with custom pre/post for both simple and fvh
        • Now enabled by default, have to set hl.highlightAlternate=false to turn off

        Still no support for PostingsHighlighter

        Show
        janhoy Jan Høydahl added a comment - New patch: Avoid swapping in/out solrParams on the request Use FVH on the alternate if that is what user has configured Support field overrides f.fieldname.hl.* across the board Add tests with/without fieldLength Add tests with custom pre/post for both simple and fvh Now enabled by default, have to set hl.highlightAlternate=false to turn off Still no support for PostingsHighlighter
        Hide
        dsmiley David Smiley added a comment -

        Shouldn't this new logic be integrated with alternateField()'s logic? It's confusing that it's separate, and makes this long method longer with a bit of dejavu logic. Perhaps this is confounded by the awkwardness induced by the lazy init of re-using fvh & fvhFieldQuery? Hmmm; maybe we should populate that eagerly if we detect FVH? That ought to allow to move the code to not repeat our logic... maybe by extracting a method on the majority of the content of the field highlighting loop so we can invoke it for alternate field highlighting. But again that's currently blocked by the lazy init.

        RE passing SolrParams to the doHighlighting* methods, I think the caller should use SolrParams.wrapDefaults instead of making the callee do this. In this way, the parameter is simply "params" without any reference to "invariant" terminology/semantics. BTW I was fine with you calling req.setParams as you had it before – less gotchas. As it is, doHighlightingByHighlighter calls methods like getHighlighter and getPhraseHighlighter that don't have this overridden params. It might not matter but it's problematic in principle at least.

        Show
        dsmiley David Smiley added a comment - Shouldn't this new logic be integrated with alternateField()'s logic? It's confusing that it's separate, and makes this long method longer with a bit of dejavu logic. Perhaps this is confounded by the awkwardness induced by the lazy init of re-using fvh & fvhFieldQuery? Hmmm; maybe we should populate that eagerly if we detect FVH? That ought to allow to move the code to not repeat our logic... maybe by extracting a method on the majority of the content of the field highlighting loop so we can invoke it for alternate field highlighting. But again that's currently blocked by the lazy init. RE passing SolrParams to the doHighlighting* methods, I think the caller should use SolrParams.wrapDefaults instead of making the callee do this. In this way, the parameter is simply "params" without any reference to "invariant" terminology/semantics. BTW I was fine with you calling req.setParams as you had it before – less gotchas. As it is, doHighlightingByHighlighter calls methods like getHighlighter and getPhraseHighlighter that don't have this overridden params. It might not matter but it's problematic in principle at least.
        Hide
        janhoy Jan Høydahl added a comment -

        New patch incorporating David Smiley's comments more or less:

        • Went back to swapping req.params with a wrapDefaults version.
        • New method doHighlightingOfField() which gets rid of duplication of code
        • Lazy FVH init by passing around new inner class FvhContainer with members fvh and fieldQuery which can then be altered by methods
        • Moved highlighting of alternate into method alternateField() to gather all logic in same place
        • FIeldname loop now looks like this:
          // Highlight per-field
          for (String fieldName : fieldNames) {
            SchemaField schemaField = schema.getFieldOrNull(fieldName);
          
            Object fieldHighlights; // object type allows flexibility for subclassers
            fieldHighlights = doHighlightingOfField(schemaField, params, fvhContainer, doc, docId, query, reader, req);
          
            if (fieldHighlights == null) {
              fieldHighlights = alternateField(doc, fieldName, req, docId, query, reader, schema, fvhContainer);
            }
          
            if (fieldHighlights != null) {
              docHighlights.add(fieldName, fieldHighlights);
            }
          } // for each field
          

        What puzzles me is that the changes should be pure code structure, no functionality change, yet one of the tests started failing. It was the first test of testAlternateSummaryWithHighlighting() setting maxAlternateFieldLength=18. Earlier it returned <simplepre>keyword</simplepost> is only here, but with the last patch I had to change it into <simplepre>keyword</simplepost> is only.

        Currently I'm not able to debug tests in my IntelliJ 16, so I just changed the assert instead of digging further.

        Show
        janhoy Jan Høydahl added a comment - New patch incorporating David Smiley 's comments more or less: Went back to swapping req.params with a wrapDefaults version. New method doHighlightingOfField() which gets rid of duplication of code Lazy FVH init by passing around new inner class FvhContainer with members fvh and fieldQuery which can then be altered by methods Moved highlighting of alternate into method alternateField() to gather all logic in same place FIeldname loop now looks like this: // Highlight per-field for ( String fieldName : fieldNames) { SchemaField schemaField = schema.getFieldOrNull(fieldName); Object fieldHighlights; // object type allows flexibility for subclassers fieldHighlights = doHighlightingOfField(schemaField, params, fvhContainer, doc, docId, query, reader, req); if (fieldHighlights == null ) { fieldHighlights = alternateField(doc, fieldName, req, docId, query, reader, schema, fvhContainer); } if (fieldHighlights != null ) { docHighlights.add(fieldName, fieldHighlights); } } // for each field What puzzles me is that the changes should be pure code structure, no functionality change, yet one of the tests started failing. It was the first test of testAlternateSummaryWithHighlighting() setting maxAlternateFieldLength=18. Earlier it returned <simplepre>keyword</simplepost> is only here , but with the last patch I had to change it into <simplepre>keyword</simplepost> is only . Currently I'm not able to debug tests in my IntelliJ 16, so I just changed the assert instead of digging further.
        Hide
        janhoy Jan Høydahl added a comment -

        The reason for difference in length is probably in how the fragmenters work. Tried out some more sizes and it changes, although not at the limits I expected for Simple highlighter.

        Here's a new patch using only FRAGSIZE in limiting maxAlternateFieldLength, instead of also using MAX_CHARS, as it did not add any value.

        Show
        janhoy Jan Høydahl added a comment - The reason for difference in length is probably in how the fragmenters work. Tried out some more sizes and it changes, although not at the limits I expected for Simple highlighter. Here's a new patch using only FRAGSIZE in limiting maxAlternateFieldLength, instead of also using MAX_CHARS , as it did not add any value.
        Hide
        dsmiley David Smiley added a comment -

        This is looking much better – nice job Jan! That FvhContainer is perfect; I'm kicking myself for not thinking of that already.

        The test exhibits a problem related to not using hl.requireFieldMatch. Looking at the test, I see we have a query on tv_text yet we're asking to highlight t_text (falling back on tv_text as alternate). What we assert is sensible based on these args, but this is an unnatural example. A more natural example is that the query is on t_text – the same field that is highlighted. What should then happen? Well, we could try and make it work by setting hl.requireFieldMatch=false or we could demand that this be set as a prerequisite to highlighting alternate fields. Or we could leave the logic be and document that you most likely need to set this to false (what I'm kinda leaning to but I have no conviction). Note that FVH doesn't support per-field overrides of that setting, so if we try to set that ourselves, then it won't work with FVH. How to handle this is debatable. In any case, the tests should be expanded to query based on t_text in addition to what it tests now which is good to test too.

        Minor quibbles:

        • can you order the parameters to doHighlightingOfField to have more consistency with the other methods that take much of the same parameters: doc, docId, schemaField, fvhContainer, query, reader, req
        Show
        dsmiley David Smiley added a comment - This is looking much better – nice job Jan! That FvhContainer is perfect; I'm kicking myself for not thinking of that already. The test exhibits a problem related to not using hl.requireFieldMatch. Looking at the test, I see we have a query on tv_text yet we're asking to highlight t_text (falling back on tv_text as alternate). What we assert is sensible based on these args, but this is an unnatural example. A more natural example is that the query is on t_text – the same field that is highlighted. What should then happen? Well, we could try and make it work by setting hl.requireFieldMatch=false or we could demand that this be set as a prerequisite to highlighting alternate fields. Or we could leave the logic be and document that you most likely need to set this to false (what I'm kinda leaning to but I have no conviction). Note that FVH doesn't support per-field overrides of that setting, so if we try to set that ourselves, then it won't work with FVH. How to handle this is debatable. In any case, the tests should be expanded to query based on t_text in addition to what it tests now which is good to test too. Minor quibbles: can you order the parameters to doHighlightingOfField to have more consistency with the other methods that take much of the same parameters: doc, docId, schemaField, fvhContainer, query, reader, req
        Hide
        janhoy Jan Høydahl added a comment - - edited

        New patch incorporating Davids comments:

        • Changed order of params for doHighlightingOfField() and alternateField()
        • Added test for hl.requireFieldMatch=true, searching a new field other_t:keyword
        • Added test that proves that we get highlighting of fallback field even if there was no match in the field when hl.requireFieldMatch=false

        A more natural example is that the query is on t_text – the same field that is highlighted. What should then happen?

        If you search "keyword" on t_text you will not get any hits

        To me it looks like the requireFieldMatch works as expected, at least with the default highlighter, so I have not changed any logic. Since it behaves exactly the same way for alternate field as for the main highlight field I'm not sure if we need to document anything here either.

        Think this is getting close to committable?

        Show
        janhoy Jan Høydahl added a comment - - edited New patch incorporating Davids comments: Changed order of params for doHighlightingOfField() and alternateField() Added test for hl.requireFieldMatch=true , searching a new field other_t:keyword Added test that proves that we get highlighting of fallback field even if there was no match in the field when hl.requireFieldMatch=false A more natural example is that the query is on t_text – the same field that is highlighted. What should then happen? If you search "keyword" on t_text you will not get any hits To me it looks like the requireFieldMatch works as expected, at least with the default highlighter, so I have not changed any logic. Since it behaves exactly the same way for alternate field as for the main highlight field I'm not sure if we need to document anything here either. Think this is getting close to committable?
        Hide
        janhoy Jan Høydahl added a comment -
        Show
        janhoy Jan Høydahl added a comment - Added a suggested RefGuide documentation to https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter
        Hide
        dsmiley David Smiley added a comment -

        This is certainly committable; I think you can commit it as-is if you want, or make further tweaks based on my remaining minor comments.

        You could move this line (that already existed) higher up so that you needn't re-pull it out in the highlighting alternate code

        • int alternateFieldLen = params.getFieldInt(fieldName, HighlightParams.ALTERNATE_FIELD_LENGTH, 0);
        • why the max(18,altFieldLen) ? 18 seems like a magic number.... if someone sets the alt field length to some silly low number then it's there prerogative to do so?

        If you search "keyword" on t_text you will not get any hits

        hehe, well, consider dismax with multiple fields.

        Docs looks good, basically. I'll correct a small grammatical error.

        Show
        dsmiley David Smiley added a comment - This is certainly committable; I think you can commit it as-is if you want, or make further tweaks based on my remaining minor comments. You could move this line (that already existed) higher up so that you needn't re-pull it out in the highlighting alternate code int alternateFieldLen = params.getFieldInt(fieldName, HighlightParams.ALTERNATE_FIELD_LENGTH, 0); why the max(18,altFieldLen) ? 18 seems like a magic number.... if someone sets the alt field length to some silly low number then it's there prerogative to do so? If you search "keyword" on t_text you will not get any hits hehe, well, consider dismax with multiple fields. Docs looks good, basically. I'll correct a small grammatical error.
        Hide
        janhoy Jan Høydahl added a comment -

        You could move this line (that already existed) higher up so that you needn't re-pull it out in the highlighting alternate code

        Done

        why the max(18,altFieldLen) ? 18 seems like a magic number....

        FVH requires FRAGSIZE to be at least 18, else there will be an exception. So I just set that as the lower limit. Added a comment line in the code:

        // Enforce maxAlternateFieldLength by FRAGSIZE. Minimum 18 due to FVH limitations
        

        Also added David Smiley as contributor in CHANGES, thanks for all the feedback!

        Will go ahead and commit latest patch now.

        Show
        janhoy Jan Høydahl added a comment - You could move this line (that already existed) higher up so that you needn't re-pull it out in the highlighting alternate code Done why the max(18,altFieldLen) ? 18 seems like a magic number.... FVH requires FRAGSIZE to be at least 18, else there will be an exception. So I just set that as the lower limit. Added a comment line in the code: // Enforce maxAlternateFieldLength by FRAGSIZE. Minimum 18 due to FVH limitations Also added David Smiley as contributor in CHANGES, thanks for all the feedback! Will go ahead and commit latest patch now.
        Hide
        janhoy Jan Høydahl added a comment -

        Documentation added to RefGuide.

        Show
        janhoy Jan Høydahl added a comment - Documentation added to RefGuide.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 05ce40a0b9da5989613a20f985f9796ed533a8c4 in lucene-solr's branch refs/heads/master from Jan Høydahl
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=05ce40a ]

        SOLR-8583: Do not attempt highlight of alternate if it is same as original hl field

        Show
        jira-bot ASF subversion and git services added a comment - Commit 05ce40a0b9da5989613a20f985f9796ed533a8c4 in lucene-solr's branch refs/heads/master from Jan Høydahl [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=05ce40a ] SOLR-8583 : Do not attempt highlight of alternate if it is same as original hl field
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit a460addd2f9511432da7684bd2ee6598025389ea in lucene-solr's branch refs/heads/branch_6x from Jan Høydahl
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a460add ]

        SOLR-8583: Do not attempt highlight of alternate if it is same as original hl field
        (cherry picked from commit 05ce40a)

        Show
        jira-bot ASF subversion and git services added a comment - Commit a460addd2f9511432da7684bd2ee6598025389ea in lucene-solr's branch refs/heads/branch_6x from Jan Høydahl [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a460add ] SOLR-8583 : Do not attempt highlight of alternate if it is same as original hl field (cherry picked from commit 05ce40a)
        Hide
        janhoy Jan Høydahl added a comment -

        Yey, commit bot actually started working for me again, after changing my JIRA email address to my @apache.org addr. Looks like there is a bug in the py script that skips commenting if user is not found in JIRA?

        Show
        janhoy Jan Høydahl added a comment - Yey, commit bot actually started working for me again, after changing my JIRA email address to my @apache.org addr. Looks like there is a bug in the py script that skips commenting if user is not found in JIRA?

          People

          • Assignee:
            janhoy Jan Høydahl
            Reporter:
            janhoy Jan Høydahl
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development