Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-10018

hl.maxAnalyzedChars should have consistent default across highlighters

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 6.4
    • Fix Version/s: 6.5
    • Component/s: highlighter
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      I see no reason why hl.maxAnalyzedChars should have different defaults per highlighter implementation. The default is typically 51,200 but for the UnifiedHighlighter and PostingsHighlighter it's 10,000. This could easily lead to an unexpected lack of highlights that you expect to see when trying the UH.

      1. SOLR_10018__default_hl_maxAnalyazedChars.patch
        5 kB
        David Smiley
      2. SOLR-10018-test.patch
        1 kB
        Christine Poerschke

        Activity

        Hide
        cpoerschke Christine Poerschke added a comment -

        If the intent is (and it might not be) that the new SolrHighlighter.DEFAULT_MAX_CHARS (51200) matches the value of existing (lucene) Highlighter.DEFAULT_MAX_CHARS_TO_ANALYZE (50*1024) then perhaps a test could be added to test for that.

        (I learnt about hl.maxAnalyzedChars as part of the London Lucene Hackday for Full Fact on Friday and so this ticket here today caught my eye and interest. hl.maxAnalyzedChars cropped up in the 'stacked tokens' team, this is our fork/readme file.)

        Show
        cpoerschke Christine Poerschke added a comment - If the intent is (and it might not be) that the new SolrHighlighter.DEFAULT_MAX_CHARS (51200) matches the value of existing (lucene) Highlighter.DEFAULT_MAX_CHARS_TO_ANALYZE (50*1024) then perhaps a test could be added to test for that. (I learnt about hl.maxAnalyzedChars as part of the London Lucene Hackday for Full Fact on Friday and so this ticket here today caught my eye and interest. hl.maxAnalyzedChars cropped up in the 'stacked tokens' team, this is our fork/readme file.)
        Hide
        dsmiley David Smiley added a comment -

        Yes; the intent is to match what Solr's original/default/standard highlighter does, which is indeed one in the same with Highlighter.DEFAULT_MAX_CHARS_TO_ANALYZE. I see you noticed I didn't add a test I'll go add one if you insist; though I don't think every option out there deserves a test, particularly at the Solr layer for where there is a Lucene capability.

        Show
        dsmiley David Smiley added a comment - Yes; the intent is to match what Solr's original/default/standard highlighter does, which is indeed one in the same with Highlighter.DEFAULT_MAX_CHARS_TO_ANALYZE . I see you noticed I didn't add a test I'll go add one if you insist; though I don't think every option out there deserves a test, particularly at the Solr layer for where there is a Lucene capability.
        Hide
        cpoerschke Christine Poerschke added a comment -

        Actually I didn't notice the absence of a test (I agree that not everything deserves one) but the The default is typically 51,200 but for ... it's 10,000. jumped out and made me wonder if in future there could be another "is typically ... but for ... it's 51,200" repeat.

        (Small test attached if you want it, but am definitely not insisting on one.)

        Show
        cpoerschke Christine Poerschke added a comment - Actually I didn't notice the absence of a test (I agree that not everything deserves one) but the The default is typically 51,200 but for ... it's 10,000. jumped out and made me wonder if in future there could be another "is typically ... but for ... it's 51,200" repeat. (Small test attached if you want it, but am definitely not insisting on one.)
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit f5c6c3b796ff6be59a9811f0f4f69cd6e8c0a3cd in lucene-solr's branch refs/heads/master from David Smiley
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f5c6c3b ]

        SOLR-10018: default hl.maxAnalyazedChars to 51200 across all highlighters

        Show
        jira-bot ASF subversion and git services added a comment - Commit f5c6c3b796ff6be59a9811f0f4f69cd6e8c0a3cd in lucene-solr's branch refs/heads/master from David Smiley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f5c6c3b ] SOLR-10018 : default hl.maxAnalyazedChars to 51200 across all highlighters
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1c7ae87f0cbc3440d022fecfbb1f980bf244f4ce in lucene-solr's branch refs/heads/branch_6x from David Smiley
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1c7ae87 ]

        SOLR-10018: default hl.maxAnalyazedChars to 51200 across all highlighters

        (cherry picked from commit f5c6c3b)

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1c7ae87f0cbc3440d022fecfbb1f980bf244f4ce in lucene-solr's branch refs/heads/branch_6x from David Smiley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1c7ae87 ] SOLR-10018 : default hl.maxAnalyazedChars to 51200 across all highlighters (cherry picked from commit f5c6c3b)
        Hide
        dsmiley David Smiley added a comment -

        The problem is unlikely to happen again since the patch establishes a highlighter default in SolrHighlighter.java instead of referring to a constant at the Lucene layer per-highlighter impl.

        Show
        dsmiley David Smiley added a comment - The problem is unlikely to happen again since the patch establishes a highlighter default in SolrHighlighter.java instead of referring to a constant at the Lucene layer per-highlighter impl.

          People

          • Assignee:
            dsmiley David Smiley
            Reporter:
            dsmiley David Smiley
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development