Solr / SOLR-9935

When hl.method=unified add support for hl.fragsize param

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.4
    • Component/s: highlighter
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      In LUCENE-7620 the UnifiedHighlighter (UH) is getting a BreakIterator that allows it to support the equivalent of Solr's hl.fragsize. So let's support this on the Solr side.

      Attachments

      1. SOLR_9935_UH_fragsize.patch (6 kB, David Smiley)
      2. SOLR_9935_UH_fragsize.patch (5 kB, David Smiley)

          Activity

          David Smiley added a comment:

          Here's a patch. The default fragsize is 70, the same default Solr uses when the regex fragmenter (of the original Highlighter) is used. The two are similar in that you typically want a passage about one sentence long.

          Note that the regex fragmenter has a "slop" feature (60% of the fragsize); this is not (yet) supported by the UH's LengthGoalBreakIterator.

          When LUCENE-7620 lands (this weekend?), I plan to commit this immediately after.
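          To make the fragsize idea concrete, here is a minimal Java sketch that uses only the JDK's java.text.BreakIterator: it walks sentence boundaries and closes a passage once the passage reaches at least fragsize characters. It is not Lucene's LengthGoalBreakIterator (whose internals may differ), and MinLengthPassages and split are hypothetical names. Like the UH as described above, it has no slop.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical illustration, not Lucene's LengthGoalBreakIterator:
 * group sentence boundaries into passages that meet a minimum length goal.
 */
final class MinLengthPassages {

  /**
   * Returns [start, end) offsets of passages at least fragsize chars long;
   * the final leftover passage may be shorter than the goal.
   */
  static List<int[]> split(String text, int fragsize) {
    BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
    sentences.setText(text);
    List<int[]> passages = new ArrayList<>();
    int start = sentences.first();
    for (int end = sentences.next(); end != BreakIterator.DONE; end = sentences.next()) {
      // Keep absorbing sentences until the current passage meets the length goal.
      if (end - start >= fragsize) {
        passages.add(new int[] {start, end});
        start = end;
      }
    }
    // Any remaining text (shorter than the goal) becomes a final passage.
    if (start < text.length()) {
      passages.add(new int[] {start, text.length()});
    }
    return passages;
  }
}
```

          For example, split("One two. Three four. Five six.", 10) yields two passages: the first two sentences together, then the last sentence alone because it is shorter than the goal.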

          David Smiley added a comment:

          Updated the patch to account for an API change in LUCENE-7620. Clarified the test a bit, along with some other related test methods. I'll commit later today. In CHANGES.txt I'll remove the note about the UH not supporting hl.fragsize (yay).

          Features of the original Highlighter that are not in the UH (as seen through Solr):

          • query boosts influencing passage scoring
          • hl.mergeContiguous, which defaults to false; in the UH, DefaultPassageFormatter always merges contiguous passages.
          • hl.alternateField and related options
          • hl.maxMultiValueToExamine (a performance circuit-breaker). It doesn't seem as pertinent to the UH as to the original Highlighter.
          • a regex-based passage delineation option
          • hl.preserveMulti: the original Highlighter supports "true" (false by default), but the UH doesn't.
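          The mergeContiguous difference is easiest to see on passage offsets. The following generic Java sketch shows what "merging contiguous passages" means: sorted [start, end) ranges that touch or overlap are coalesced into one. It is an illustration only, not DefaultPassageFormatter's code, and ContiguousMerge and merge are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of merging contiguous passages; not Lucene's DefaultPassageFormatter. */
final class ContiguousMerge {

  /**
   * Takes [start, end) passages sorted by start offset and returns new ranges where
   * passages that touch or overlap are coalesced. The input arrays are not modified.
   */
  static List<int[]> merge(List<int[]> sorted) {
    List<int[]> out = new ArrayList<>();
    for (int[] p : sorted) {
      int[] last = out.isEmpty() ? null : out.get(out.size() - 1);
      if (last != null && p[0] <= last[1]) {
        // Starts where (or before) the previous passage ends: extend the previous one.
        last[1] = Math.max(last[1], p[1]);
      } else {
        // Gap before this passage: copy it so the caller's array stays untouched.
        out.add(new int[] {p[0], p[1]});
      }
    }
    return out;
  }
}
```

          With merging disabled, a formatter would instead emit [0, 10) and [10, 20) as two separate snippets.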
          ASF subversion and git services added a comment:

          Commit 570880d3acb45c925e8dc77172e0725ab8ba07b8 in lucene-solr's branch refs/heads/master from David Smiley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=570880d ]

          SOLR-9935: Add hl.fragsize support when using the UnifiedHighlighter

          ASF subversion and git services added a comment:

          Commit d195c2525b00ef6e12b88f838167475feb5d2d19 in lucene-solr's branch refs/heads/branch_6x from David Smiley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d195c25 ]

          SOLR-9935: Add hl.fragsize support when using the UnifiedHighlighter

          (cherry picked from commit 570880d)

          David Smiley added a comment:

          While documenting the highlighters in the Solr Ref Guide, I noticed I had overlooked that an hl.fragsize of 0 is a special value meaning "don't do any fragmenting." I should add this as a special case that uses the WholeBreakIterator. Jim Ferenczi, is it too late for 6.4?
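          A sketch of the parameter handling this comment implies, in Java: resolve hl.fragsize for a field (honoring Solr's usual f.<field>. per-field override), default it to 70, and treat 0 as "no fragmenting," i.e. the whole value becomes one passage, analogous to using the WholeBreakIterator. FragsizeChoice, Strategy, fragsize, and resolve are hypothetical names, not Solr's API, and treating negative values like 0 is an assumption.

```java
import java.util.Map;

/**
 * Hypothetical sketch of choosing a passage strategy from hl.fragsize.
 * Not Solr's actual API; names are illustrative only.
 */
final class FragsizeChoice {
  enum Strategy { WHOLE, LENGTH_GOAL }

  // Mirrors the default of 70 discussed in this issue (the regex fragmenter's default).
  static final int DEFAULT_FRAGSIZE = 70;

  /** f.<field>.hl.fragsize wins over hl.fragsize, which wins over the default. */
  static int fragsize(Map<String, String> params, String field) {
    String v = params.get("f." + field + ".hl.fragsize");
    if (v == null) {
      v = params.get("hl.fragsize");
    }
    return v == null ? DEFAULT_FRAGSIZE : Integer.parseInt(v.trim());
  }

  static Strategy resolve(Map<String, String> params, String field) {
    // 0 means "don't fragment": highlight the whole value as a single passage.
    // Treating negative values the same way is an assumption of this sketch.
    return fragsize(params, field) <= 0 ? Strategy.WHOLE : Strategy.LENGTH_GOAL;
  }
}
```

          Keeping the "0 means whole value" check ahead of any length-goal logic makes the special case explicit rather than relying on a length goal of 0 happening to behave like no fragmenting.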

          Jim Ferenczi added a comment:

          I guess you can, David. I'll create the first RC later today, but if you feel this change is safe, you can push it now.

          ASF subversion and git services added a comment:

          Commit ed513fdee77b95379bed8f8d5f369fb0393fd364 in lucene-solr's branch refs/heads/master from David Smiley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ed513fd ]

          SOLR-9935: UnifiedHighlighter, when hl.fragsize=0 don't do fragmenting

          ASF subversion and git services added a comment:

          Commit 9224065cb2e07b8918041ebac8795bddfba71ac6 in lucene-solr's branch refs/heads/branch_6x from David Smiley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9224065 ]

          SOLR-9935: UnifiedHighlighter, when hl.fragsize=0 don't do fragmenting

          (cherry picked from commit ed513fd)

          ASF subversion and git services added a comment:

          Commit bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 in lucene-solr's branch refs/heads/branch_6_4 from David Smiley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bbe4b08 ]

          SOLR-9935: UnifiedHighlighter, when hl.fragsize=0 don't do fragmenting

          (cherry picked from commit 9224065)


            People

            • Assignee: David Smiley
            • Reporter: David Smiley
            • Votes: 0
            • Watchers: 3
