Lucene - Core: LUCENE-5503

Trivial fixes to WeightedSpanTermExtractor

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.7
    • Fix Version/s: 5.4
    • Component/s: modules/highlighter
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      In some cases, the conversion of a PhraseQuery to a SpanNearQuery miscalculates the slop when the phrase contains stop words. The problem only really appears when there is more than one intervening run of stop words, as in "ab the cd the the ef".
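The arithmetic behind the example can be made concrete. Assuming the analyzer's stop filter leaves position holes where "the" was removed, the surviving terms sit at ab=0, cd=2 and ef=5, and a SpanNearQuery over them needs enough slop to cover every hole: 1 + 2 = 3. A calculation that covers only one hole (for instance the largest, 2) agrees with the total when there is a single run of stop words but falls short once there are two, which is consistent with the symptom described above. The sketch below is plain Java with no Lucene dependency; the class and method names are hypothetical, and it illustrates the arithmetic rather than the code in the patch.

```java
import java.util.Arrays;

/**
 * Hypothetical helper class, not part of Lucene. Positions are assumed to be the
 * term positions of a phrase after stop-word removal, in increasing order.
 */
class StopWordHoles {

    /** Size of each hole: the number of positions skipped between consecutive terms. */
    static int[] holes(int[] positions) {
        int[] result = new int[Math.max(0, positions.length - 1)];
        for (int i = 1; i < positions.length; i++) {
            result[i - 1] = positions[i] - positions[i - 1] - 1;
        }
        return result;
    }

    /** Total skipped positions: the extra slop needed so that every hole is covered. */
    static int totalHoles(int[] positions) {
        return Arrays.stream(holes(positions)).sum();
    }

    /** Largest single hole, shown only for comparison with the total. */
    static int largestHole(int[] positions) {
        return Arrays.stream(holes(positions)).max().orElse(0);
    }

    public static void main(String[] args) {
        // "ab the cd the the ef" with "the" removed: ab=0, cd=2, ef=5
        int[] positions = {0, 2, 5};
        System.out.println("holes        = " + Arrays.toString(holes(positions)));
        System.out.println("total holes  = " + totalHoles(positions));
        System.out.println("largest hole = " + largestHole(positions));
    }
}
```

For the single-run phrase "ab the cd" (positions {0, 2}), the total and the largest hole are both 1, so the two quantities only diverge when there are multiple runs of stop words.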

      I also noticed that the inOrder flag is determined from the newly calculated slop; it should probably be based on the original phraseQuery.getSlop().
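The inOrder point can be sketched the same way. A PhraseQuery with slop 0 is an exact phrase, so its span counterpart should match the terms in order; but once stop-word holes raise the span slop above 0, a flag derived from that adjusted value flips to unordered. The helper below is hypothetical (not Lucene API) and simply contrasts the two decisions for the example phrase, assuming ascending positions like those returned by PhraseQuery.getPositions().

```java
/**
 * Hypothetical holder for SpanNearQuery parameters derived from a phrase's own slop
 * and its term positions. Not part of Lucene.
 */
final class SpanNearParams {
    final int slop;
    final boolean inOrder;

    SpanNearParams(int slop, boolean inOrder) {
        this.slop = slop;
        this.inOrder = inOrder;
    }

    /** Sum of the positions skipped between consecutive terms (positions ascending). */
    static int positionGaps(int[] positions) {
        int gaps = 0;
        for (int i = 1; i < positions.length; i++) {
            gaps += positions[i] - positions[i - 1] - 1;
        }
        return gaps;
    }

    /** inOrder derived from the adjusted slop: an exact phrase with stop words becomes unordered. */
    static SpanNearParams fromAdjustedSlop(int phraseSlop, int[] positions) {
        int slop = phraseSlop + positionGaps(positions);
        return new SpanNearParams(slop, slop == 0);
    }

    /** inOrder derived from the phrase's original slop, as the description suggests. */
    static SpanNearParams fromOriginalSlop(int phraseSlop, int[] positions) {
        int slop = phraseSlop + positionGaps(positions);
        return new SpanNearParams(slop, phraseSlop == 0);
    }

    public static void main(String[] args) {
        int[] positions = {0, 2, 5}; // "ab the cd the the ef" with stop words removed
        SpanNearParams adjusted = fromAdjustedSlop(0, positions);
        SpanNearParams original = fromOriginalSlop(0, positions);
        // adjusted: slop=3 inOrder=false
        System.out.println("adjusted: slop=" + adjusted.slop + " inOrder=" + adjusted.inOrder);
        // original: slop=3 inOrder=true
        System.out.println("original: slop=" + original.slop + " inOrder=" + original.inOrder);
    }
}
```

Both variants produce the same span slop; they differ only in whether the exact phrase keeps its ordering requirement.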

      Patch and unit tests are on the way.

      Attachments

      1. LUCENE-5503.patch (8 kB, David Smiley)
      2. LUCENE-5503.patch (6 kB, Tim Allison)
      3. LUCENE-5503v2.patch (7 kB, Tim Allison)

        Activity

        David Smiley added a comment -

        I'll take a look at this by next week.

        Tim Allison added a comment -

        Updated patch that works with the current trunk.

        David Smiley added a comment -

        Looks good, Tim! I like the tests. I made some minor improvements to the code, making it somewhat similar to how WSTE converts a MultiPhraseQuery, in that the positionGaps integer is kept separate from the slop. Also, no loop is needed to calculate that gap.

        P.S. When attaching updated patches, reuse the same file name. JIRA keeps all versions and clearly shows the latest.
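The remark that no loop is needed follows from a telescoping sum: adding up positions[i] - positions[i-1] - 1 over every consecutive pair leaves positions[n-1] - positions[0] - (n - 1). Below is a minimal sketch, assuming strictly increasing positions and using hypothetical names, that compares the loop with the closed form and keeps the gap count separate from the phrase's own slop until the span slop is assembled. It illustrates the idea and is not the committed code.

```java
/**
 * Two ways to count the positions skipped inside a phrase. Positions are assumed to be
 * strictly increasing. Class and method names are hypothetical, not Lucene API.
 */
class PositionGaps {

    /** Per-pair loop: add up each hole between consecutive terms. */
    static int withLoop(int[] positions) {
        int gaps = 0;
        for (int i = 1; i < positions.length; i++) {
            gaps += positions[i] - positions[i - 1] - 1;
        }
        return gaps;
    }

    /** Closed form: the sum telescopes to (last - first) minus the (n - 1) adjacent steps. */
    static int closedForm(int[] positions) {
        if (positions.length < 2) {
            return 0;
        }
        int n = positions.length;
        // Math.max only guards against malformed input; it never changes the
        // result when positions are strictly increasing.
        return Math.max(0, positions[n - 1] - positions[0] - (n - 1));
    }

    public static void main(String[] args) {
        int phraseSlop = 0;          // original phrase slop, kept as its own value
        int[] positions = {0, 2, 5}; // "ab the cd the the ef" without stop words
        int gaps = closedForm(positions);
        // The span slop is assembled only at the end, so phraseSlop is still
        // available for other decisions such as inOrder.
        int spanSlop = phraseSlop + gaps;
        System.out.println("loop=" + withLoop(positions) + " closed=" + gaps + " spanSlop=" + spanSlop);
    }
}
```

Keeping the original slop in its own variable is what makes it easy to base the inOrder decision on phraseQuery.getSlop() rather than on the adjusted value.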

        Tim Allison added a comment -

        Thank you, David!

        ASF subversion and git services added a comment -

        Commit 1702695 from David Smiley in branch 'dev/trunk'
        [ https://svn.apache.org/r1702695 ]

        LUCENE-5503: Highlighter WSTE didn't always convert a PhraseQuery to a SpanQuery correctly.

        ASF subversion and git services added a comment -

        Commit 1702697 from David Smiley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1702697 ]

        LUCENE-5503: Highlighter WSTE didn't always convert a PhraseQuery to a SpanQuery correctly.

          People

          • Assignee: David Smiley
          • Reporter: Tim Allison
          • Votes: 0
          • Watchers: 5

            Dates

            • Created:
              Updated:
              Resolved:
