LUCENE-5503

Trivial fixes to WeightedSpanTermExtractor

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.7
    • Fix Version/s: 5.4
    • Component/s: modules/highlighter
    • Labels: None
    • Lucene Fields: New, Patch Available

      Description

      In some cases, the conversion of a PhraseQuery to a SpanNearQuery miscalculates the slop when the phrase contains stop words. The problem only really appears when there is more than one intervening run of stop words, e.g. "ab the cd the the ef".

      I also noticed that the inOrder determination is based on the newly calculated slop; it should probably be based on the original phraseQuery.getSlop().

      Patch and unit tests are on the way.
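
      To make the two points above concrete, here is a minimal, Lucene-free sketch of the arithmetic. It is not the code from the attached patches. It assumes the term positions are what PhraseQuery.getPositions() would report for "ab the cd the the ef" with stop words removed (0, 2, 5), and the class and helper names (PhraseSlopSketch, positionGaps, spanSlop, inOrder) are hypothetical.

```java
import java.util.Arrays;

public class PhraseSlopSketch {

    /** Hypothetical helper: counts the positions skipped between consecutive query terms. */
    static int positionGaps(int[] positions) {
        int gaps = 0;
        for (int i = 1; i < positions.length; i++) {
            // An increment of 1 means the terms are adjacent; anything beyond that is a gap.
            gaps += Math.max(0, positions[i] - positions[i - 1] - 1);
        }
        return gaps;
    }

    /** Slop to hand to the span query: the user's slop plus every skipped position. */
    static int spanSlop(int originalSlop, int[] positions) {
        return originalSlop + positionGaps(positions);
    }

    /** Ordering is decided from the original slop, not from the widened one. */
    static boolean inOrder(int originalSlop) {
        return originalSlop == 0;
    }

    public static void main(String[] args) {
        // "ab the cd the the ef" with "the" removed: ab=0, cd=2, ef=5 (assumed positions).
        int[] positions = {0, 2, 5};
        int originalSlop = 0; // an exact phrase query

        System.out.println("positions = " + Arrays.toString(positions));
        System.out.println("span slop = " + spanSlop(originalSlop, positions));
        System.out.println("inOrder   = " + inOrder(originalSlop));
    }
}
```

      With these inputs the widened slop is 3 while the original slop is still 0, so deriving inOrder from the widened value would turn an exact phrase into an unordered match.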

      1. LUCENE-5503.patch
        8 kB
        David Smiley
      2. LUCENE-5503.patch
        6 kB
        Tim Allison
      3. LUCENE-5503v2.patch
        7 kB
        Tim Allison

        Activity

        dsmiley David Smiley added a comment -

        I'll take a look at this by next week.

        tallison@mitre.org Tim Allison added a comment -

        Updated patch that works with current trunk.

        dsmiley David Smiley added a comment -

        Looks good Tim! I like the tests. I made some minor improvements to the code, somewhat making it similar to the conversion that WSTE does of MultiPhraseQuery in terms of keeping the positionGaps integer separate from slop. And no loop is needed to calculate that gap.

        p.s. when attaching patches, use the same file name for updates. JIRA keeps all of them and clearly shows the latest.
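
        As an illustration of the "no loop is needed" remark, the sketch below compares a loop that sums the per-term gaps with a closed-form expression. It is a reconstruction of the arithmetic only, not the committed code. It assumes the positions array is strictly increasing, and the names (ClosedFormGapSketch, gapsByLoop, gapsClosedForm) are hypothetical.

```java
public class ClosedFormGapSketch {

    /** Sums the skipped positions between each pair of consecutive terms. */
    static int gapsByLoop(int[] positions) {
        int gaps = 0;
        for (int i = 1; i < positions.length; i++) {
            gaps += positions[i] - positions[i - 1] - 1;
        }
        return gaps;
    }

    /**
     * Same value without a loop: the sum telescopes to
     * last - first - (n - 1) when positions are strictly increasing.
     * Math.max is only a safeguard against unexpected input.
     */
    static int gapsClosedForm(int[] positions) {
        if (positions.length < 2) {
            return 0;
        }
        int span = positions[positions.length - 1] - positions[0];
        return Math.max(0, span - (positions.length - 1));
    }

    public static void main(String[] args) {
        int[][] samples = {
            {0, 1, 2}, // no stop words removed
            {0, 2, 5}, // two runs of stop words
            {0, 3, 6}, // two runs of equal length
            {4}        // single term
        };
        for (int[] positions : samples) {
            System.out.println(gapsByLoop(positions) + " == " + gapsClosedForm(positions));
        }
    }
}
```

        Keeping this gap count in its own variable, separate from the slop, leaves the original slop available for the inOrder decision.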

        tallison@mitre.org Tim Allison added a comment -

        Thank you, David!

        jira-bot ASF subversion and git services added a comment -

        Commit 1702695 from David Smiley in branch 'dev/trunk'
        [ https://svn.apache.org/r1702695 ]

        LUCENE-5503: Highlighter WSTE didn't always convert a PhraseQuery to a SpanQuery correctly.

        jira-bot ASF subversion and git services added a comment -

        Commit 1702697 from David Smiley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1702697 ]

        LUCENE-5503: Highlighter WSTE didn't always convert a PhraseQuery to a SpanQuery correctly.


          People

          • Assignee: David Smiley (dsmiley)
          • Reporter: Tim Allison (tallison@mitre.org)
          • Votes: 0
          • Watchers: 5
