Lucene - Core / LUCENE-393

Inconsistent scoring with SpanTermQuery in BooleanQuery

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      Operating System: Windows XP
      Platform: Other

    • Bugzilla Id:
      35157

      Description

      When a SpanTermQuery is added to a BooleanQuery, incorrect results are
      returned.

      I am running Lucene 1.9 RC1 on Windows XP. I have a test case which has
      several tests. It has an index with 4 identical documents in it.

      When two TermQuerys are used in a BooleanQuery, the score looks like this:
      4 hits for search: two term queries
      ID:1 (score:0.54932046)
      ID:2 (score:0.54932046)
      ID:3 (score:0.54932046)
      ID:4 (score:0.54932046)

      Notice how it is correctly setting the score to be the same for each document.

      When two SpanQuerys are used in a BooleanQuery, the score looks like this:
      2 hits for search: two span queries
      ID:1 (score:0.3884282)
      ID:4 (score:0.1942141)

      Notice how it only returned two documents instead of four. And the two it did
      return have differing scores.

      I believe that there is an error in the scoring algorithm that is making the
      other two documents not show up.

        Activity

        yahootintin-lucene@yahoo.com Reece (YT) added a comment -

        Created an attachment (id=15247)
        Test case that demonstrates the SpanTermQuery issue

        yahootintin-lucene@yahoo.com Reece (YT) added a comment -

        The test case also fails with Lucene 1.4.3 on Windows.

        paul.elschot@xs4all.nl Paul Elschot added a comment -

        Created an attachment (id=15251)
        The same test case using a RAMDirectory

        I think a SpanTermQuery is not intended to be used directly in a BooleanQuery,
        and only SpanNearQuery, SpanNotQuery or SpanFirstQuery should be used
        as subqueries of BooleanQuery.

        Nevertheless, with the current javadocs, I think one would expect this
        to work.

        The attachment is in package org.apache.lucene.search.spans and it's adapted
        to use a RAMDirectory, unchanged for the rest.

        Regards,
        Paul Elschot.

        yahootintin-lucene@yahoo.com Reece (YT) added a comment -

        I think I'm on the right track but could use some advice from someone who
        understands the ConjunctionScorer better than I do.

        The ConjunctionScorer.doNext() method calls skipTo(lastDocId), which makes the
        SpanScorer skip to the last span. This means that the middle spans are
        skipped and so those spans are incorrectly omitted from the results.

        Does anyone know why this scorer is trying to skip to the last document? I
        presume it's necessary but am not sure why.
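
        For context on the question above: a conjunction is commonly evaluated
        by "leapfrogging", where whichever sub-scorer is behind skips forward
        to the document the other one is on. The following is a minimal sketch
        of that general technique over sorted doc ID lists, not Lucene's
        ConjunctionScorer. DocCursor and LeapfrogDemo are hypothetical names
        introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cursor over an ascending list of doc IDs (not a Lucene class). */
final class DocCursor {
    private final int[] docs;
    private int pos = -1;

    DocCursor(int... sortedDocs) { this.docs = sortedDocs; }

    /** Moves to the next doc; returns false when the list is exhausted. */
    boolean next() { return ++pos < docs.length; }

    /** Moves forward at least once, then on until doc() >= target. */
    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (docs[pos] < target);
        return true;
    }

    int doc() { return docs[pos]; }
}

final class LeapfrogDemo {
    /** Returns the doc IDs present in both cursors. */
    static List<Integer> intersect(DocCursor a, DocCursor b) {
        List<Integer> hits = new ArrayList<>();
        if (!a.next() || !b.next()) return hits;
        while (true) {
            if (a.doc() == b.doc()) {
                hits.add(a.doc());
                if (!a.next() || !b.next()) return hits;
            } else if (a.doc() < b.doc()) {
                // a is behind: jump it to the doc b is on (or beyond).
                if (!a.skipTo(b.doc())) return hits;
            } else {
                // b is behind: jump it to the doc a is on (or beyond).
                if (!b.skipTo(a.doc())) return hits;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(intersect(new DocCursor(1, 3, 5, 8, 9),
                                     new DocCursor(2, 3, 8, 10)));
    }
}
```

        The skip is what lets a conjunction avoid visiting every document of
        every clause. It is only safe if each cursor's skipTo never moves past
        a document that has not been reported yet.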

        yahootintin-lucene@yahoo.com Reece (YT) added a comment -

        (In reply to comment #4)
        > I think I'm on the right track but could use some advice from someone who
        > understands the ConjunctionScorer better than I do.
        > The ConjunctionScorer.doNext() method calls skipTo(lastDocId), which makes the
        > SpanScorer skip to the last span. This means that the middle spans are
        > skipped and so those spans are incorrectly omitted from the results.
        > Does anyone know why this scorer is trying to skip to the last document? I
        > presume it's necessary but am not sure why.

        Ignore this. I figured it out and am working on a fix.

        yahootintin-lucene@yahoo.com Reece (YT) added a comment -

        Created an attachment (id=15273)
        Proposed patch to fix problem

        Here is the patch that fixes the problem.

        yahootintin-lucene@yahoo.com Reece (YT) added a comment -

        Created an attachment (id=15274)
        Simple test case that demonstrates failure.

        yahootintin-lucene@yahoo.com Reece (YT) added a comment -

        Created an attachment (id=15275)
        Expanded test case to make sure other SpanTermQuery functionality isn't broken.

        yahootintin-lucene@yahoo.com Reece (YT) added a comment -

        This patch stops spans from being skipped. A simple test in the skipTo method
        solves the problem.

        The first test case shows the problem and proves that this patch solves it.
        The second test case contains several other tests to see if this change
        affects other features of the SpanTermQuery. The tests in it all pass
        with the patch.
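
        The patch itself is in the attachment and is not reproduced here. As a
        rough illustration of how a single test in skipTo can stop spans from
        being skipped, here is a self-contained sketch. It rests on two
        assumptions that are mine, not taken from the patch: the scorer reads
        ahead while collecting the current document's spans, and the
        underlying skipTo always advances at least once. ToySpans,
        ToySpanScorer and SkipToGuardDemo are hypothetical names, not Lucene
        classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical span enumerator: one array entry per span, holding its doc ID. */
final class ToySpans {
    private final int[] docs; // ascending doc IDs, one per span
    private int pos = -1;

    ToySpans(int... docs) { this.docs = docs; }

    boolean next() { return ++pos < docs.length; }

    /** Always moves at least one span forward, then on until doc() >= target. */
    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (docs[pos] < target);
        return true;
    }

    int doc() { return docs[pos]; }
}

/** Hypothetical scorer that reads ahead: after collecting a doc, spans sits on the next one. */
final class ToySpanScorer {
    private final ToySpans spans;
    private final boolean guarded;
    private boolean started = false;
    private boolean more = true;
    private int doc = -1;

    ToySpanScorer(ToySpans spans, boolean guarded) {
        this.spans = spans;
        this.guarded = guarded;
    }

    boolean next() {
        if (!started) { more = spans.next(); started = true; }
        return collectCurrentDoc();
    }

    boolean skipTo(int target) {
        if (!started) {
            more = spans.skipTo(target);
            started = true;
        } else if (more && (!guarded || spans.doc() < target)) {
            // Without the guard this runs even when spans already sits at or
            // past target, discarding the unread span under the cursor.
            more = spans.skipTo(target);
        }
        return collectCurrentDoc();
    }

    int doc() { return doc; }

    /** Consumes every span of the current doc, leaving spans on the next doc. */
    private boolean collectCurrentDoc() {
        if (!more) return false;
        doc = spans.doc();
        while (more && spans.doc() == doc) more = spans.next();
        return true;
    }
}

final class SkipToGuardDemo {
    /** Walks four one-span docs the way a conjunction might: next(), then skipTo(doc + 1). */
    static List<Integer> drive(boolean guarded) {
        ToySpanScorer scorer = new ToySpanScorer(new ToySpans(0, 1, 2, 3), guarded);
        List<Integer> hits = new ArrayList<>();
        boolean ok = scorer.next();
        while (ok) {
            hits.add(scorer.doc());
            ok = scorer.skipTo(scorer.doc() + 1);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + drive(false)); // prints "unguarded: [0, 2]"
        System.out.println("guarded:   " + drive(true));  // prints "guarded:   [0, 1, 2, 3]"
    }
}
```

        In this toy model the unguarded scorer reports only two of the four
        documents, while the guarded one reports all four. Only the missing
        hits are modelled, not the differing scores from the report, and the
        real SpanScorer may differ in its details.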

        jakarta@ehatchersolutions.com Erik Hatcher added a comment -

        Applied patch and tests. Thanks!


          People

          • Assignee:
            java-dev@lucene.apache.org Lucene Developers
            Reporter:
            yahootintin-lucene@yahoo.com Reece (YT)