Lucene - Core / LUCENE-3120

span query matches too many docs when two query terms are the same unless inOrder=true

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff of the user list discussion "SpanNearQuery - inOrder parameter".

      With 3 documents:

      • "a b x c d"
      • "a b b d"
      • "a b x b y d"

      Here are a few queries (the number in parentheses is the expected number of hits):

      These work as expected:

      • (1) in-order, slop=0, "b", "x", "b"
      • (1) in-order, slop=0, "b", "b"
      • (2) in-order, slop=1, "b", "b"

      These return more hits than expected:

      • (1) any-order, slop=0, "b", "x", "b"
      • (1) any-order, slop=1, "b", "x", "b"
      • (1) any-order, slop=2, "b", "x", "b"
      • (1) any-order, slop=3, "b", "x", "b"

      These also return more hits than expected:

      • (1) any-order, slop=0, "b", "b"
      • (2) any-order, slop=1, "b", "b"

      Each of the above passes when the same terms and slop are run as a phrase query (a phrase query has no in-order flag).

      This seems related to the known overlapping-spans issue ("non-overlapping Span queries") that Hoss pointed out, so we might decide to close this bug after all. Even so, I would like JIRA to at least hold the JUnit test that exposes the behavior.
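
      To make the expected counts above easy to check by hand, here is a minimal sketch in plain Java. It does not use Lucene and is not the attached patch. The class SpanNearModel and its methods are hypothetical names, and it assumes that slop is the width of the matched span minus the number of clauses, and that each clause must match its own token position. The allowReuse flag is only an illustration of how letting two identical clauses share one position would inflate the counts; it is not claimed to reproduce the numbers Lucene actually returns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical brute-force model of the expected behavior; not Lucene code.
class SpanNearModel {
    // The three documents from the description, tokenized on spaces.
    static final String[] DOCS = { "a b x c d", "a b b d", "a b x b y d" };

    /** Counts the documents in which all terms match within the given slop. */
    static int countHits(String[] docs, String[] terms, int slop,
                         boolean inOrder, boolean allowReuse) {
        int hits = 0;
        for (String doc : docs) {
            if (matches(doc.split(" "), terms, slop, inOrder, allowReuse,
                        new ArrayList<>())) {
                hits++;
            }
        }
        return hits;
    }

    // Tries every assignment of a token position to each query term in turn.
    static boolean matches(String[] tokens, String[] terms, int slop,
                           boolean inOrder, boolean allowReuse,
                           List<Integer> chosen) {
        if (chosen.size() == terms.length) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int p : chosen) {
                min = Math.min(min, p);
                max = Math.max(max, p);
            }
            // Assumed slop definition: span width minus the number of clauses.
            return (max - min + 1) - terms.length <= slop;
        }
        String term = terms[chosen.size()];
        for (int pos = 0; pos < tokens.length; pos++) {
            if (!tokens[pos].equals(term)) continue;
            // Expected semantics: two clauses never share one position.
            if (!allowReuse && chosen.contains(pos)) continue;
            // In-order: each clause must start after the previous one.
            if (inOrder && !chosen.isEmpty()
                    && pos <= chosen.get(chosen.size() - 1)) continue;
            chosen.add(pos);
            boolean ok = matches(tokens, terms, slop, inOrder, allowReuse, chosen);
            chosen.remove(chosen.size() - 1); // backtrack
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] bxb = { "b", "x", "b" };
        String[] bb = { "b", "b" };
        System.out.println("any-order, slop=0, b x b: "
                + countHits(DOCS, bxb, 0, false, false));
        System.out.println("any-order, slop=1, b b:   "
                + countHits(DOCS, bb, 1, false, false));
        System.out.println("any-order, slop=0, b b, positions shared: "
                + countHits(DOCS, bb, 0, false, true));
    }
}
```

      With allowReuse set to false the model returns the expected number for every query listed above. Setting it to true lets "a b x c d" match the any-order queries even though it contains only one "b", which is the kind of over-matching this issue describes.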

      1. LUCENE-3120.patch
        4 kB
        Doron Cohen
      2. LUCENE-3120.patch
        4 kB
        Doron Cohen

          People

          • Assignee:
            Unassigned
            Reporter:
            Doron Cohen
          • Votes:
            0
            Watchers:
            1
