Lucene - Core
  1. Lucene - Core
  2. LUCENE-3229

SpanNearQuery: ordered spans should not overlap

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1
    • Fix Version/s: 4.9, 5.0
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      Windows XP, Java 1.6

    • Lucene Fields:
      New, Patch Available

      Description

      While using Span queries I think I've found a little bug.

      With a document like this (from the TestNearSpansOrdered unit test) :

      "w1 w2 w3 w4 w5"

      If I try to search for this span query :

      spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true)

      the above document is returned and I think it should not because 'w4' is not after 'w5'.
      The 2 spans are not ordered, because there is an overlap.

      I will add a test patch in the TestNearSpansOrdered unit test.
      I will add a patch to solve this issue too.
      Basicaly it modifies the two docSpansOrdered functions to make sure that the spans does not overlap.

      1. LUCENE-3229.patch
        7 kB
        Paul Elschot
      2. LUCENE-3229.patch
        7 kB
        Greg Dearing
      3. LUCENE-3229.patch
        6 kB
        Paul Elschot
      4. LUCENE-3229.patch
        4 kB
        ludovic Boutros
      5. SpanOverlap2.diff
        2 kB
        ludovic Boutros
      6. SpanOverlap.diff
        1 kB
        ludovic Boutros
      7. SpanOverlapTestUnit.diff
        1 kB
        ludovic Boutros

        Activity

        Hide
        Uwe Schindler added a comment -

        Move issue to Lucene 4.9.

        Show
        Uwe Schindler added a comment - Move issue to Lucene 4.9.
        Hide
        Paul Elschot added a comment -

        The patch of 19 Oct 2013 adds a few more test cases to verify that there is a match in the overlapped non ordered case.

        Show
        Paul Elschot added a comment - The patch of 19 Oct 2013 adds a few more test cases to verify that there is a match in the overlapped non ordered case.
        Hide
        Paul Elschot added a comment -

        I'm a little uncertain about the workflow.

        That's not a bug, that's a feature

        Show
        Paul Elschot added a comment - I'm a little uncertain about the workflow. That's not a bug, that's a feature
        Hide
        Greg Dearing added a comment -

        I don't think Paul's questions weren't directed at me, but to add my thoughts:

        I also don't see a need for backward compatibility. This feels like a bug fix, though I suppose that could be argued. If there ends up being a strong desire for compatibility, maybe I could add a new constructor:

        public SpanNearQuery(SpanQuery[] clauses, int slop, boolean inOrder, boolean collectPayloads, boolean allowOverlap)
        

        This would enable the original behavior and also add the unordered/not-overlapping case. Neither feels especially useful.

        Also, I'm a little uncertain about the workflow. Your test change seems reasonable to me; would you like me to generate a new patch incorporating it?

        Show
        Greg Dearing added a comment - I don't think Paul's questions weren't directed at me, but to add my thoughts: I also don't see a need for backward compatibility. This feels like a bug fix, though I suppose that could be argued. If there ends up being a strong desire for compatibility, maybe I could add a new constructor: public SpanNearQuery(SpanQuery[] clauses, int slop, boolean inOrder, boolean collectPayloads, boolean allowOverlap) This would enable the original behavior and also add the unordered/not-overlapping case. Neither feels especially useful. Also, I'm a little uncertain about the workflow. Your test change seems reasonable to me; would you like me to generate a new patch incorporating it?
        Hide
        Paul Elschot added a comment -

        The patch of 16 Oct 2013 works for me.

        Is there any reason for some of methods in TestNearSpansOrdered to be protected?

        I made the following change to try and make it a bit more involved.
        The subquery is now not ordered, and its arguments have been reversed:

          protected SpanNearQuery makeOverlappedQuery() {
            return new SpanNearQuery(
              new SpanQuery[] {
                new SpanNearQuery(new SpanQuery[] {
                  new SpanTermQuery(new Term(FIELD, "w5")),
                    new SpanTermQuery(new Term(FIELD, "w3")) },
                    1,
                    false
                  ),
                  new SpanTermQuery(new Term(FIELD, "w4")) },
                  0,
                  true);
          }
        

        The only thing I can think of that might be in the way is that this changes existing behaviour, and it does not provide any way to have the previous behaviour.
        Using unordered spans does allow overlaps, so that does not have to be real problem, and it isn't for me.

        Any objections because of the lack of backward compatibility?

        Show
        Paul Elschot added a comment - The patch of 16 Oct 2013 works for me. Is there any reason for some of methods in TestNearSpansOrdered to be protected? I made the following change to try and make it a bit more involved. The subquery is now not ordered, and its arguments have been reversed: protected SpanNearQuery makeOverlappedQuery() { return new SpanNearQuery( new SpanQuery[] { new SpanNearQuery( new SpanQuery[] { new SpanTermQuery( new Term(FIELD, "w5" )), new SpanTermQuery( new Term(FIELD, "w3" )) }, 1, false ), new SpanTermQuery( new Term(FIELD, "w4" )) }, 0, true ); } The only thing I can think of that might be in the way is that this changes existing behaviour, and it does not provide any way to have the previous behaviour. Using unordered spans does allow overlaps, so that does not have to be real problem, and it isn't for me. Any objections because of the lack of backward compatibility?
        Hide
        Greg Dearing added a comment -

        Have been successfully using the provided fix since 3.6. (Thanks ludovic and Paul).

        Since the original patches haven't applied cleanly for a while, I thought I'd upload the version we've been using. It preserves the logic from Paul Elschot's version, except re-based against version 4.5 and with some minor javadoc fixes.

        If there's anything else that would be helpful, please comment and I'll see what I can do.

        Show
        Greg Dearing added a comment - Have been successfully using the provided fix since 3.6. (Thanks ludovic and Paul). Since the original patches haven't applied cleanly for a while, I thought I'd upload the version we've been using. It preserves the logic from Paul Elschot's version, except re-based against version 4.5 and with some minor javadoc fixes. If there's anything else that would be helpful, please comment and I'll see what I can do.
        Hide
        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Show
        Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0
        Hide
        Hoss Man added a comment -

        Bulk changing fixVersion 3.6 to 4.0 for any open issues that are unassigned and have not been updated since March 19.

        Email spam suppressed for this bulk edit; search for hoss20120323nofix36 to identify all issues edited

        Show
        Hoss Man added a comment - Bulk changing fixVersion 3.6 to 4.0 for any open issues that are unassigned and have not been updated since March 19. Email spam suppressed for this bulk edit; search for hoss20120323nofix36 to identify all issues edited
        Hide
        Paul Elschot added a comment -

        Try and set 3.4 as fix version for this, 3.3 is already on the way out.
        It might also help to add some text for a changes.txt entry.

        Show
        Paul Elschot added a comment - Try and set 3.4 as fix version for this, 3.3 is already on the way out. It might also help to add some text for a changes.txt entry.
        Hide
        ludovic Boutros added a comment -

        Thanks Paul,

        do you have any idea when this patch will be applied to the branch 3x ?

        Show
        ludovic Boutros added a comment - Thanks Paul, do you have any idea when this patch will be applied to the branch 3x ?
        Hide
        Paul Elschot added a comment -

        Thanks for bringing this up, this has confused more people in the past, and that could well be over now.

        Show
        Paul Elschot added a comment - Thanks for bringing this up, this has confused more people in the past, and that could well be over now.
        Hide
        Paul Elschot added a comment -

        Basically the same functionality as previous patch by Ludovic Boutros.
        Simplified the check for non overlapping spans, this might speed it up somewhat.
        Added javadoc explanations on ordered without overlap and unordered with overlap.
        Minor spelling and indentation changes.

        NearSpansOrdered might be further simplified as not all locals are actually used now because of the simplified check, but for now I prefer to leave that to the JIT to optimize away.

        Show
        Paul Elschot added a comment - Basically the same functionality as previous patch by Ludovic Boutros. Simplified the check for non overlapping spans, this might speed it up somewhat. Added javadoc explanations on ordered without overlap and unordered with overlap. Minor spelling and indentation changes. NearSpansOrdered might be further simplified as not all locals are actually used now because of the simplified check, but for now I prefer to leave that to the JIT to optimize away.
        Hide
        ludovic Boutros added a comment -

        Here is the patch as described in the wiki.
        Is it ok ?

        Show
        ludovic Boutros added a comment - Here is the patch as described in the wiki. Is it ok ?
        Hide
        ludovic Boutros added a comment -

        :To reduce surprises like this one when nested spans are used, the ordered case might be changed to require no overlap at all.
        :To do that one could compare the end of one spans with the beginning of the next one.
        :AFAIK none of the existing test cases uses a nested span query, so more some test cases for that would be good to have.

        The patch does exactly that.

        :The docSpansOrdered method in NearSpansUnordered from the SpanOverLap2.diff patch
        :is the same as the existing docSpansOrdered method in NearSpansOrdered.
        :That is probably not intended.

        It is the same as the actual method because I don't want to modify the current behavior of the NearSpansUnordered class.
        Overlap should be allowed for unordered near span queries. And if I do not do that, unit test is KO for unordered near span queries.

        :Could you provide patches as decribed here: http://wiki.apache.org/lucene-java/HowToContribute ?

        Sorry for that, sure, I will provide the patch shortly.

        Show
        ludovic Boutros added a comment - :To reduce surprises like this one when nested spans are used, the ordered case might be changed to require no overlap at all. :To do that one could compare the end of one spans with the beginning of the next one. :AFAIK none of the existing test cases uses a nested span query, so more some test cases for that would be good to have. The patch does exactly that. :The docSpansOrdered method in NearSpansUnordered from the SpanOverLap2.diff patch :is the same as the existing docSpansOrdered method in NearSpansOrdered. :That is probably not intended. It is the same as the actual method because I don't want to modify the current behavior of the NearSpansUnordered class. Overlap should be allowed for unordered near span queries. And if I do not do that, unit test is KO for unordered near span queries. :Could you provide patches as decribed here: http://wiki.apache.org/lucene-java/HowToContribute ? Sorry for that, sure, I will provide the patch shortly.
        Hide
        Paul Elschot added a comment -

        To reduce surprises like this one when nested spans are used, the ordered case might be changed to require no overlap at all.
        To do that one could compare the end of one spans with the beginning of the next one.

        AFAIK none of the existing test cases uses a nested span query, so more some test cases for that would be good to have.

        The docSpansOrdered method in NearSpansUnordered from the SpanOverLap2.diff patch
        is the same as the existing docSpansOrdered method in NearSpansOrdered.
        That is probably not intended.

        Could you provide patches as decribed here: http://wiki.apache.org/lucene-java/HowToContribute ?

        Show
        Paul Elschot added a comment - To reduce surprises like this one when nested spans are used, the ordered case might be changed to require no overlap at all. To do that one could compare the end of one spans with the beginning of the next one. AFAIK none of the existing test cases uses a nested span query, so more some test cases for that would be good to have. The docSpansOrdered method in NearSpansUnordered from the SpanOverLap2.diff patch is the same as the existing docSpansOrdered method in NearSpansOrdered. That is probably not intended. Could you provide patches as decribed here: http://wiki.apache.org/lucene-java/HowToContribute ?
        Hide
        ludovic Boutros added a comment -

        add a patch for the "SpanNearUnOrdered" class. Everything should be ok now.

        Show
        ludovic Boutros added a comment - add a patch for the "SpanNearUnOrdered" class. Everything should be ok now.
        Hide
        ludovic Boutros added a comment - - edited

        testSpanNearUnOrdered unit test does not work anymore.

        The unordered SpanNear class uses the ordering function of the ordered SpanNear class. Perhaps, it should use its own ordering function which allows the span overlaps.
        I will check.

        Show
        ludovic Boutros added a comment - - edited testSpanNearUnOrdered unit test does not work anymore. The unordered SpanNear class uses the ordering function of the ordered SpanNear class. Perhaps, it should use its own ordering function which allows the span overlaps. I will check.
        Hide
        ludovic Boutros added a comment -

        add a Patch.

        Show
        ludovic Boutros added a comment - add a Patch.
        Hide
        ludovic Boutros added a comment -

        Add the Test unit.

        Show
        ludovic Boutros added a comment - Add the Test unit.

          People

          • Assignee:
            Unassigned
            Reporter:
            ludovic Boutros
          • Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:

              Development