Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      SpanNearQuery is not quite an exact Spans replacement for PhraseQuery at the moment, because while you can ask for an overall slop in an ordered match, you can't specify exactly where the gaps should appear.

      1. LUCENE-6580.patch
        13 kB
        Alan Woodward
      2. LUCENE-6580.patch
        14 kB
        Alan Woodward
      3. LUCENE-6580.patch
        13 kB
        Alan Woodward
      4. LUCENE-6580.patch
        10 kB
        Alan Woodward

        Activity

        Alan Woodward added a comment -

        Patch adding a SpanGapQuery that can be added as part of SpanNearQuery's constructor.

        This also adds a Spans.skipToPosition(int) method which GapSpans overrides to make seeking forward more efficient.

        One thing I don't like here is that SpanGapQuery is a top-level SpanQuery, when it only really makes sense to be used within SpanNearQuery. An alternative could be to add a builder to SpanNearQuery with an .addGap(int) method, and make SpanGapQuery a private class.
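The builder alternative described above can be sketched in plain Java. This is a toy model, not the patch itself: `NearPatternSketch`, `Element`, `ordered()` and `addClause(String)` are hypothetical names chosen for illustration, and only the `.addGap(int)` idea comes from the comment. The point it shows is that a gap can only be created through the builder, so it never exists as a standalone query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of an ordered near pattern; hypothetical names, not Lucene classes. */
final class NearPatternSketch {

    /** One pattern element: a term clause, or a gap when term is null. */
    static final class Element {
        final String term; // null for a gap
        final int width;   // positions covered: 1 for a term, N for a gap

        // Private: only code inside NearPatternSketch (i.e. the Builder) can create elements.
        private Element(String term, int width) {
            this.term = term;
            this.width = width;
        }

        boolean isGap() {
            return term == null;
        }
    }

    private final List<Element> elements;

    private NearPatternSketch(List<Element> elements) {
        this.elements = elements;
    }

    List<Element> elements() {
        return elements;
    }

    /** Number of positions an exact (zero-slop) match would cover. */
    int width() {
        int total = 0;
        for (Element e : elements) {
            total += e.width;
        }
        return total;
    }

    static Builder ordered() {
        return new Builder();
    }

    /** The only way to add a gap, mirroring the proposed .addGap(int) method. */
    static final class Builder {
        private final List<Element> elements = new ArrayList<>();

        Builder addClause(String term) {
            if (term == null) {
                throw new IllegalArgumentException("term must not be null");
            }
            elements.add(new Element(term, 1));
            return this;
        }

        Builder addGap(int width) {
            if (width <= 0) {
                throw new IllegalArgumentException("gap width must be positive");
            }
            elements.add(new Element(null, width));
            return this;
        }

        NearPatternSketch build() {
            return new NearPatternSketch(
                Collections.unmodifiableList(new ArrayList<>(elements)));
        }
    }
}
```

Under these assumptions, `NearPatternSketch.ordered().addClause("quick").addGap(2).addClause("fox").build()` describes the pattern "quick ? ? fox", which covers four positions.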

        Robert Muir added a comment -

        Can we please not name any more methods skipToXXXX that are linear time?

        That is such a trap.

        Alan Woodward added a comment -

        Fair enough. How about advanceToPosition?

        Michael McCandless added a comment -

        scanToPosition?

        Alan Woodward added a comment -

        New patch:

        • skipToPosition is now scanToPosition()
        • SpanNearQuery has a Builder, and SpanGapQuery is a private subclass
        Adrien Grand added a comment -

        When I read the issue description, I thought this would make SpanNearQuery more similar to PhraseQuery but it seems to work differently?

        For instance if you search for the "quick ? ? fox" phrase query with a slop of 2, a document containing "quick fox" will match thanks to the slop. However in this Spans implementation, it looks like the gap is required regardless of the slop since NearSpansOrdered ensures that the start position of a spans is greater than the position of the previous span?

        Alan Woodward added a comment -

        This is because slops mean different things between the two queries. In PhraseQuery, a slop of greater than 0 means we end up with a SloppyPhraseScorer that relaxes the ordering constraint (so you can have, in effect, the 'gap' appearing after the end of the match). An ordered SpanNearQuery with a slop, however, still requires its clauses to be in order, but allows them to be spaced out.

        So this issue makes an ordered SpanNearQuery more like a PhraseQuery only in the case that the PQ has defined gaps, but zero slop.
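The ordered-with-gaps behaviour described here can be illustrated with a small self-contained matcher. It is a simplified sketch under stated assumptions: documents are plain token arrays, every clause is a single term, and slop is modelled as the total number of extra positions allowed beyond the defined gaps. `OrderedGapMatchSketch` and its methods are hypothetical names; this is not Lucene's NearSpansOrdered, and it does not attempt to model SloppyPhraseScorer.

```java
/** Toy ordered matcher with fixed gaps and slop; hypothetical, not Lucene code. */
final class OrderedGapMatchSketch {

    /**
     * Returns true if terms occur in order in doc, with at least gaps[i] positions
     * between terms[i] and terms[i + 1], and with the extra positions beyond those
     * gaps summing to at most slop. gaps must have length terms.length - 1.
     */
    static boolean matches(String[] doc, String[] terms, int[] gaps, int slop) {
        for (int start = 0; start < doc.length; start++) {
            if (doc[start].equals(terms[0])
                    && matchFrom(doc, terms, gaps, 1, start + 1, slop)) {
                return true;
            }
        }
        return false;
    }

    // next: index of the next term to place; from: first position after the previous term.
    private static boolean matchFrom(String[] doc, String[] terms, int[] gaps,
                                     int next, int from, int slopLeft) {
        if (next == terms.length) {
            return true; // every clause has been placed
        }
        // The defined gap is mandatory: the next term cannot start before this position.
        int earliest = from + gaps[next - 1];
        // Slop only lets the term drift further right, never left into the gap.
        for (int pos = earliest; pos < doc.length && pos - earliest <= slopLeft; pos++) {
            if (doc[pos].equals(terms[next])
                    && matchFrom(doc, terms, gaps, next + 1, pos + 1,
                                 slopLeft - (pos - earliest))) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, the pattern "quick ? ? fox" (terms `{"quick", "fox"}`, gaps `{2}`) does not match the document "quick fox" at any slop, because the gap is always required. It matches "quick brown lazy fox" with zero slop, and "quick a b c fox" only once the slop is at least 1.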

        Alan Woodward added a comment -

        Patch updated to trunk. I'd like to get this in for 5.3 if nobody objects.

        Adrien Grand added a comment -

        I think we should try not to add a new method to the Spans class. Could we instead keep it contained to SpanNear? For instance maybe we could have a static utility method that either calls next until the desired position is reached, or directly jumps if the Spans are an instance of GapSpans?
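A rough sketch of the static utility suggested here, in self-contained Java. The `PositionIterator` interface is a minimal stand-in invented for this example, not Lucene's Spans class, and `GapIterator`, `ArrayIterator`, `jumpTo` and `advancePosition` are likewise hypothetical names. The idea shown is only the dispatch: jump directly when the iterator is a gap, otherwise step forward linearly.

```java
/** Toy version of a "scan or jump" helper; hypothetical, not Lucene code. */
final class ScanToPositionSketch {

    static final int NO_MORE_POSITIONS = Integer.MAX_VALUE;

    /** Minimal stand-in for a position iterator. */
    interface PositionIterator {
        /** Moves to the next start position, or NO_MORE_POSITIONS when exhausted. */
        int nextStartPosition();

        /** Current start position; -1 before the first call to nextStartPosition(). */
        int startPosition();
    }

    /** A gap matches at every position, so it can jump in constant time. */
    static final class GapIterator implements PositionIterator {
        private int position = -1;

        public int nextStartPosition() {
            return ++position;
        }

        public int startPosition() {
            return position;
        }

        int jumpTo(int target) {
            position = Math.max(position, target); // never move backwards
            return position;
        }
    }

    /** Iterates over a fixed array of positions sorted in ascending order. */
    static final class ArrayIterator implements PositionIterator {
        private final int[] positions;
        private int index = -1;

        ArrayIterator(int[] positions) {
            this.positions = positions;
        }

        public int nextStartPosition() {
            index++;
            return startPosition();
        }

        public int startPosition() {
            if (index < 0) {
                return -1;
            }
            return index < positions.length ? positions[index] : NO_MORE_POSITIONS;
        }
    }

    /** Advances it to the first position at or after target and returns that position. */
    static int advancePosition(PositionIterator it, int target) {
        if (it instanceof GapIterator) {
            return ((GapIterator) it).jumpTo(target); // direct jump
        }
        while (it.startPosition() < target) {
            it.nextStartPosition(); // linear scan, one position at a time
        }
        return it.startPosition();
    }
}
```

Keeping the helper static means the iterator interface itself needs no new method, which is the containment this comment asks for; the cost is an `instanceof` check on each call.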

        Alan Woodward added a comment -

        Patch, taking into account Adrien's suggestion.

        Adrien Grand added a comment -

        +1

        ASF subversion and git services added a comment -

        Commit 1694082 from Alan Woodward in branch 'dev/trunk'
        [ https://svn.apache.org/r1694082 ]

        LUCENE-6580: Allow defined-width gaps in SpanNearQuery

        ASF subversion and git services added a comment -

        Commit 1694086 from Alan Woodward in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1694086 ]

        LUCENE-6580: Allow defined-width gaps in SpanNearQuery

        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release


          People

          • Assignee:
            Alan Woodward
            Reporter:
            Alan Woodward
          • Votes:
            0
            Watchers:
            7
