Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This query is actually a lot like SpanPositionCheckQuery, except it checks that each inclusion Spans does not come near the exclusion side.

      Two-phase iteration should just work the inclusion side, deferring positions (the overlap checking against exclusion) until necessary.

      1. LUCENE-6394.patch
        21 kB
        Robert Muir
      2. LUCENE-6394.patch
        21 kB
        Robert Muir
      3. LUCENE-6394.patch
        20 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        Here is a patch. I think it simplifies SpanNot a great deal.

        Basically we fold more logic from SpanPositionCheck into FilterSpans, and SpanNot uses that too.

        Most methods in FilterSpans (nextDoc, nextStartPosition, two-phase support, etc) become final methods, it only has one abstract method:

        /**
         * A {@link Spans} implementation wrapping another spans instance,
         * allowing to filter spans matches easily by implementing {@link #accept}
         */
        public abstract class FilterSpans extends Spans {
        ...
          /** 
           * Returns YES if the candidate should be an accepted match,
           * NO if it should not, and NO_MORE_IN_CURRENT_DOC if iteration
           * should move on to the next document.
           */
          protected abstract AcceptStatus accept(Spans candidate) throws IOException;
        
        Show
        Robert Muir added a comment - Here is a patch. I think it simplifies SpanNot a great deal. Basically we fold more logic from SpanPositionCheck into FilterSpans, and SpanNot uses that too. Most methods in FilterSpans (nextDoc, nextStartPosition, two-phase support, etc) become final methods, it only has one abstract method: /** * A {@link Spans} implementation wrapping another spans instance, * allowing to filter spans matches easily by implementing {@link #accept} */ public abstract class FilterSpans extends Spans { ... /** * Returns YES if the candidate should be an accepted match, * NO if it should not, and NO_MORE_IN_CURRENT_DOC if iteration * should move on to the next document. */ protected abstract AcceptStatus accept(Spans candidate) throws IOException;
        Hide
        Robert Muir added a comment -

        Updated patch. its a little better, when the 'exclude' part has an approximation, we will use it.

        I need to refactor the logic again though, currently it uses a hack which is not great for its "advanceExact" logic:

        if (excludeApproximation.advance(candidate.docID()) != candidate.docID()) {
          return AcceptStatus.YES;
        } else if (!excludeTwoPhase.matches()) {
          excludeApproximation.nextDoc(); // move off this doc, we don't match.
          return AcceptStatus.YES;
        }
        

        We should just track an int or something instead, moving off the document with nextDoc() is not as efficient (we might never need it).

        Show
        Robert Muir added a comment - Updated patch. its a little better, when the 'exclude' part has an approximation, we will use it. I need to refactor the logic again though, currently it uses a hack which is not great for its "advanceExact" logic: if (excludeApproximation.advance(candidate.docID()) != candidate.docID()) { return AcceptStatus.YES; } else if (!excludeTwoPhase.matches()) { excludeApproximation.nextDoc(); // move off this doc, we don't match. return AcceptStatus.YES; } We should just track an int or something instead, moving off the document with nextDoc() is not as efficient (we might never need it).
        Hide
        Robert Muir added a comment -

        Updated patch removing the hack, and refactoring the logic. I thought it would be trickier...

        Show
        Robert Muir added a comment - Updated patch removing the hack, and refactoring the logic. I thought it would be trickier...
        Hide
        Adrien Grand added a comment -

        I like the patch, two minor things that got me wondering while reading the patch:

        • Should SpansCell still extend FilterSpans and return always YES? This would avoid reimplementing Spans from scratch?
        • It took me some time to compile how FilterSpans.twoPhaseCurrentDocMatches works, with its infinite loop, conditional fallthrough, etc. maybe it could be written in a more straightforward way?
        Show
        Adrien Grand added a comment - I like the patch, two minor things that got me wondering while reading the patch: Should SpansCell still extend FilterSpans and return always YES? This would avoid reimplementing Spans from scratch? It took me some time to compile how FilterSpans.twoPhaseCurrentDocMatches works, with its infinite loop, conditional fallthrough, etc. maybe it could be written in a more straightforward way?
        Hide
        Robert Muir added a comment -

        Should SpansCell still extend FilterSpans and return always YES? This would avoid reimplementing Spans from scratch?

        Definitely not. Its not really filtering the spans at all. it is just a wrapper. if we want a class for wrapping lets do that separately.

        It took me some time to compile how FilterSpans.twoPhaseCurrentDocMatches works, with its infinite loop, conditional fallthrough, etc. maybe it could be written in a more straightforward way?

        This code is moved verbatim from SpanPositionCheck.PositionCheckSpan here, so it can be reused instead of duplicated. Can we refactor it in a separate issue?

        My approach here is just removing complex duplicate code that all does the same thing. I realize its exciting that spans are now "unblocked" after paul's big rework , but we need to tackle things incrementally.

        Show
        Robert Muir added a comment - Should SpansCell still extend FilterSpans and return always YES? This would avoid reimplementing Spans from scratch? Definitely not. Its not really filtering the spans at all. it is just a wrapper. if we want a class for wrapping lets do that separately. It took me some time to compile how FilterSpans.twoPhaseCurrentDocMatches works, with its infinite loop, conditional fallthrough, etc. maybe it could be written in a more straightforward way? This code is moved verbatim from SpanPositionCheck.PositionCheckSpan here, so it can be reused instead of duplicated. Can we refactor it in a separate issue? My approach here is just removing complex duplicate code that all does the same thing. I realize its exciting that spans are now "unblocked" after paul's big rework , but we need to tackle things incrementally.
        Hide
        Adrien Grand added a comment -

        Can we refactor it in a separate issue?

        I have no objections.

        Show
        Adrien Grand added a comment - Can we refactor it in a separate issue? I have no objections.
        Hide
        Michael McCandless added a comment -

        +1, I like how SpanNotQuery just shares FilterSpans now.

        Show
        Michael McCandless added a comment - +1, I like how SpanNotQuery just shares FilterSpans now.
        Hide
        Paul Elschot added a comment - - edited

        This nicely allows to share code between SpanPositionCheck and SpanNot, leaving the SpanNot here simpler than the one I put at LUCENE-6373, so I'd rather have this one.

        Show
        Paul Elschot added a comment - - edited This nicely allows to share code between SpanPositionCheck and SpanNot, leaving the SpanNot here simpler than the one I put at LUCENE-6373 , so I'd rather have this one.
        Hide
        Robert Muir added a comment -

        Thanks for looking Paul.

        Actually those imports are used. Its because AcceptStatus is on FilterSpans now, so those subclasses needed to import it.

        Show
        Robert Muir added a comment - Thanks for looking Paul. Actually those imports are used. Its because AcceptStatus is on FilterSpans now, so those subclasses needed to import it.
        Hide
        Paul Elschot added a comment -

        Sorry, my correction crossed your answer.

        Show
        Paul Elschot added a comment - Sorry, my correction crossed your answer.
        Hide
        Robert Muir added a comment -

        I found an unrelated bug in SpanNear to this patch when testing, LUCENE-6418

        Show
        Robert Muir added a comment - I found an unrelated bug in SpanNear to this patch when testing, LUCENE-6418
        Hide
        ASF subversion and git services added a comment -

        Commit 1673016 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1673016 ]

        LUCENE-6394: Add two-phase support to SpanNotQuery

        Show
        ASF subversion and git services added a comment - Commit 1673016 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1673016 ] LUCENE-6394 : Add two-phase support to SpanNotQuery
        Hide
        ASF subversion and git services added a comment -

        Commit 1673021 from Robert Muir in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1673021 ]

        LUCENE-6394: Add two-phase support to SpanNotQuery

        Show
        ASF subversion and git services added a comment - Commit 1673021 from Robert Muir in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1673021 ] LUCENE-6394 : Add two-phase support to SpanNotQuery
        Hide
        Anshum Gupta added a comment -

        Bulk close for 5.2.0.

        Show
        Anshum Gupta added a comment - Bulk close for 5.2.0.

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development