Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      SpanScorer and Spans currently share the burden of scoring span queries, with SpanScorer delegating to Spans for most operations. Spans is essentially a Scorer, just with the ability to iterate through positions as well, and no SimScorer to use for scoring. This seems overly complicated. We should merge the two classes into one.

      1. LUCENE-6845_norenames.patch
        55 kB
        Alan Woodward
      2. LUCENE-6845_norenames.patch
        116 kB
        Alan Woodward
      3. LUCENE-6845_norenames.patch.txt
        114 kB
        Alan Woodward
      4. LUCENE-6845.patch
        178 kB
        Alan Woodward

        Activity

        Hide
        Alan Woodward added a comment -

        Patch.

        Spans is merged entirely into SpanScorer, various classes get renamed, and SpanWeight.getSpans() becomes SpanWeight.getSpanScorer(). The only slightly messy part concerns how to pass SimScorers around, as SpanScorer now takes one in its constructor.

        Span tests pass, but I still need to run the whole test suite, so health warnings apply.

        Show
        Alan Woodward added a comment - Patch. Spans is merged entirely into SpanScorer, various classes get renamed, and SpanWeight.getSpans() becomes SpanWeight.getSpanScorer(). The only slightly messy part concerns how to pass SimScorers around, as SpanScorer now takes one in its constructor. Span tests pass, but I still need to run the whole test suite, so health warnings apply.
        Hide
        David Smiley added a comment -

        Sounds great. But as I started to look at the patch; it became clear that it's hard to review with the renames to have the Scorer Suffix. Perhaps, temporarily, you could not do that to make it more reviewable?

        I'm anticipating I'm going to see stuff that suggests this should wait for 6.0. What do you think?

        Show
        David Smiley added a comment - Sounds great. But as I started to look at the patch; it became clear that it's hard to review with the renames to have the Scorer Suffix. Perhaps, temporarily, you could not do that to make it more reviewable? I'm anticipating I'm going to see stuff that suggests this should wait for 6.0. What do you think?
        Hide
        Alan Woodward added a comment -

        Patch without the renames, which should make changes easier to see.

        I don't think there's anything in here that needs to wait for 6.0? People creating their own Spans implementations are highly-expert users, and shouldn't have a problem with changing things to extend SpanScorer instead.

        Show
        Alan Woodward added a comment - Patch without the renames, which should make changes easier to see. I don't think there's anything in here that needs to wait for 6.0? People creating their own Spans implementations are highly-expert users, and shouldn't have a problem with changing things to extend SpanScorer instead.
        Hide
        David Smiley added a comment -

        I looked it over; I like the conceptual simplification this bring. One thing confused me though: What is the purpose of SpanNotQuery’s getSpanScorer wrapping the includeSpans with a FilterSpans that has an accept that always returns YES? I see this in SpanOrQuery too and perhaps others.

        The rename of Spans -> SpanScorer affects anyone consuming Spans, not just implementers of custom Spans. This is seen in your patch in WeightedSpanTermExtractor, for example. I'm on the fence if this is acceptable or not in a point release; I wonder what others think. Even if we decide to keep backward compatibility, I think the substance of your patch could still land in 5x by having Spans not be renamed to SpanScorer, and keep the existing method names as-is.

        Show
        David Smiley added a comment - I looked it over; I like the conceptual simplification this bring. One thing confused me though: What is the purpose of SpanNotQuery’s getSpanScorer wrapping the includeSpans with a FilterSpans that has an accept that always returns YES? I see this in SpanOrQuery too and perhaps others. The rename of Spans -> SpanScorer affects anyone consuming Spans, not just implementers of custom Spans. This is seen in your patch in WeightedSpanTermExtractor, for example. I'm on the fence if this is acceptable or not in a point release; I wonder what others think. Even if we decide to keep backward compatibility, I think the substance of your patch could still land in 5x by having Spans not be renamed to SpanScorer, and keep the existing method names as-is.
        Hide
        Alan Woodward added a comment -

        What is the purpose of SpanNotQuery’s getSpanScorer wrapping the includeSpans with a FilterSpans that has an accept that always returns YES?

        This is a slightly hacky workaround for the fact that only the top-level Spans implementation actually does any scoring. This means that degenerate cases like single-clause SpanOrQueries or SpanNotQueries with null exclusion clauses can't just return the relevant child Spans, because it won't have a SimScorer set on it. Instead, we need to wrap it with a Spans containing a SimScorer. This is an abuse of FilterSpans though - I'll create a specialised ScoringWrapperSpans instead to use here.

        If we're really worried about backwards compatibility, we could keep Spans as an empty subclass of SpanScorer. But I really don't think it's necessary - we already changed the Spans interface in 5.3 by adding Spans.collect(), so users upgrading from 5.2 from 5.3 would have had to recompile.

        Show
        Alan Woodward added a comment - What is the purpose of SpanNotQuery’s getSpanScorer wrapping the includeSpans with a FilterSpans that has an accept that always returns YES? This is a slightly hacky workaround for the fact that only the top-level Spans implementation actually does any scoring. This means that degenerate cases like single-clause SpanOrQueries or SpanNotQueries with null exclusion clauses can't just return the relevant child Spans, because it won't have a SimScorer set on it. Instead, we need to wrap it with a Spans containing a SimScorer. This is an abuse of FilterSpans though - I'll create a specialised ScoringWrapperSpans instead to use here. If we're really worried about backwards compatibility, we could keep Spans as an empty subclass of SpanScorer. But I really don't think it's necessary - we already changed the Spans interface in 5.3 by adding Spans.collect(), so users upgrading from 5.2 from 5.3 would have had to recompile.
        Hide
        Alan Woodward added a comment -

        Updated patch, with ScoringWrapperSpans. I also renamed SpanWeight.getSpanScorer() to SpanWeight.spanScorer(), by analogy with scorer() and bulkScorer().

        All tests and precommit pass.

        Show
        Alan Woodward added a comment - Updated patch, with ScoringWrapperSpans. I also renamed SpanWeight.getSpanScorer() to SpanWeight.spanScorer(), by analogy with scorer() and bulkScorer(). All tests and precommit pass.
        Hide
        David Smiley added a comment -

        we already changed the Spans interface in 5.3 by adding Spans.collect(), so users upgrading from 5.2 from 5.3 would have had to recompile

        I don't see why recompiling would be needed if you merely consume a Spans (don't have a custom implementation) and don't call the newly added method.

        Any way, I'm +1 to all this.

        Show
        David Smiley added a comment - we already changed the Spans interface in 5.3 by adding Spans.collect(), so users upgrading from 5.2 from 5.3 would have had to recompile I don't see why recompiling would be needed if you merely consume a Spans (don't have a custom implementation) and don't call the newly added method. Any way, I'm +1 to all this.
        Hide
        Alan Woodward added a comment -

        Ah, I see what you mean now.

        I've changed the patch to be as least invasive as I can make it - SpanScorer is merged into Spans, rather than the other way around, and SpanWeight.getSpans() stays the same.

        Show
        Alan Woodward added a comment - Ah, I see what you mean now. I've changed the patch to be as least invasive as I can make it - SpanScorer is merged into Spans, rather than the other way around, and SpanWeight.getSpans() stays the same.
        Hide
        David Smiley added a comment -

        And the patch is now half the size (thus low-impact on consumers). I do very much like your rename proposals; it's just that those should be on trunk-only, I think. Perhaps in a follow-up commit that doesn't get back-ported.

        Show
        David Smiley added a comment - And the patch is now half the size (thus low-impact on consumers). I do very much like your rename proposals; it's just that those should be on trunk-only, I think. Perhaps in a follow-up commit that doesn't get back-ported.
        Hide
        ASF subversion and git services added a comment -

        Commit 1709964 from Alan Woodward in branch 'dev/trunk'
        [ https://svn.apache.org/r1709964 ]

        LUCENE-6845: Merge SpanScorer into Spans

        Show
        ASF subversion and git services added a comment - Commit 1709964 from Alan Woodward in branch 'dev/trunk' [ https://svn.apache.org/r1709964 ] LUCENE-6845 : Merge SpanScorer into Spans
        Hide
        ASF subversion and git services added a comment -

        Commit 1709993 from Alan Woodward in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1709993 ]

        LUCENE-6845: Merge SpanScorer into Spans

        Show
        ASF subversion and git services added a comment - Commit 1709993 from Alan Woodward in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1709993 ] LUCENE-6845 : Merge SpanScorer into Spans

          People

          • Assignee:
            Alan Woodward
            Reporter:
            Alan Woodward
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development