Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      SpanQuery.getSpans() should only be called on rewritten queries, so it seems to make more sense to have this being called from SpanWeight

      1. LUCENE-6466.patch
        94 kB
        Alan Woodward
      2. LUCENE-6466.patch
        92 kB
        Alan Woodward
      3. LUCENE-6466.patch
        93 kB
        Alan Woodward
      4. LUCENE-6466.patch
        89 kB
        Alan Woodward
      5. LUCENE-6466.patch
        81 kB
        Alan Woodward
      6. LUCENE-6466-2.patch
        39 kB
        Alan Woodward
      7. LUCENE-6466-2.patch
        39 kB
        Alan Woodward
      8. LUCENE-6466-2.patch
        37 kB
        Alan Woodward
      9. LUCENE-6466-branch5x.patch
        92 kB
        Alan Woodward

        Activity

        Hide
        Alan Woodward added a comment -

        Here is a patch. SpanQuery.createWeight() is now final, and delegates on to an abstract method #createSpanWeight(). This takes an additional boolean parameter to determine whether or not the weight is at the top of a span tree, in which case it should build term stats.

        Ideally I'd have liked to move SpanQuery.extractTerms() to SpanWeight as well, but that causes complications when collecting terms in the SpanWeight constructor (basically you can't call extractTerms on a delegate from a super constructor, because the delegate hasn't been set yet).

        Show
        Alan Woodward added a comment - Here is a patch. SpanQuery.createWeight() is now final, and delegates on to an abstract method #createSpanWeight(). This takes an additional boolean parameter to determine whether or not the weight is at the top of a span tree, in which case it should build term stats. Ideally I'd have liked to move SpanQuery.extractTerms() to SpanWeight as well, but that causes complications when collecting terms in the SpanWeight constructor (basically you can't call extractTerms on a delegate from a super constructor, because the delegate hasn't been set yet).
        Hide
        Alan Woodward added a comment -

        Updated patch, building on top of the SpanCollector API in LUCENE-6371.

        This moves both getSpans() and extractTerms() to SpanWeight, and refactors the similarity and scoring a bit. SpanWeight.getSpans() now looks very like Weight.scorer(), which is nice if we want to move towards merging the two APIs.

        Still needs javadocs, etc, but I wanted to get some feedback before doing precommit checks.

        Show
        Alan Woodward added a comment - Updated patch, building on top of the SpanCollector API in LUCENE-6371 . This moves both getSpans() and extractTerms() to SpanWeight, and refactors the similarity and scoring a bit. SpanWeight.getSpans() now looks very like Weight.scorer(), which is nice if we want to move towards merging the two APIs. Still needs javadocs, etc, but I wanted to get some feedback before doing precommit checks.
        Hide
        Alan Woodward added a comment -

        Another patch. This one is nicer, I think. It uses the needsScores parameter to determine whether or not to build the SpanSimilarity for a search.

        Show
        Alan Woodward added a comment - Another patch. This one is nicer, I think. It uses the needsScores parameter to determine whether or not to build the SpanSimilarity for a search.
        Hide
        Paul Elschot added a comment - - edited

        The (2nd) patch looks good in the sense of making SpanWeight.getSpans() consistent with Weight.scorer().
        This also introduces SpanSimilarity, which makes sense.
        On the extra complexity of the SpanWeights that are added for each SpanQuery: at the moment these SpanWeights don't do much, but they are good to have because they can be overridden separately.

        I think these SpanWeights can also allow adding a scoring method to Spans, but perhaps that is better done at another issue.

        Show
        Paul Elschot added a comment - - edited The (2nd) patch looks good in the sense of making SpanWeight.getSpans() consistent with Weight.scorer(). This also introduces SpanSimilarity, which makes sense. On the extra complexity of the SpanWeights that are added for each SpanQuery: at the moment these SpanWeights don't do much, but they are good to have because they can be overridden separately. I think these SpanWeights can also allow adding a scoring method to Spans, but perhaps that is better done at another issue.
        Hide
        Alan Woodward added a comment -

        Final patch. I'll commit this tomorrow, absent any objections.

        Show
        Alan Woodward added a comment - Final patch. I'll commit this tomorrow, absent any objections.
        Hide
        Alan Woodward added a comment -

        Updated to take into account changes from LUCENE-6490. Running precommit now

        Show
        Alan Woodward added a comment - Updated to take into account changes from LUCENE-6490 . Running precommit now
        Hide
        ASF subversion and git services added a comment -

        Commit 1680565 from Alan Woodward in branch 'dev/trunk'
        [ https://svn.apache.org/r1680565 ]

        LUCENE-6466: Move SpanQuery.getSpans() and .extractTerms() to SpanWeight

        Show
        ASF subversion and git services added a comment - Commit 1680565 from Alan Woodward in branch 'dev/trunk' [ https://svn.apache.org/r1680565 ] LUCENE-6466 : Move SpanQuery.getSpans() and .extractTerms() to SpanWeight
        Hide
        ASF subversion and git services added a comment -

        Commit 1680569 from Alan Woodward in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1680569 ]

        LUCENE-6466: Move SpanQuery.getSpans() and .extractTerms() to SpanWeight

        Show
        ASF subversion and git services added a comment - Commit 1680569 from Alan Woodward in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1680569 ] LUCENE-6466 : Move SpanQuery.getSpans() and .extractTerms() to SpanWeight
        Hide
        Alan Woodward added a comment -

        Thanks for the review, Paul!

        Show
        Alan Woodward added a comment - Thanks for the review, Paul!
        Hide
        ASF subversion and git services added a comment -

        Commit 1680603 from Alan Woodward in branch 'dev/trunk'
        [ https://svn.apache.org/r1680603 ]

        LUCENE-6466: Fix MultiSpansWrapper

        Show
        ASF subversion and git services added a comment - Commit 1680603 from Alan Woodward in branch 'dev/trunk' [ https://svn.apache.org/r1680603 ] LUCENE-6466 : Fix MultiSpansWrapper
        Hide
        ASF subversion and git services added a comment -

        Commit 1680604 from Alan Woodward in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1680604 ]

        LUCENE-6466: Fix MultiSpansWrapper

        Show
        ASF subversion and git services added a comment - Commit 1680604 from Alan Woodward in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1680604 ] LUCENE-6466 : Fix MultiSpansWrapper
        Hide
        ASF subversion and git services added a comment -

        Commit 1680606 from Alan Woodward in branch 'dev/trunk'
        [ https://svn.apache.org/r1680606 ]

        LUCENE-6466: Don't commit @Seed...

        Show
        ASF subversion and git services added a comment - Commit 1680606 from Alan Woodward in branch 'dev/trunk' [ https://svn.apache.org/r1680606 ] LUCENE-6466 : Don't commit @Seed...
        Hide
        ASF subversion and git services added a comment -

        Commit 1680607 from Alan Woodward in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1680607 ]

        LUCENE-6466: Don't commit @Seed...

        Show
        ASF subversion and git services added a comment - Commit 1680607 from Alan Woodward in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1680607 ] LUCENE-6466 : Don't commit @Seed...
        Hide
        Robert Muir added a comment -

        Do we really need to expose SpanSimilarity as a separate class here? I find it confusing (e.g. no-op method that returns a null scorer), and its a lot of plumbing. I think instead of this class, there could just be a final method on SpanWeight base-class, to retrieve the appropriate Similarity, and impls of normalization and so on like any other Weights.

        Show
        Robert Muir added a comment - Do we really need to expose SpanSimilarity as a separate class here? I find it confusing (e.g. no-op method that returns a null scorer), and its a lot of plumbing. I think instead of this class, there could just be a final method on SpanWeight base-class, to retrieve the appropriate Similarity, and impls of normalization and so on like any other Weights.
        Hide
        Robert Muir added a comment -

        I tried looking into this patch to see why it causes test failures with scoring changes (https://issues.apache.org/jira/browse/LUCENE-6495), but its hard to see with the indentation changes from various code being shuffled around.

        Show
        Robert Muir added a comment - I tried looking into this patch to see why it causes test failures with scoring changes ( https://issues.apache.org/jira/browse/LUCENE-6495 ), but its hard to see with the indentation changes from various code being shuffled around.
        Hide
        Robert Muir added a comment -

        Also its wierd it only reproduces on java 7. Maybe some hash ordering issue?

        Show
        Robert Muir added a comment - Also its wierd it only reproduces on java 7. Maybe some hash ordering issue?
        Hide
        ASF subversion and git services added a comment -

        Commit 1680978 from Robert Muir in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1680978 ]

        LUCENE-6371, LUCENE-6466: back out from 5.2, see https://issues.apache.org/jira/browse/LUCENE-6494

        Show
        ASF subversion and git services added a comment - Commit 1680978 from Robert Muir in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1680978 ] LUCENE-6371 , LUCENE-6466 : back out from 5.2, see https://issues.apache.org/jira/browse/LUCENE-6494
        Hide
        Alan Woodward added a comment -

        Patch part 2, following discussion on LUCENE-6371.

        • removes SpanSimilarity, in favour of a map of terms to termcontexts
        • SpanTermQuery can take an optional TermContext in its constructor, similar to TermQuery
        • SpanMTQWrapper now preserves term states when rewriting to SpanTermQueries

        What would be nice would be to try and write an asserting TermsEnum that could check how many times seekExact(BytesRef) was called, to ensure that the various queries are re-using their term states properly.

        Show
        Alan Woodward added a comment - Patch part 2, following discussion on LUCENE-6371 . removes SpanSimilarity, in favour of a map of terms to termcontexts SpanTermQuery can take an optional TermContext in its constructor, similar to TermQuery SpanMTQWrapper now preserves term states when rewriting to SpanTermQueries What would be nice would be to try and write an asserting TermsEnum that could check how many times seekExact(BytesRef) was called, to ensure that the various queries are re-using their term states properly.
        Hide
        Robert Muir added a comment -

        Looks simpler. I am still hoping we can remove the map, but lets do this for now.

        Can we make SpanWeight.buildSimWeight private final? its only used by its ctor.

        Can both rewrite methods currently in SpanMultiTermQuery be fixed to avoid re-seeking? Maybe TopTermsSpanBooleanQueryRewrite was forgotten.

        What would be nice would be to try and write an asserting TermsEnum that could check how many times seekExact(BytesRef) was called, to ensure that the various queries are re-using their term states properly.

        TermQuery/Weight has checks around this that can help. Look for stuff like assertTermNotInReader check. This was missing from spans because of its previous leniency.

        Show
        Robert Muir added a comment - Looks simpler. I am still hoping we can remove the map, but lets do this for now. Can we make SpanWeight.buildSimWeight private final? its only used by its ctor. Can both rewrite methods currently in SpanMultiTermQuery be fixed to avoid re-seeking? Maybe TopTermsSpanBooleanQueryRewrite was forgotten. What would be nice would be to try and write an asserting TermsEnum that could check how many times seekExact(BytesRef) was called, to ensure that the various queries are re-using their term states properly. TermQuery/Weight has checks around this that can help. Look for stuff like assertTermNotInReader check. This was missing from spans because of its previous leniency.
        Hide
        Alan Woodward added a comment -

        Oops, yes, I missed TopTermsSpanBooleanQueryRewrite. Final patch with the changes there, plus some assertions copied from TermWeight/TermScorer.

        Show
        Alan Woodward added a comment - Oops, yes, I missed TopTermsSpanBooleanQueryRewrite. Final patch with the changes there, plus some assertions copied from TermWeight/TermScorer.
        Hide
        Robert Muir added a comment -

        Can we replace methods like this one in SpanTermQuery with calls to Collections.singletonMap?

          protected static Map<Term, TermContext> toMap(Term term, TermContext termContext) {
            Map<Term, TermContext> map = new HashMap<>();
            map.put(term, termContext);
            return map;
          }
        
        Show
        Robert Muir added a comment - Can we replace methods like this one in SpanTermQuery with calls to Collections.singletonMap? protected static Map<Term, TermContext> toMap(Term term, TermContext termContext) { Map<Term, TermContext> map = new HashMap<>(); map.put(term, termContext); return map; }
        Hide
        Robert Muir added a comment -

        I only have one more stylish nit, otherwise I like it. Great to remove the confusing class and also get single-pass spanmultitermquery!

        Can we break this very long run-on in SpanTermQuery.buildSimWeight?

        return searcher.getSimilarity().computeWeight(query.getBoost(), searcher.collectionStatistics(query.getField()), stats);
        

        Instead I would rename 'stats' to 'termStats' and do maybe something like:

        CollectionStatistics collectionStats = searcher.collectionStatistics(...);
        return xxx.computeWeight(query.getBoost(), collectionStats, termStats);
        
        Show
        Robert Muir added a comment - I only have one more stylish nit, otherwise I like it. Great to remove the confusing class and also get single-pass spanmultitermquery! Can we break this very long run-on in SpanTermQuery.buildSimWeight? return searcher.getSimilarity().computeWeight(query.getBoost(), searcher.collectionStatistics(query.getField()), stats); Instead I would rename 'stats' to 'termStats' and do maybe something like: CollectionStatistics collectionStats = searcher.collectionStatistics(...); return xxx.computeWeight(query.getBoost(), collectionStats, termStats);
        Hide
        Alan Woodward added a comment -

        Nits appropriately picked

        Show
        Alan Woodward added a comment - Nits appropriately picked
        Hide
        ASF subversion and git services added a comment -

        Commit 1682513 from Alan Woodward in branch 'dev/trunk'
        [ https://svn.apache.org/r1682513 ]

        LUCENE-6466: Remove SpanSimilarity class and make SpanMTQWrapper single-pass

        Show
        ASF subversion and git services added a comment - Commit 1682513 from Alan Woodward in branch 'dev/trunk' [ https://svn.apache.org/r1682513 ] LUCENE-6466 : Remove SpanSimilarity class and make SpanMTQWrapper single-pass
        Hide
        Alan Woodward added a comment -

        I'll clean up LUCENE-6371 next, before we put this all back into 5.x

        Show
        Alan Woodward added a comment - I'll clean up LUCENE-6371 next, before we put this all back into 5.x
        Hide
        Alan Woodward added a comment -

        Patch for branch 5x (before LUCENE-6371 is added)

        Show
        Alan Woodward added a comment - Patch for branch 5x (before LUCENE-6371 is added)
        Hide
        ASF subversion and git services added a comment -

        Commit 1685538 from Alan Woodward in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1685538 ]

        LUCENE-6466: Move SpanQuery.getSpans() and .extractWeight() to SpanWeight

        Show
        ASF subversion and git services added a comment - Commit 1685538 from Alan Woodward in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1685538 ] LUCENE-6466 : Move SpanQuery.getSpans() and .extractWeight() to SpanWeight
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Alan Woodward
            Reporter:
            Alan Woodward
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development