Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7623

Add FunctionScoreQuery and FunctionMatchQuery

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.5
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We should update the various function scoring queries to use the new DoubleValues API

      1. LUCENE-7623.patch
        25 kB
        Alan Woodward
      2. LUCENE-7623.patch
        25 kB
        Alan Woodward
      3. LUCENE-7623.patch
        28 kB
        Alan Woodward
      4. LUCENE-7623.patch
        28 kB
        Alan Woodward
      5. LUCENE-7623.patch
        27 kB
        Alan Woodward
      6. LUCENE-7623.patch
        29 kB
        Alan Woodward

        Activity

        Hide
        romseygeek Alan Woodward added a comment -

        Here is a patch with two new queries, FunctionScoreQuery and FunctionMatchQuery.

        • FunctionScoreQuery takes a query to wrap and a DoubleValuesSource, and modifies the internal query's score using the per-document values. This can act as a replacement for FunctionQuery using a constant or per-field DoubleValuesSource, or as a replacement for CustomScoreQuery or BoostedQuery using an expression. The wrapped query's scores are passed to any DoubleValues created via getValues(LeafReaderContext, DoubleValues).
        • FunctionMatchQuery takes a DoubleValuesSource and a Predicate<Double>, and returns any documents with a value that matches the predicate. It works by linear scan, but uses two-phase iteration so it should be efficient when combined with a more restrictive query. Unlike FunctionRangeQuery, it returns a constant score, so if you want to use the DoubleValues as scores, you will need to wrap it with a FunctionScoreQuery.

        The patch also adds some more informative toString() implementations to expressions and field DoubleValues, to make the Explanations for FunctionScoreQuery more useful.

        Show
        romseygeek Alan Woodward added a comment - Here is a patch with two new queries, FunctionScoreQuery and FunctionMatchQuery. FunctionScoreQuery takes a query to wrap and a DoubleValuesSource, and modifies the internal query's score using the per-document values. This can act as a replacement for FunctionQuery using a constant or per-field DoubleValuesSource, or as a replacement for CustomScoreQuery or BoostedQuery using an expression. The wrapped query's scores are passed to any DoubleValues created via getValues(LeafReaderContext, DoubleValues). FunctionMatchQuery takes a DoubleValuesSource and a Predicate<Double>, and returns any documents with a value that matches the predicate. It works by linear scan, but uses two-phase iteration so it should be efficient when combined with a more restrictive query. Unlike FunctionRangeQuery, it returns a constant score, so if you want to use the DoubleValues as scores, you will need to wrap it with a FunctionScoreQuery. The patch also adds some more informative toString() implementations to expressions and field DoubleValues, to make the Explanations for FunctionScoreQuery more useful.
        Hide
        romseygeek Alan Woodward added a comment -

        Updated patch, FunctionMatchQuery now uses ConstantScoreWeight.

        Show
        romseygeek Alan Woodward added a comment - Updated patch, FunctionMatchQuery now uses ConstantScoreWeight.
        Hide
        dsmiley David Smiley added a comment -

        Looks alright but I didn't review thoroughly. I noticed one problem: TwoPhaseIterator.matchCost as implemented here isn't right. It's supposed to be the match cost for a single document, thus returning maxDocs is definitely not the right response. See the javadocs. Unfortunately since DoubleValueSource has no similar cost, you can't propagate... so might as well return some constant. Judging from existing impls... anywhere between 10 and 100 is good to me.

        Show
        dsmiley David Smiley added a comment - Looks alright but I didn't review thoroughly. I noticed one problem: TwoPhaseIterator.matchCost as implemented here isn't right. It's supposed to be the match cost for a single document , thus returning maxDocs is definitely not the right response. See the javadocs. Unfortunately since DoubleValueSource has no similar cost, you can't propagate... so might as well return some constant. Judging from existing impls... anywhere between 10 and 100 is good to me.
        Hide
        romseygeek Alan Woodward added a comment -

        Good catch, thanks David Smiley. Here's an updated patch, including an updated build.xml so that ant runs the new tests properly.

        Show
        romseygeek Alan Woodward added a comment - Good catch, thanks David Smiley . Here's an updated patch, including an updated build.xml so that ant runs the new tests properly.
        Hide
        jpountz Adrien Grand added a comment -

        I like this functionality, here are some comments:

        • Why did you add a new forSearcher method, there does not seem to be any useful implementation of it?
        • I think it can be problematic to call doubleValue() from toString() since doubleValue() requires that the values source be positioned while toString() may be called anytime.
        • Why do the new tests depend on the expressions module? I would rather like to have custom values sources in the test class than adding this new dependency.
        • Let's use DoublePredicate rather than Predicate<Double>?
        • Let's maybe add a TODO around the cost impl of FunctionMatchQuery about whether values sources should be able to give this information.
        • Maybe FunctionScoreWeight should cast to double after the multiplication with the boost in order to reduce the accuracy loss?
        • Can we not use the @Test annotation, like in lucene/core?
        Show
        jpountz Adrien Grand added a comment - I like this functionality, here are some comments: Why did you add a new forSearcher method, there does not seem to be any useful implementation of it? I think it can be problematic to call doubleValue() from toString() since doubleValue() requires that the values source be positioned while toString() may be called anytime. Why do the new tests depend on the expressions module? I would rather like to have custom values sources in the test class than adding this new dependency. Let's use DoublePredicate rather than Predicate<Double> ? Let's maybe add a TODO around the cost impl of FunctionMatchQuery about whether values sources should be able to give this information. Maybe FunctionScoreWeight should cast to double after the multiplication with the boost in order to reduce the accuracy loss? Can we not use the @Test annotation, like in lucene/core?
        Hide
        romseygeek Alan Woodward added a comment - - edited

        Thanks Adrien. Here's an updated patch:

        • Removes forSearcher() - I'd put this in initially so that we could create sources for particular IndexSearchers (by analogy to Query/Weight), but it isn't necessary for this patch.
        • No more toString() impls on DoubleValues, explanations are now taken from their parent sources
        • Tests don't depend on expressions, instead I've added a couple of helper functions to DoubleValuesSource that will apply DoubleFunctions to a wrapped source.
        • Predicate<Double> -> DoublePredicate
        • TODO added
        • Changed the casting
        • annotations removed
        Show
        romseygeek Alan Woodward added a comment - - edited Thanks Adrien. Here's an updated patch: Removes forSearcher() - I'd put this in initially so that we could create sources for particular IndexSearchers (by analogy to Query/Weight), but it isn't necessary for this patch. No more toString() impls on DoubleValues, explanations are now taken from their parent sources Tests don't depend on expressions, instead I've added a couple of helper functions to DoubleValuesSource that will apply DoubleFunctions to a wrapped source. Predicate<Double> -> DoublePredicate TODO added Changed the casting annotations removed
        Hide
        jpountz Adrien Grand added a comment -

        Did you upload an old patch? It does not seem to have the changes you mentioned.

        Show
        jpountz Adrien Grand added a comment - Did you upload an old patch? It does not seem to have the changes you mentioned.
        Hide
        romseygeek Alan Woodward added a comment -

        I apparently did, yes, sorry... here's the correct patch

        Show
        romseygeek Alan Woodward added a comment - I apparently did, yes, sorry... here's the correct patch
        Hide
        jpountz Adrien Grand added a comment -

        Thanks Alan, I just have some minor comments now, +1 otherwise!

        • Let's use DoubleUnaryOperator rather than DoubleFunction<Double> and ToDoubleBiFunction<Double,Double> rather tan BiFunction<Double, Double, Double>
        • Let's make FunctionMatchQuery and FunctionScoreQuery final
        • In FunctionScoreQuery, I think we should only ask for scores on the wrapped weight if the values source needs scores, ie. Weight inner = in.createWeight(searcher, needsScores && valuesSource.needsScores(), 1f);
        • I think the tests could be improved by checking TopDocs.totalHits too? Otherwise we might miss cases when the top docs are the same but one query yields more results?
        Show
        jpountz Adrien Grand added a comment - Thanks Alan, I just have some minor comments now, +1 otherwise! Let's use DoubleUnaryOperator rather than DoubleFunction<Double> and ToDoubleBiFunction<Double,Double> rather tan BiFunction<Double, Double, Double> Let's make FunctionMatchQuery and FunctionScoreQuery final In FunctionScoreQuery , I think we should only ask for scores on the wrapped weight if the values source needs scores, ie. Weight inner = in.createWeight(searcher, needsScores && valuesSource.needsScores(), 1f); I think the tests could be improved by checking TopDocs.totalHits too? Otherwise we might miss cases when the top docs are the same but one query yields more results?
        Hide
        romseygeek Alan Woodward added a comment -

        Final patch with Adrien's tweaks, will commit shortly (pending precommit)

        Show
        romseygeek Alan Woodward added a comment - Final patch with Adrien's tweaks, will commit shortly (pending precommit)
        Hide
        jpountz Adrien Grand added a comment -

        +1

        Show
        jpountz Adrien Grand added a comment - +1
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 85ae5de7032ca4511d598a68961864bcfc75caa2 in lucene-solr's branch refs/heads/branch_6x from Alan Woodward
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=85ae5de ]

        LUCENE-7623: Add FunctionMatchQuery and FunctionScoreQuery

        Show
        jira-bot ASF subversion and git services added a comment - Commit 85ae5de7032ca4511d598a68961864bcfc75caa2 in lucene-solr's branch refs/heads/branch_6x from Alan Woodward [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=85ae5de ] LUCENE-7623 : Add FunctionMatchQuery and FunctionScoreQuery
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit fc2e0fd13324699fe1ddb15bb09960a8501f52f5 in lucene-solr's branch refs/heads/master from Alan Woodward
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fc2e0fd ]

        LUCENE-7623: Add FunctionMatchQuery and FunctionScoreQuery

        Show
        jira-bot ASF subversion and git services added a comment - Commit fc2e0fd13324699fe1ddb15bb09960a8501f52f5 in lucene-solr's branch refs/heads/master from Alan Woodward [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fc2e0fd ] LUCENE-7623 : Add FunctionMatchQuery and FunctionScoreQuery
        Hide
        romseygeek Alan Woodward added a comment -

        Thanks for the reviews!

        Show
        romseygeek Alan Woodward added a comment - Thanks for the reviews!
        Hide
        dsmiley David Smiley added a comment -

        I'm a little surprised to see these additions in the Lucene queries module because I thought there was an overarching intent to bring the functionality of ValueSource's into Lucene core.

        Ramification: Now in LUCENE-7737 I think I see a case where spatial-extras PointVectorStrategy is adding a bunch more code that could be avoided if FunctionMatchQuery were in core. Note that the Solr legacy copy of PointVectorStrategy in LUCENE-7737 is able to simply use FunctionMatchQuery.

        Show
        dsmiley David Smiley added a comment - I'm a little surprised to see these additions in the Lucene queries module because I thought there was an overarching intent to bring the functionality of ValueSource's into Lucene core. Ramification: Now in LUCENE-7737 I think I see a case where spatial-extras PointVectorStrategy is adding a bunch more code that could be avoided if FunctionMatchQuery were in core. Note that the Solr legacy copy of PointVectorStrategy in LUCENE-7737 is able to simply use FunctionMatchQuery.

          People

          • Assignee:
            romseygeek Alan Woodward
            Reporter:
            romseygeek Alan Woodward
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development