The ability to transform doc scores obtained by a query is now part of
I think that, to a certain extent, the patch in this issue went farther than that of 1. score of a single sub-query (any query). The latter is optional. The latter is the one that assigns a score equal to the value of an indexed field. For this reason I hesitated to mark this issue as a duplicate of But I did not want to basically re-implement BooleanQuery for a multi-query score transformation. Thoughts?

Mike,
If I understood it correctly your patch can be described as:
The regular Boolean OR could probably be phrased this way as a Sum(OR)Query. Now in
So it currently doesn't support your comment

When first writing CustomScoreQuery I looked at combining any two or N subqueries, but wasn't sure how to do this. How to normalize. How to calculate the weights. But now I think that we could perhaps follow your approach closer: call it CustomOrQuery, go for any N subqueries, and define f() accordingly. But is this really required / useful?

Thanks,

I just asked for a product scored BooleanQuery on java-users and Mike pointed me in the direction of this bug. My use case is to get the non-phrase query "John Bush" to rank "John Bush" higher than "George Bush" or "John Kerry". I believe this is a common use case (I have 3 or 4 bugs filed against search quality internally that boil down to this issue.)
Hi Doron,
The main use case is the same as for document (and, to a lesser extent, field) boosts: the ability to weight a document by a certain amount (rather than applying an additive boost, which is what adding another subclause to the query would entail).

The function query capability works for many situations, as you can store the various types of boosts in a FieldCache and use your approach. But this doesn't scale when there are tons of possible boost fields (which would usually be sparsely-populated). SparseFieldCache, anyone?

I decided to move away from ProductQueries for the time being, so that is no longer the main use case of this patch. Primarily the patch stems from developer frustration with implementing something like ProductQuery. ISTM that the subquery-handling logic (present in BooleanQuery and slightly different in DisMaxQuery) needn't be so tightly coupled with a choice of scoring function.

For the record, DisMax is actually a ( x*Max + (1-x)*Sum ) Query, so it is both Sum and Max. Perhaps if we add Prod to the options, there are no more useful subquery combinators?

Tim: That is typically done by adding an optional implicit phrase query:
john bush -> +(john bush) "john bush"~1000

This works very well for two-term queries, but less well when there are more terms than that. See also DisjunctionMaxQuery if there are multiple fields.

> The function query capability works for many situations, as you
> can store the various types of boosts in a FieldCache and use
> your approach. But this doesn't scale when there are tons of
> possible boost fields (which would usually be sparsely-populated).
> SparseFieldCache, anyone?

For large collections loading would indeed take long.
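To make the combinator discussion concrete (Sum, Max, Prod, and the DisMax blend x*Max + (1-x)*Sum), here is a minimal sketch in plain Java of a scoring function f() decoupled from subquery handling. The `Combinator` interface and the `ScoreCombinators` class are hypothetical names used only for illustration; they are not Lucene API, and the sketch assumes the array holds only the scores of subqueries that matched the current document.

```java
// Hypothetical illustration only: none of these names exist in Lucene.
final class ScoreCombinators {

    /** f(): maps the scores of the matching subqueries to one document score. */
    interface Combinator {
        double combine(double[] subScores);
    }

    /** Sum of sub-scores, roughly what a plain Boolean OR does before coord/norm factors. */
    static final Combinator SUM = scores -> {
        double total = 0.0;
        for (double s : scores) {
            total += s;
        }
        return total;
    };

    /** Largest sub-score; returns 0 when no subquery matched. */
    static final Combinator MAX = scores -> {
        double best = 0.0;
        for (double s : scores) {
            best = Math.max(best, s);
        }
        return best;
    };

    /** Product of sub-scores; an empty product is 1. */
    static final Combinator PROD = scores -> {
        double product = 1.0;
        for (double s : scores) {
            product *= s;
        }
        return product;
    };

    /** The blend quoted in the thread: x*Max + (1-x)*Sum, with x in [0, 1]. */
    static Combinator disMax(double x) {
        return scores -> x * MAX.combine(scores) + (1.0 - x) * SUM.combine(scores);
    }
}
```

With x = 1 the blend reduces to pure Max and with x = 0 to pure Sum, which is the sense in which DisMax is "both Sum and Max". Normalization and weights, the open questions raised earlier in the thread, are deliberately left out of this sketch.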
Here's an approach I think will work.
Rename CustomScoreQuery to CustomBoostQuery, and remove the ValueSource-specific logic. Really there is no reason to limit the logic to ValueSource queries: the only important criterion is that we don't expect docs that match only the boosting query to be returned (the doc set is unchanged relative to the original query). I'm not sure what will happen if the boost query doesn't match the document being boosted, however. Perhaps there should be a default value? Does this still belong in the function package?

To address the issue above, the following needs to be added:
===================================================================
--- build-src/java/solr/org/apache/lucene/search/CustomBoostQuery.java (revision 9312)
+++ build-src/java/solr/org/apache/lucene/search/CustomBoostQuery.java (working copy)
@@ -280,7 +280,7 @@
 /*(non-Javadoc) @see org.apache.lucene.search.Scorer#score() */
@@ -300,7 +300,8 @@
This patch is demonstrative only. There are no tests, and I'm pretty sure the query norm calculation isn't correct in general.
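Since the hunks themselves are not shown, here is a rough sketch of the scoring rule a CustomBoostQuery would need, including the open question of what to do when the boost query does not match the document. It is plain Java and not the patch's code: `BoostedScore`, `boosted`, and the multiplicative combination with a caller-supplied default are all assumptions made for illustration.

```java
import java.util.OptionalDouble;

// Hypothetical illustration only: not the patch's code and not Lucene API.
final class BoostedScore {

    /**
     * Combines the main query's score with the boost query's score.
     *
     * @param mainScore    score of the document under the original query
     * @param boostScore   score under the boost query, or empty if the boost
     *                     query did not match this document
     * @param defaultBoost value to use when the boost query did not match
     *                     (assumption: 1.0 leaves the main score unchanged)
     */
    static double boosted(double mainScore, OptionalDouble boostScore, double defaultBoost) {
        // The set of returned documents is decided by the main query alone;
        // the boost query only rescales scores of documents already matched.
        return mainScore * boostScore.orElse(defaultBoost);
    }
}
```

For example, `boosted(2.0, OptionalDouble.of(1.5), 1.0)` scales the score by the boost, while `boosted(2.0, OptionalDouble.empty(), 1.0)` returns the main score untouched. Whether a neutral default of 1.0 is the right choice is exactly the question left open above.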