Hoss Man added a comment - 05/Oct/05 01:27 PM
Just a thought, but in the same spirit as SpanQuery, these classes may make sense in their own subpackage, i.e. org.apache.lucene.search.fq
That is perhaps not a bad idea, considering that the number of classes may top 12 after adding a few more function types.
Does anyone else have package name suggestions or preferences? search.fq?

Added ReciprocalFloatFunction, a/(mx+b), a natural choice for date boosting, and ReverseOrdFieldSource, which numbers terms in the reverse order of OrdFieldSource.

This newest version cleans up a lot of cruft from the previous version.
A FunctionQuery takes a ValueSource. So you can do things (symbolically) like int(fieldx). A useful one for boosting more recent dates might be:

I'm not sure if this is the final form yet... Perhaps the division between ValueSource and Query could be erased so that every value source is already a query (and you don't need to pass it to a FunctionQuery). It would also be nice to freely mix a Lucene Query and a ValueSource, so that you could do something like:

Of course, I haven't done the "product" function yet... Right now, the normal way to combine a FunctionQuery with other queries to influence the score is to put it in a BooleanQuery:

Changed getSimpleName() to getName() to preserve Java 1.4 compatibility.
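As a rough illustration of the date-boosting idea (ReciprocalFloatFunction computing a/(mx+b) over values from ReverseOrdFieldSource), here is a minimal self-contained sketch. It does not use Lucene or Solr: RecipDateBoost and its methods are hypothetical names, dates are assumed to be indexed as sortable strings such as "20051005", and the reverse ordinal is assumed to be 1 for the greatest (newest) term. The constants m = 1, a = b = 1000 are illustrative only.

```java
import java.util.Arrays;

/** Sketch of date boosting with a reciprocal function over reverse ordinals (hypothetical, not a Lucene/Solr API). */
final class RecipDateBoost {

    private RecipDateBoost() {}

    /** a / (m*x + b): equals a/b at x = 0 and decays toward 0 as x grows (for positive a, b, m). */
    static float recip(float x, float m, float a, float b) {
        return a / (m * x + b);
    }

    /**
     * Reverse ordinal of a term: 1 for the greatest term, numTerms for the smallest.
     * Assumes sortedUniqueTerms is sorted ascending and contains the term.
     */
    static int reverseOrd(String[] sortedUniqueTerms, String term) {
        int idx = Arrays.binarySearch(sortedUniqueTerms, term); // 0-based position
        if (idx < 0) {
            throw new IllegalArgumentException("term not indexed: " + term);
        }
        return sortedUniqueTerms.length - idx;
    }

    /** Boost for a document whose date field holds dateTerm (illustrative constants m=1, a=b=1000). */
    static float dateBoost(String[] sortedUniqueTerms, String dateTerm) {
        int rord = reverseOrd(sortedUniqueTerms, dateTerm);
        return recip(rord, 1f, 1000f, 1000f);
    }
}
```

With these constants the newest date gets 1000/1001 and each older distinct date gets a slightly smaller value; a larger m makes the boost fall off faster. Because the input is a term rank rather than the date itself, the decay depends on how many distinct dates sort after a document, not on elapsed time.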
Yes, I've independently come up with something similar. What's interesting is that you can also perform filtering (like date filtering) by simply returning negative Float.MAX_VALUE. This pretty much guarantees that the document's final score is < 0.
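A minimal sketch of the filtering trick, assuming the function value is simply added to the other clause scores (as with a FunctionQuery clause inside a BooleanQuery) and that those other clauses contribute modest positive scores. NegativeMaxFilter, its methods, and the in-range value 0.1 are hypothetical, chosen only to show the sign behavior.

```java
/** Sketch: using -Float.MAX_VALUE from a value function as a filter (hypothetical, not a Lucene API). */
final class NegativeMaxFilter {

    private NegativeMaxFilter() {}

    /** Small positive value inside [minDate, maxDate], otherwise -Float.MAX_VALUE. */
    static float dateFilterValue(int date, int minDate, int maxDate) {
        if (date < minDate || date > maxDate) {
            return -Float.MAX_VALUE; // drags any summed score far below zero
        }
        return 0.1f; // hypothetical in-range contribution
    }

    /** Additive combination of a text-match score and the function value. */
    static float combined(float textScore, float functionValue) {
        return textScore + functionValue;
    }

    /** Documents whose combined score is negative are treated as filtered out. */
    static boolean accepted(float combinedScore) {
        return combinedScore >= 0f;
    }
}
```

Scaling the combined score by a positive factor (for example a query normalization) keeps it negative, so the sign test still separates filtered documents from the rest.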
I've also come across the need to modify the final score of a document, and have done this with a score-modifying query wrapper that delegates scoring to the FunctionQuery it wraps and then applies an additional function to the result. Is that similar to the product function you mention?

This version is now slightly out of date. For now, consider the definitive version to be the one in Solr:
http://incubator.apache.org/solr
http://svn.apache.org/viewcvs.cgi/incubator/solr/trunk/src/java/org/apache/solr/search/function/
Solr currently has a QueryParser hack to parse a FunctionQuery: you use _val_ as the field name to create a FunctionQuery.

Is there any motivation out there to push this down from Solr to Lucene? I see from time to time on java-user that it comes in handy for people using Lucene. What do the Solr people think about moving it into Lucene core?
+1 to FunctionQuery being brought into Lucene proper.
Grant: Yeah, I think so. 7 votes and 5 watchers so far tell me people want this in Lucene.
I'm in favor... I think once upon a time Yonik held off because he wasn't sure if he liked the API, but since it's been in Apache Solr for over a year now, I think it's safe.
I don't suppose you'd be interested in opening a sister Solr issue and submitting a patch to deprecate those instances and make them subclass the ones you'll be migrating to Lucene, would you?
I just remembered one of the reasons why I didn't do this the last time I looked at it: I don't think FunctionQuery has any good unit tests in the Solr code base. There might be some tests that use the Solr test harness to trigger function queries, but they aren't really portable.
> i think once upon a time Yonik held off because he wasn't sure if he liked the API
Right... it's just never been at the top of my list to revisit. The main thing I was wondering is whether I should have a whole ValueSource thing... perhaps FunctionQuery should be able to use other Queries directly. For example, one could have

Right now, increasing the score of a document based on a field value is done in an additive way, by adding a FunctionQuery clause to a BooleanQuery. One could create a ValueSource that wraps another query to get a multiplicative effect, but is that the simplest approach?

I've often wanted to multiply the scores of two queries. I looked at FunctionQuery but didn't really see an easy way of getting around the ValueSource thing.
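To make the additive versus multiplicative distinction concrete, here is a small sketch with made-up scores (hypothetical numbers, not measurements). The additive form mirrors a FunctionQuery clause summed into a BooleanQuery, ignoring coord and normalization; the multiplicative form is what wrapping another query in a ValueSource, or a product function, would give. CombineScores is a hypothetical name.

```java
/** Sketch comparing additive and multiplicative boosting (illustrative only, not a Lucene API). */
final class CombineScores {

    private CombineScores() {}

    /** Additive: text relevance plus a field-based boost (like an extra BooleanQuery clause). */
    static float additive(float textScore, float fieldBoost) {
        return textScore + fieldBoost;
    }

    /** Multiplicative: the field-based value scales the text relevance. */
    static float multiplicative(float textScore, float fieldBoost) {
        return textScore * fieldBoost;
    }
}
```

For a weak text match with a large field value (0.2 and 2.0) against a strong match with a neutral one (1.0 and 1.0), the additive form ranks the weak match first (2.2 vs 2.0) while the multiplicative form ranks the strong match first (0.4 vs 1.0). With a multiplicative boost a document with a near-zero text score stays near zero, which is often the intent when boosting by something like recency.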
See LUCENE-850 for my eventual solution.

I intend to take a shot at this, with a two-part (two-step) approach:
1) Simple scoring based on the values of a stored field.
2) Composing a document score as a (some / math / extensible) function of the scores of one or more subqueries.

Thinking of a new package: o.a.l.search.function. This would seem to bring together

(Background/motivation: I was considering using payloads to try some static scoring alternatives (e.g. link-info based), but I realized that function queries are much more suitable for this, and would be a handy addition to Lucene core.)

Attached function.patch.txt adds three new queries:
1. ValueSourceQuery - an Expert type of query, more or less the same as
2. FieldScoreQuery - subclass of ValueSourceQuery. It is easier
3. CustomScoreQuery - this query allows customizing the score of its contained

The patch includes tests and javadocs. I will later put the javadocs somewhere, to allow commenting on the API without
The tests found quite a few bugs for me, and I hope I got the scorers and weight

Modifying the issue name to reflect its current content.
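The division of labor between these queries can be sketched without Lucene. All types below are hypothetical stand-ins: ValueSource yields a per-document float, fieldSource plays the role of a FieldScoreQuery-style source that returns a field's value, and CustomScorer combines a subquery score with a value-source score through an overridable customScore method. The default product is assumed to match CustomScoreQuery's default; the package javadocs are the authority on the real signatures.

```java
import java.util.Map;

/** Sketch of the ValueSource / FieldScoreQuery / CustomScoreQuery split (hypothetical, not the Lucene API). */
final class FunctionScoring {

    private FunctionScoring() {}

    /** Produces a float value for a document id. */
    interface ValueSource {
        float value(int doc);
    }

    /** Field-based source: per-document numeric value, 0 if the document has none. */
    static ValueSource fieldSource(Map<Integer, Float> fieldValues) {
        return doc -> fieldValues.getOrDefault(doc, 0f);
    }

    /** Combines a subquery score with a value-source score; subclasses override customScore. */
    static class CustomScorer {
        private final ValueSource valSrc;

        CustomScorer(ValueSource valSrc) {
            this.valSrc = valSrc;
        }

        /** Default combination: the product of the two scores. */
        float customScore(int doc, float subQueryScore, float valSrcScore) {
            return subQueryScore * valSrcScore;
        }

        /** Score for doc, given the score its subquery produced. */
        float score(int doc, float subQueryScore) {
            return customScore(doc, subQueryScore, valSrc.value(doc));
        }
    }
}
```

Overriding customScore (for example to add instead of multiply, or to dampen the field value) is the "custom" part; it is also the kind of score-modifying wrapper described earlier in this thread.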
Javadocs for the new org.apache.lucene.search.function package can now be reviewed at http://people.apache.org/~doronc/api

Updated patch to current trunk.
Also:
Updated the patch javadocs at http://people.apache.org/~doronc/api/

Yonik (and other Solr search.function people),
I omitted some of the original functions/sources that were in your code:
The first 3 should be straightforward to implement by extending CustomScoreQuery, as the code samples show. Do you think such implementations should be included, ready to use? The last 2, the Ord ones, can be implemented as before, i.e. with the "expert" class ValueSource that was kept. But they seemed spooky to me, given that comment regarding multi-searchers. Are these just examples, or are they really useful? Do you think they should be included?
Thanks,

Doron: I haven't really been able to keep up with the way this issue has evolved, or dig into your new patches, but to answer your question about the Ord functions: yes, they are very useful, and in active use in Solr. I believe the warning about MultiSearcher mainly has to do with the fact that the MultiSearcher/FieldCache APIs give us no way to know the "lowest" or "highest" value in a field cache across an entire logical index, so the Ord functions can't really be used against a MultiSearcher.
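Why the Ord functions are awkward with MultiSearcher can be seen in a tiny sketch: an ordinal is a position in one index's sorted term list, so the same term can get different ordinals in different sub-indexes, and nothing global ties them together. OrdSketch is a hypothetical plain-Java illustration (not the FieldCache API), numbering terms from 1 and giving the greatest term a reverse ordinal of 1.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Sketch: per-index term ordinals as an ord/rord value source would see them (hypothetical, not Lucene code). */
final class OrdSketch {

    private OrdSketch() {}

    /** Sorted, de-duplicated terms of one index's field. */
    static String[] sortedTerms(String... fieldValues) {
        return new TreeSet<>(Arrays.asList(fieldValues)).toArray(new String[0]);
    }

    /** 1-based ordinal of term within this index's sorted terms (assumes the term is present). */
    static int ord(String[] sortedTerms, String term) {
        return Arrays.binarySearch(sortedTerms, term) + 1;
    }

    /** Reverse ordinal: 1 for the greatest term in this index. */
    static int rord(String[] sortedTerms, String term) {
        return sortedTerms.length + 1 - ord(sortedTerms, term);
    }
}
```

If one sub-index holds "apple", "banana" and "pear" while another holds only "banana" and "pear", a document with "pear" gets ord 3 in the first and ord 2 in the second, so per-sub-index scores merged by a MultiSearcher are not comparable.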
OK, so I will add the two ord classes, so that Solr can move to this package.
Updated patch:
Javadocs were updated at http://people.apache.org/~doronc/api
I will commit this later today if there are no objections.

Committed (experimental mode).