
Key: LUCENE-446
Type: New Feature
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Doron Cohen
Reporter: Yonik Seeley
Votes: 9
Watchers: 5

Lucene - Java

search.function - (1) score based on field value, (2) simple score customizability

Created: 05/Oct/05 05:23 AM   Updated: 19/Jun/07 08:14 AM
Component/s: Search
Affects Version/s: None
Fix Version/s: 2.2

Time Tracking:
Not Specified

File Attachments (name, type, date, author, size; all licensed for inclusion in ASF works):
function.patch.txt, Text File, 2007-06-05 02:33 AM, Doron Cohen, 117 kB
function.patch.txt, Text File, 2007-05-05 07:04 AM, Doron Cohen, 93 kB
function.patch.txt, Text File, 2007-05-03 03:09 AM, Doron Cohen, 93 kB
function.zip, Zip Archive, 2005-12-01 04:56 AM, Yonik Seeley, 9 kB
function.zip, Zip Archive, 2005-12-01 02:56 AM, Yonik Seeley, 9 kB
Issue Links:
Reference

Lucene Fields: Patch Available
Resolution Date: 05/Jun/07 04:39 PM


Description
FunctionQuery can return a score based on a field's value or on its ordinal value.

FunctionFactory subclasses define the details of the function. There is currently a LinearFloatFunction (a line specified by slope and intercept).

Field values are typically obtained from FieldValueSourceFactory. Implementations include FloatFieldSource, IntFieldSource, and OrdFieldSource.



Hoss Man added a comment - 05/Oct/05 01:27 PM
Just a thought, but in the same spirit as SpanQuery, these classes may make sense in their own sub package ... ie: org.apache.lucene.search.fq

Yonik Seeley added a comment - 06/Oct/05 08:49 AM
Perhaps not a bad idea considering that the number of classes may top 12 after adding a few more function types.

Anyone else have package name suggestions/preferences?

search.fq?
search.func?
search.function?


Yonik Seeley added a comment - 07/Oct/05 10:29 PM
Added ReciprocalFloatFunction, a/(mx+b), a natural choice for date boosting,
and ReverseOrdFieldSource, which numbers terms in the reverse order of OrdFieldSource

Yonik Seeley added a comment - 01/Dec/05 02:56 AM
attaching newest version

Yonik Seeley added a comment - 01/Dec/05 03:14 AM
This newest version simplifies a lot of cruft from the previous version.

A FunctionQuery takes a ValueSource.
The ValueSource produces a DocValues object for a specific IndexReader (It's like a lucene scorer).
The ValueSource is also used as input to functions, which are ValueSources themselves.

So, you can do things (symbolically), like

int(fieldx)
float(fieldx)
ord(fieldx)
rord(fieldx)
linear(fieldx,1,2)
linear(rord(fieldx),1,2,3)
reciprocal(linear(fieldx,1,2),3,4,5)

A useful one for boosting more recent dates might be:
reciprocal(rord(mydatefield),1,1000,1000)
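
The composition above can be sketched in plain Java. This is only an illustration: the Source interface and the helper methods are hypothetical stand-ins rather than the classes in the attached zip, the per-IndexReader DocValues step is collapsed into a direct lookup by document id, the argument order (m, a, b) for reciprocal is an assumption taken from the formula a/(mx+b), and the reverse ordinals are made-up values.

```java
import java.util.function.IntToDoubleFunction;

public class FunctionCompositionSketch {
    // Hypothetical stand-in for a ValueSource: maps a document id to a value.
    public interface Source extends IntToDoubleFunction {}

    // linear(x, slope, intercept) = slope * x + intercept
    public static Source linear(Source x, double slope, double intercept) {
        return doc -> slope * x.applyAsDouble(doc) + intercept;
    }

    // reciprocal(x, m, a, b) = a / (m * x + b); the argument order is an assumption.
    public static Source reciprocal(Source x, double m, double a, double b) {
        return doc -> a / (m * x.applyAsDouble(doc) + b);
    }

    public static void main(String[] args) {
        // Made-up reverse ordinals of a date field: smaller means more recent.
        double[] rord = {1, 1000, 9000};
        Source rordOfDate = doc -> rord[doc];

        // Symbolically: reciprocal(rord(mydatefield),1,1000,1000)
        Source boost = reciprocal(rordOfDate, 1, 1000, 1000);
        for (int doc = 0; doc < rord.length; doc++) {
            System.out.println("doc " + doc + " boost=" + boost.applyAsDouble(doc));
        }
    }
}
```

Under these assumptions the boost is close to 1 for the most recent document, 0.5 at reverse ordinal 1000, and 0.1 at reverse ordinal 9000, so it decays smoothly with age instead of cutting off.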

I'm not sure if this is the final form yet... perhaps the division between ValueSource and Query could be erased such that every value source is a query already (so that you don't need to pass it to a FunctionQuery).

It would also be nice to freely mix a lucene Query and a ValueSource so that you could do something like:
product(luceneQuery, val(fieldx))
or even
product(luceneQuery1, luceneQuery2)

Of course, I haven't done the "product" function yet... right now, the normal way to combine it with other queries to influence the score is to put it in a boolean query:
+other_lucene_query_clauses +function_query^.1
The score from the function query is then added to the score of the other query.


Yonik Seeley added a comment - 01/Dec/05 04:56 AM
changed getSimpleName() to getName() to preserve Java 1.4 compatibility.

Kelvin Tan added a comment - 14/Feb/06 02:48 PM
Yes, I've independently come up with something similar. What's interesting is that you can also perform filtering (like date filtering) by simply returning negative Float.MAX_VALUE. This pretty much guarantees that the document's final score is < 0.

I've also come across the need to be able to modify the final score of a document, and have done this via a score-modifying query wrapper which delegates the scoring to the FunctionQuery it wraps, then applies an additional function to it. Is that similar to the product function you mention?


Yonik Seeley added a comment - 03/Mar/06 03:54 AM
This version is now slightly out of date.
For now, consider the definitive version to be in Solr:
http://incubator.apache.org/solr
http://svn.apache.org/viewcvs.cgi/incubator/solr/trunk/src/java/org/apache/solr/search/function/

Solr currently has a QueryParser hack to parse a FunctionQuery... you use val as the fieldName to create a FunctionQuery
Examples:
val:myfield
val:"max(myfield,2.0)"
val:"max(linear(myfield,1.0,.1), 5.0)"


Grant Ingersoll added a comment - 18/Mar/07 01:39 PM
Is there any motivation out there to push this down from Solr to Lucene? I see from time to time on java-user that it comes in handy for people using Lucene. What do the Solr people think about moving it into Lucene core?

Erik Hatcher added a comment - 18/Mar/07 02:38 PM
+1 to FunctionQuery being brought into Lucene proper.

Otis Gospodnetic added a comment - 19/Mar/07 01:42 AM
Grant: Yeah, I think so. 7 votes and 5 watchers so far tells me people want this in Lucene.

Hoss Man added a comment - 20/Mar/07 12:23 AM
I'm in favor ... i think once upon a time Yonik held off because he wasn't sure if he liked the API, but since it's been in Apache Solr for over a year now, i think it's safe.

I don't suppose you'd be interested in opening a sister Solr issue and submitting a patch to deprecate those instances and make them subclass the ones you'll be migrating to Lucene would you?


Hoss Man added a comment - 20/Mar/07 02:19 AM
I just remembered one of the reasons why i didn't do this the last time i looked at it: i don't think FunctionQuery has any good unit tests in the Solr code base – there might be some tests that use the SolrTestHarness to trigger function queries, but they aren't really portable.

Yonik Seeley added a comment - 24/Mar/07 08:38 PM
> i think once upon a time Yonik held off because he wasn't sure if he liked the API

Right... it's just never been at the top of my list to revisit.

The main thing I was wondering is if I should have a whole ValueSource thing... perhaps FunctionQuery should be able to use other Queries directly. For example, one could have
MultiplyFunctionQuery(MyNormalQuery, MyFieldFunctionQuery) to boost a query by another query (in this case a function query).

Right now, increasing the score of a document based on a field value is done in an additive way by adding a FunctionQuery clause to a BooleanQuery. One could create a ValueSource that wraps another query to get a multiplicative effect, but is that the simplest approach?
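
To make the additive versus multiplicative distinction concrete, here is a small arithmetic sketch in plain Java. It is a simplification under stated assumptions: it ignores query normalization and coordination factors, the class and method names are hypothetical, and the scores are invented.

```java
public class BoostCombinationSketch {
    // Additive: roughly what "+query +function^weight" in a BooleanQuery does
    // (ignoring query normalization and coord factors).
    public static float additive(float queryScore, float functionScore, float weight) {
        return queryScore + weight * functionScore;
    }

    // Multiplicative: the effect discussed for a product-style combination.
    public static float multiplicative(float queryScore, float functionScore) {
        return queryScore * functionScore;
    }

    public static void main(String[] args) {
        float[] queryScores = {0.2f, 2.0f};  // invented relevance scores
        float functionScore = 0.5f;          // invented field-based score
        for (float q : queryScores) {
            System.out.println("query=" + q
                + " additive=" + additive(q, functionScore, 0.1f)
                + " multiplicative=" + multiplicative(q, functionScore));
        }
    }
}
```

The additive form shifts every document by the same amount for a given field value, whatever its relevance score, while the multiplicative form scales the boost with the relevance score.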


Mike Klaas added a comment - 26/Mar/07 07:13 PM
I've often wanted to multiply the scores of two queries. I looked at FunctionQuery but didn't really see an easy way of getting around the ValueSource thing.

See LUCENE-850 for my eventual solution


Doron Cohen added a comment - 27/Apr/07 07:32 PM
I intend to take a shot at this, with the approach of two parts/steps -
1) simple scoring based on values of stored field.
2) composing a document score as (some / math / extensible) function of one or more scores of sub queries.

Thinking of a new package: o.a.l.search.function.

This would seem to bring together LUCENE-446 and LUCENE-850 and I think would be handy for trying various scoring techniques.

(Background/motivation: I was considering using payloads for trying some static scoring alternatives (e.g. link info based), but I realized that function queries are much more suitable for this, and would be a handy addition to Lucene core.)


Doron Cohen added a comment - 03/May/07 03:09 AM
Attached function.patch.txt adds three new queries:

1. ValueSourceQuery - an Expert type of query, more or less the same as
in the original patch. It is very flexible - takes a ValueSource as input - so it
could be extended to do additional things (i.e. not only indexed fields).

2. FieldScoreQuery - subclass of ValueSourceQuery. It is easier
to use, and operates on a cached indexed field. A doc score is set
by the value of that field. There are 4 field parser types for this: float, int,
short, and byte. They require different amounts of RAM when cached: 4, 4, 2,
and 1 bytes respectively per document. The cache was modified to
accommodate this. (It seems worthwhile to save RAM where possible.)

3. CustomScoreQuery - this query allows customizing the score of its contained
sub-query by implementing a customScore() function. Any computation is
possible, as long as it is based on the original score of the sub-query,
the (optional) score of an (optional) sub-valueSourceQuery, and the docid.
This query also covers (somewhat differently) LUCENE-850
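
The customScore() hook described in item 3 can be pictured with the following plain-Java sketch. It does not use the patch's classes: CustomScoreSketch is a hypothetical stand-in, the parameter order is an assumption, and multiplying the two scores as the default is likewise an assumption made for illustration.

```java
public class CustomScoreSketch {
    // Hypothetical stand-in for the customScore() hook. Inputs are the doc id,
    // the sub-query score, and the score of the optional value-source query.
    // Default in this sketch (an assumption): multiply the two scores.
    public float customScore(int doc, float subQueryScore, float valSrcScore) {
        return subQueryScore * valSrcScore;
    }

    // Example override: dampen the field-based score with a square root.
    public static class SqrtBoost extends CustomScoreSketch {
        @Override
        public float customScore(int doc, float subQueryScore, float valSrcScore) {
            return subQueryScore * (float) Math.sqrt(valSrcScore);
        }
    }

    public static void main(String[] args) {
        CustomScoreSketch plain = new CustomScoreSketch();
        CustomScoreSketch dampened = new SqrtBoost();
        // Invented inputs: doc 7, sub-query score 1.5, field-based score 4.0.
        System.out.println("plain=" + plain.customScore(7, 1.5f, 4.0f));
        System.out.println("dampened=" + dampened.customScore(7, 1.5f, 4.0f));
    }
}
```

With these invented inputs the plain combination yields 6.0 and the dampened one 3.0; any other computation over the same three inputs would fit the same override point.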

The patch includes tests and javadocs.
All tests pass.

I will later put the javadocs somewhere, to allow commenting on the API without
applying the patch.

The tests found quite a few bugs for me, and I hope I got the scorers and weights
correct now - I would very much appreciate review comments on these delicate
parts.


Doron Cohen added a comment - 03/May/07 03:19 AM
Modifying the issue name to reflect its current content.

Doron Cohen added a comment - 03/May/07 06:59 AM
javadocs for the new org.apache.lucene.search.function package
can now be reviewed at http://people.apache.org/~doronc/api

Doron Cohen added a comment - 05/May/07 07:04 AM
Updated patch to current trunk.

Also:

  • moved TYPE consts in FieldScoreQuery to FieldScoreQuery.Type (e.g. FieldScoreQuery.Type.BYTE).
  • some documentation fixes.

Updated patch javadocs in http://people.apache.org/~doronc/api/


Doron Cohen added a comment - 05/May/07 07:20 AM
Yonik (and other Solr's search.function people),

I omitted some of the original functions/sources that were in your code:

  • LinearFloatFunction, MaxFloatFunction, ReciprocalFloatFunction,
  • OrdFieldSource, ReverseOrdFieldSource

The first 3 should be straightforward to implement by extending CustomScoreQuery, as the code samples show. Do you think such implementations should be included, ready to use?

The last 2 Ord ones can be implemented as before, i.e. with the "expert" class ValueSource that was kept. But they seemed spooky to me, with that comment regarding multi-searchers. Are these just examples, or are they really useful? Do you think they should be included?

Thanks,
Doron


Hoss Man added a comment - 04/Jun/07 02:36 AM
Doron: I haven't really been able to keep up with the way this issue has evolved, or dig into your new patches, but to answer your question about the Ord functions: yes they are very useful, and in active use in Solr. I believe the warning about MultiSearcher mainly has to do with the fact that the MultiSearcher/FieldCache APIs give us no way to know the "lowest" or "highest" value in a field cache across an entire logical index, so the Ord functions can't really be queried against a MultiSearcher.
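
One way to picture the MultiSearcher caveat is that an ordinal is only meaningful relative to the terms of a single index. The plain-Java sketch below is an illustration under assumptions: it models an ordinal as the 1-based position of a term among the sorted distinct terms of one index, uses invented date strings, and does not touch FieldCache or MultiSearcher.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class OrdPerIndexSketch {
    // Ordinal in this sketch: 1-based position of the term among the sorted
    // distinct terms of one index (counts the terms that sort at or before it).
    public static int ord(List<String> indexTerms, String term) {
        return new TreeSet<>(indexTerms).headSet(term, true).size();
    }

    public static void main(String[] args) {
        // Invented field values held by two sub-indexes.
        List<String> indexA = Arrays.asList("2005-10-05", "2007-06-05");
        List<String> indexB = Arrays.asList("2006-03-03", "2007-03-18", "2007-06-05");
        List<String> combined = new ArrayList<>(indexA);
        combined.addAll(indexB);

        String term = "2007-06-05";
        System.out.println("ord in A: " + ord(indexA, term));
        System.out.println("ord in B: " + ord(indexB, term));
        System.out.println("ord in combined index: " + ord(combined, term));
    }
}
```

The same term gets ordinal 2 in one sub-index, 3 in the other, and 4 in the combined index, so ordinals computed per sub-index are not comparable across the logical index.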

Doron Cohen added a comment - 04/Jun/07 10:24 PM
ok, so I will add the two ord classes, so that Solr can move to use this package.

Doron Cohen added a comment - 05/Jun/07 02:33 AM
Updated patch:
  • fixes explanation and toString() issues.
  • adds the Ord and ReverseOrd valueSource classes that are in use in Solr
  • warns in the javadocs about the experimental state of this package

Javadocs were updated at http://people.apache.org/~doronc/api

I will commit this later today if there are no objections.


Doron Cohen added a comment - 05/Jun/07 04:39 PM
committed (experimental mode).