Lucene - Core / LUCENE-446

search.function - (1) score based on field value, (2) simple score customizability

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      Patch Available

      Description

      FunctionQuery can return a score based on a field's value or on its ordinal value.

      FunctionFactory subclasses define the details of the function. There is currently a LinearFloatFunction (a line specified by slope and intercept).

      Field values are typically obtained from FieldValueSourceFactory. Implementations include FloatFieldSource, IntFieldSource, and OrdFieldSource.

      1. function.patch.txt
        117 kB
        Doron Cohen
      2. function.patch.txt
        93 kB
        Doron Cohen
      3. function.patch.txt
        93 kB
        Doron Cohen
      4. function.zip
        9 kB
        Yonik Seeley
      5. function.zip
        9 kB
        Yonik Seeley

          Activity

          Hoss Man added a comment -

          Just a thought, but in the same spirit as SpanQuery, these classes may make sense in their own sub package ... ie: org.apache.lucene.search.fq

          Yonik Seeley added a comment -

          Perhaps not a bad idea considering that the number of classes may top 12 after adding a few more function types.

          Anyone else have package name suggestions/preferences?

          search.fq?
          search.func?
          search.function?

          Yonik Seeley added a comment -

          Added ReciprocalFloatFunction, a/(mx+b), a natural choice for date boosting,
          and ReverseOrdFieldSource, which numbers terms in the reverse order of OrdFieldSource
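
A minimal stand-alone sketch of the a/(mx+b) shape may help show why it suits date boosting: the value is largest for small inputs and decays smoothly as the input grows. The class name `ReciprocalSketch` and the constructor argument order (m, a, b) are assumptions made for illustration; this is not the class from the attached code.

```java
// Sketch only: models the formula a / (m*x + b) on plain floats.
// ReciprocalSketch and its (m, a, b) argument order are hypothetical.
final class ReciprocalSketch {
    private final float m;
    private final float a;
    private final float b;

    ReciprocalSketch(float m, float a, float b) {
        this.m = m;
        this.a = a;
        this.b = b;
    }

    /** Returns a / (m*x + b) for an input x, e.g. a reverse ordinal of a date term. */
    float apply(float x) {
        return a / (m * x + b);
    }
}
```

With m=1, a=1000, b=1000, `new ReciprocalSketch(1f, 1000f, 1000f).apply(0f)` evaluates to 1.0 and `apply(1000f)` to 0.5, so an input 1000 positions further away receives half the value.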

          Yonik Seeley added a comment -

          attaching newest version

          Yonik Seeley added a comment -

          This newest version simplifies a lot of cruft from the previous version.

          A FunctionQuery takes a ValueSource.
          The ValueSource produces a DocValues object for a specific IndexReader (It's like a lucene scorer).
          The ValueSource is also used as input to functions, which are ValueSources themselves.

          So, you can do things (symbolically), like

          int(fieldx)
          float(fieldx)
          ord(fieldx)
          rord(fieldx)
          linear(fieldx,1,2)
          linear(rord(fieldx),1,2,3)
          reciprocal(linear(fieldx,1,2),3,4,5)

          A useful one for boosting more recent dates might be:
          reciprocal(rord(mydatefield),1,1000,1000)

          I'm not sure if this is the final form yet... perhaps the division between ValueSource and Query could be erased such that every value source is a query already (so that you don't need to pass it to a FunctionQuery).

          It would also be nice to freely mix a lucene Query and a ValueSource so that you could do something like:
          product(luceneQuery, val(fieldx))
          or even
          product(luceneQuery1, luceneQuery2)

          Of course, I haven't done the "product" function yet... right now, the normal way to combine with other queries to influence the score is to put it in a boolean query:
          +other_lucene_query_clauses +function_query^.1
          the score from the function query is added to the other query.
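
The nesting described above, where a function takes a value source as input and is itself a value source, can be sketched without any Lucene classes. Everything below is hypothetical and for illustration only: `ValuesSketch` stands in for a per-document value lookup, `linear` is assumed to take (slope, intercept), and `reciprocal` is assumed to take (m, a, b) and compute a/(m*x+b).

```java
// Sketch only: a per-document float lookup that can be wrapped by functions.
// None of these names come from the patch; they model the composition idea.
interface ValuesSketch {
    float floatVal(int doc);
}

final class FunctionSketch {
    private FunctionSketch() {}

    /** A "field" source backed by an array holding one value per document. */
    static ValuesSketch field(float[] perDocValues) {
        return doc -> perDocValues[doc];
    }

    /** linear(x, slope, intercept) = slope * x + intercept (argument order assumed). */
    static ValuesSketch linear(ValuesSketch in, float slope, float intercept) {
        return doc -> slope * in.floatVal(doc) + intercept;
    }

    /** reciprocal(x, m, a, b) = a / (m * x + b) (argument order assumed). */
    static ValuesSketch reciprocal(ValuesSketch in, float m, float a, float b) {
        return doc -> a / (m * in.floatVal(doc) + b);
    }
}
```

For a field holding the values {0, 1, 3}, `reciprocal(linear(fieldx, 1, 2), 3, 4, 5)` gives document 2 the value 4/(3*5+5) = 0.2, because the inner linear function first maps 3 to 5.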

          Yonik Seeley added a comment -

          changed getSimpleName() to getName() to preserve Java 1.4 compatibility.

          Kelvin Tan added a comment -

          Yes, I've independently come up with something similar. What's interesting is that you can also perform filtering (like date filtering) by simply returning negative Float.MAX_VALUE. This pretty much guarantees that the document's final score is < 0.

          I've also come across the need to be able to modify the final score of a document, and have done this via a score-modifying query wrapper which delegates the scoring to the functionquery it wraps, then applying an additional function to it. Is that similar to the product function you mention?

          Yonik Seeley added a comment -

          This version is now slightly out of date.
          For now, consider the definitive version to be in Solr:
          http://incubator.apache.org/solr
          http://svn.apache.org/viewcvs.cgi/incubator/solr/trunk/src/java/org/apache/solr/search/function/

          Solr currently has a QueryParser hack to parse a FunctionQuery... you use val as the fieldName to create a FunctionQuery
          Examples:
          val:myfield
          val:"max(myfield,2.0)"
          val:"max(linear(myfield,1.0,.1), 5.0)"

          Grant Ingersoll added a comment -

          Is there any motivation out there to push this down from Solr to Lucene? I see from time to time on java-user that it comes in handy for people using Lucene. What do the Solr people think about moving it into Lucene core?

          Erik Hatcher added a comment -

          +1 to FunctionQuery being brought into Lucene proper.

          Otis Gospodnetic added a comment -

          Grant: Yeah, I think so. 7 votes and 5 watchers so far tells me people want this in Lucene.

          Hoss Man added a comment -

          I'm in favor ... i think once upon a time Yonik held off because he wasn't sure if he liked the API, but since it's been in Apache Solr for over a year now, i think it's safe.

          I don't suppose you'd be interested in opening a sister Solr issue and submitting a patch to deprecate those instances and make them subclass the ones you'll be migrating to Lucene would you?

          Hoss Man added a comment -

          I just remembered one of the reasons why i didn't do this the last time i looked at it: i don't think FunctionQuery has any good unit tests in the Solr code base; there might be some tests that use the SolrTestHarness to trigger function queries, but they aren't really portable.

          Yonik Seeley added a comment -

          > i think once upon a time Yonik held off because he wasn't sure if he liked the API

          Right... it's just never been at the top of my list to revisit.

          The main thing I was wondering is if I should have a whole ValueSource thing... perhaps FunctionQuery should be able to use other Queries directly. For example, one could have
          MultiplyFunctionQuery(MyNormalQuery, MyFieldFunctionQuery) to boost a query by another query (in this case a function query).

          Right now, increasing the score of a document based on a field value is done in an additive way by adding a FunctionQuery clause to a BooleanQuery. One could create a ValueSource that wraps another query to get a multiplicative effect, but is that the simplest approach?
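
The difference between the two ways of combining scores can be shown with plain arithmetic. This is only a sketch: it ignores whatever other factors the real scorers apply, and the names `CombineSketch`, `additive`, and `multiplicative` are hypothetical.

```java
// Sketch only: contrasts adding a boosted function score with multiplying by it.
final class CombineSketch {
    private CombineSketch() {}

    /** Models two required boolean clauses: the boosted function score is added. */
    static float additive(float queryScore, float functionScore, float boost) {
        return queryScore + boost * functionScore;
    }

    /** Models a product-style combination: the function score scales the query score. */
    static float multiplicative(float queryScore, float functionScore) {
        return queryScore * functionScore;
    }
}
```

With a query score of 2 and a function score of 5, `additive(2f, 5f, 0.1f)` yields 2.5, while `multiplicative(2f, 5f)` yields 10. In the additive form the function's contribution is the same for every document regardless of how well it matched the query; in the multiplicative form it scales with the query score.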

          Mike Klaas added a comment -

          I've often wanted to multiply the scores of two queries. I looked at FunctionQuery but didn't really see an easy way of getting around the ValueSource thing.

          See LUCENE-850 for my eventual solution

          Doron Cohen added a comment -

          I intend to take a shot at this, with the approach of two parts/steps -
          1) simple scoring based on values of stored field.
          2) composing a document score as (some / math / extensible) function of one or more scores of sub queries.

          Thinking of a new package: o.a.l.search.function.

          This would seem to bring together LUCENE-446 and LUCENE-850 and I think would be handy for trying various scoring techniques.

          (Background/motivation: I was considering using payloads for trying some static scoring alternatives (e.g. link info based), but I realized that function queries are much more suitable for this, and would be a handy addition to Lucene core.)

          Doron Cohen added a comment -

          Attached function.patch.txt adds three new queries:

          1. ValueSourceQuery - an Expert type of query, more or less same as
          in original patch. It is very flexible - takes a ValueSource as input - so it
          could be extended to do additional things (ie not only indexed fields).

          2. FieldScoreQuery - subclass of ValueSourceQuery. It is easier
          to use, and operates on a cached indexed field. A doc score is set
          by the value of that field. There are 4 field parser types for this: float, int,
          short, and byte. They require different size in RAM when cached: 8, 4, 2,
          and 1 bytes respectively per document. The cache was modified to
          accommodate this. (It seems worthwhile to save RAM where possible.)

          3. CustomScoreQuery - this query allows customizing the score of its contained
          sub-query by implementing a customScore() function. Any computation is
          possible, as long as it is based on the original score of the sub-query,
          the (optional) score of an (optional) sub-valueSourceQuery, and the docid.
          This query also covers (somewhat differently) LUCENE-850
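
The customScore() idea from item 3 can be pictured with a stand-alone model. The sketch below does not use the patch's classes: `CustomScoreSketch`, `SqrtBoostSketch`, and the exact parameter list (docid, sub-query score, value-source score) are assumptions based only on the inputs named above.

```java
// Sketch only: a hook that receives the three inputs listed in item 3 and
// returns the final document score. Names and signature are hypothetical.
abstract class CustomScoreSketch {
    abstract float customScore(int doc, float subQueryScore, float valSrcScore);
}

/** Example override: damp the field-based score with a square root before multiplying. */
final class SqrtBoostSketch extends CustomScoreSketch {
    @Override
    float customScore(int doc, float subQueryScore, float valSrcScore) {
        return subQueryScore * (float) Math.sqrt(valSrcScore);
    }
}
```

For instance, `new SqrtBoostSketch().customScore(7, 2f, 9f)` returns 6.0: the sub-query score 2 multiplied by the square root of the field-based score 9. The docid is unused here but is passed so that an override could consult per-document data.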

          The patch includes tests and javadocs.
          All tests pass.

          I will later put the javadocs somewhere, to allow commenting on the API without
          applying the patch.

          The tests found quite a few bugs for me, and I hope I got the scorers and weight
          correct now - I would very much appreciate review comments on these delicate
          parts.

          Doron Cohen added a comment -

          Modifying the issue name to reflect its current content.

          Doron Cohen added a comment -

          javadocs for the new org.apache.lucene.search.function package
          can now be reviewed at http://people.apache.org/~doronc/api

          Doron Cohen added a comment -

          Updated patch to current trunk.

          Also:

          • moved TYPE consts in FieldScoreQuery to FieldScoreQuery.Type (e.g. FieldScoreQuery.Type.BYTE).
          • some documentation fixes.

          Updated patch javadocs in http://people.apache.org/~doronc/api/

          Doron Cohen added a comment -

          Yonik (and other Solr's search.function people),

          I omitted some of the original functions/sources that were in your code:

          • LinearFloatFunction, MaxFloatFunction, ReciprocalFloatFunction,
          • OrdFieldSource, ReverseOrdFieldSource

          The first 3 should be straightforward to implement by extending CustomScoreQuery, like the code samples show. Do you think such implementations should be included, ready to use?

          The last 2 Ord ones can be implemented as before, i.e. with the "expert" class ValueSource that was kept. But they seemed spooky to me, with that comment regarding multi-searchers. Are these just examples, or are they really useful? Do you think they should be included?

          Thanks,
          Doron

          Hoss Man added a comment -

          Doron: I haven't really been able to keep up with the way this issue has evolved, or dig into your new patches, but to answer your question about the Ord functions: yes they are very useful, and in active use in Solr. I believe the warning about MultiSearcher mainly has to do with the fact that the MultiSearcher/FieldCache APIs give us no way to know the "lowest" or "highest" value in a field cache across an entire logical index, so the Ord functions can't really be queried against a MultiSearcher.
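
One way to picture the MultiSearcher caveat: an ordinal is a position in the sorted term list of a single index, so the same field value can receive different ordinals in different sub-indexes. The sketch below is a hypothetical model, not Lucene's FieldCache; `OrdSketch` and its zero-based ordinals are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Sketch only: computes an ordinal as the number of distinct terms in one
// index that sort before the given value (zero-based, by assumption).
final class OrdSketch {
    private OrdSketch() {}

    static int ord(String[] indexTerms, String value) {
        TreeSet<String> sorted = new TreeSet<>(Arrays.asList(indexTerms));
        return sorted.headSet(value).size(); // terms strictly smaller than value
    }
}
```

Given one index with the terms {"2007-01", "2007-03"} and another with {"2007-03", "2007-05"}, the value "2007-03" gets ordinal 1 in the first and 0 in the second, so ordinal-based scores from the two indexes are not directly comparable.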

          Doron Cohen added a comment -

          ok, so I will add the two ord classes, so that Solr can move to use this package.

          Doron Cohen added a comment -

          Updated patch:

          • fixes explanation and toString() issues.
          • adds the Ord and ReverseOrd valueSource classes that are in use in Solr
          • warns in the javadocs about the experimental state of this package

          Javadocs were updated at http://people.apache.org/~doronc/api

          I will commit this later today if there are no objections.

          Doron Cohen added a comment -

          committed (experimental mode).


            People

            • Assignee:
              Doron Cohen
              Reporter:
              Yonik Seeley
            • Votes:
              9
              Watchers:
              4