Lucene - Core / LUCENE-1019

CustomScoreQuery should support multiple ValueSourceQueries

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 2.3
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      Patch Available

      Description

      CustomScoreQuery's constructor currently accepts a subQuery, and a ValueSourceQuery. I would like it to accept multiple ValueSourceQueries. The workaround of nested CustomScoreQueries works for simple cases, but it quickly becomes either cumbersome to manage, or impossible to implement the desired function.

      This patch implements CustomMultiScoreQuery with my desired functionality, and refactors CustomScoreQuery to implement the special case of a CustomMultiScoreQuery with 0 or 1 ValueSourceQueries. This keeps the CustomScoreQuery API intact.

      This patch includes basic tests, more or less taken from the original implementation, and customized a bit to cover the new cases.
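The refactoring described above can be pictured with a small, Lucene-free sketch. Everything in it is hypothetical: `MultiScore` and `SingleScore` are stand-ins invented for illustration, not the classes from the patch, and the multiply-by-default behavior and the neutral factor of 1 for "no value source" are assumptions of the sketch. It only shows how a single-source class can stay source-compatible while delegating to a multi-source hook.

```java
public class MultiSourceSketch {

    /** Hypothetical stand-in for the multi-source query: combines N value-source scores. */
    public static class MultiScore {
        public float customScore(int doc, float subQueryScore, float[] valSrcScores) {
            float result = subQueryScore;
            for (float v : valSrcScores) {
                result *= v; // assumed default: multiply everything together
            }
            return result;
        }
    }

    /** Hypothetical stand-in for the single-source class, kept as a special case. */
    public static class SingleScore extends MultiScore {
        // The pre-existing single-value hook that subclasses may already override.
        public float customScore(int doc, float subQueryScore, float valSrcScore) {
            return subQueryScore * valSrcScore;
        }

        @Override
        public float customScore(int doc, float subQueryScore, float[] valSrcScores) {
            if (valSrcScores.length > 1) {
                throw new IllegalArgumentException("expected 0 or 1 value-source scores");
            }
            // With no value source, 1 is the neutral factor (an assumption of this sketch).
            float v = valSrcScores.length == 0 ? 1f : valSrcScores[0];
            return customScore(doc, subQueryScore, v);
        }
    }

    public static void main(String[] args) {
        MultiScore multi = new MultiScore();
        System.out.println(multi.customScore(0, 2f, new float[] {3f, 4f}));

        // A "legacy" subclass that only knows the single-value hook keeps working.
        SingleScore legacy = new SingleScore() {
            @Override
            public float customScore(int doc, float subQueryScore, float valSrcScore) {
                return subQueryScore + valSrcScore;
            }
        };
        System.out.println(legacy.customScore(0, 2f, new float[] {3f}));
    }
}
```

Because the array form funnels into the single-value method, code written against the older one-value signature would not need to change under this arrangement.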

      1. CustomMultiQuery.v0.diff
        37 kB
        Kyle Maxwell
      2. CustomScoreQuery.v1.diff
        13 kB
        Kyle Maxwell
      3. lucene-1019-multi-vsq.patch
        25 kB
        Doron Cohen

          Activity

          Doron Cohen added a comment -

          Committed, thanks Kyle!

          Doron Cohen added a comment -

          When working on this I hoped that Solr would move to
          use it but, per SOLR-192, that never happened. If someone in
          Solr is committed to doing this I will definitely work on it
          (hoping that nothing in the new Solr functionality
          breaks with our changes so far). I will ask in Solr.

          Grant Ingersoll added a comment -

          Somewhat related, but any thoughts on some of the newer functionality in Solr? I really hate to see such a divergence.

          Doron Cohen added a comment -

          lucene-1019-multi-vsq.patch:

          • modified version2:
            • fixed hash() and equals()
            • added a multi vsq form of customExplain()
            • more documentation in customScore() and customExplain()
            • added test of multi vsq
          • fixed a bug in search.function tests

          All tests pass.
          I intend to commit this in a few days.

          Doron Cohen added a comment -
          • The way in which caching is handled is now unclear.

          For IntFieldSource, for example, caching is done at that level, so I am not sure
          I understand what is unclear here.

          • Trying to get explain information from the sub-ValueSources was quite difficult.
          • There is much more code in my queries, leading to increased brittleness.

          Yes, I agree about this part.
          I had a similar experience when combining field values, and decided to just live with it.
          At a glance, the v2 patch seems to solve this nicely, so I will look into committing it.

          Thanks for bringing this up,
          Doron

          Kyle Maxwell added a comment -

          Hi, after trying out the combined valuesource implementation suggested by Doron, I've found it to be extremely cumbersome and brittle in practice. Therefore, I am reopening this ticket.

          • Trying to get explain information from the sub-ValueSources was quite difficult.
          • There is much more code in my queries, leading to increased brittleness.
          • The way in which caching is handled is now unclear.

          Can this ticket please be reconsidered? Thanks!

          Kyle Maxwell added a comment -

          Ok, I'm satisfied with Doron's solution. It'd be nice to see something like this in some documentation, somewhere. The wiki is prolly appropriate.

          Doron Cohen added a comment -

          You could put this logic in your implementation of ValueSource,
          possibly constructed over multiple FieldCacheSources -

          DateDecayQuery over multiple value sources
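          import java.io.IOException;
          import java.util.Date;

          import org.apache.lucene.index.IndexReader;
          import org.apache.lucene.search.Query;
          import org.apache.lucene.search.function.CustomScoreQuery;
          import org.apache.lucene.search.function.DocValues;
          import org.apache.lucene.search.function.IntFieldSource;
          import org.apache.lucene.search.function.ValueSource;
          import org.apache.lucene.search.function.ValueSourceQuery;

          public class DateDecayQuery extends CustomScoreQuery {

            public DateDecayQuery(Query subQuery) {
              super(subQuery, createValSrceQuery());
              setStrict(true);
            }

            // Wraps the combined value source in a single ValueSourceQuery.
            private static ValueSourceQuery createValSrceQuery() {
              return new ValueSourceQuery(new HalfLifeValSrc());
            }

            // One ValueSource that reads both fields and computes the decay factor itself.
            private static class HalfLifeValSrc extends ValueSource {
              final ValueSource createdAt = new IntFieldSource("created-at");
              final ValueSource halfLife = new IntFieldSource("half-life");
              final long now = new Date().getTime() / 1000; // current time as a UNIX timestamp (seconds)
              final double LOG2 = Math.log(2);

              public DocValues getValues(final IndexReader reader) throws IOException {
                final DocValues valsCreated = createdAt.getValues(reader);
                final DocValues valsHalfLife = halfLife.getValues(reader);
                return new DocValues(reader.maxDoc()) {
                  public float floatVal(int doc) {
                    float vCreated = valsCreated.floatVal(doc);
                    float vHalfLife = valsHalfLife.floatVal(doc);
                    // (vCreated - now) is negative for past documents, so the value halves per half-life.
                    return (float) Math.exp(LOG2 * (vCreated - now) / vHalfLife);
                  }
                };
              }
            }
          }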
          

          Though usage is much simpler if this is added to the query.

          Kyle Maxwell added a comment - - edited

          Here's a slightly simpler version of the diff (v1).

          The default behavior of CustomScoreQuery with multiple ValueSourceQueries does not matter to me. I really want to be able to override it with custom logic. Also note that multiplying twice is currently as simple as CustomScoreQuery(CustomScoreQuery(subQuery, value1), value2). But what about things that aren't linear combinations?

          Use case: I want the score to fall off exponentially as content ages, with a decay rate that varies on a per document basis.

          Each document has three fields: "text," "created-at," and "half-life." Created-at is represented as a UNIX timestamp, and half-life in seconds. I'm not sure that the following query can be expressed as nested queries. There may be another way to do this, but this seems simple and elegant to me.

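          import java.util.Date;

          import org.apache.lucene.search.Query;
          import org.apache.lucene.search.function.CustomScoreQuery;
          import org.apache.lucene.search.function.FieldScoreQuery;
          import org.apache.lucene.search.function.FieldScoreQuery.Type;
          import org.apache.lucene.search.function.ValueSourceQuery;

          public class DateDecayQuery extends CustomScoreQuery {
            public final double LOG2 = Math.log(2);
            private long now;

            public DateDecayQuery(Query subQuery) {
              // Relies on the multi-ValueSourceQuery constructor proposed in this issue.
              super(subQuery, new ValueSourceQuery[] {
                  new FieldScoreQuery("created-at", Type.INT),
                  new FieldScoreQuery("half-life", Type.INT) });
              now = new Date().getTime() / 1000; // current time as a UNIX timestamp (seconds)
              setStrict(true);
            }

            // fields[i] is the value of the i-th ValueSourceQuery passed to the constructor.
            public float customScore(int doc, float score, float[] fields) {
              float date = fields[0];
              float halfLife = fields[1];
              // (date - now) is negative for past documents, so the factor halves per half-life.
              float dateScore = (float) Math.exp(LOG2 * (date - now) / halfLife);
              return score * dateScore;
            }
          }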
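As a quick sanity check on the decay formula used in `customScore`, here is a minimal, Lucene-free sketch. The class `HalfLifeDecay`, its method names, and the sample timestamps are hypothetical and introduced only for illustration; it also uses `long` and `double` arithmetic rather than the `float` values the query receives, which is a deliberate simplification.

```java
public class HalfLifeDecay {
    private static final double LOG2 = Math.log(2);

    /**
     * Decay factor exp(ln 2 * (createdAt - now) / halfLife), i.e. 2^(-age / halfLife).
     * It is 1 for content created right now and halves with every elapsed half-life.
     */
    public static double decayFactor(long createdAt, long halfLifeSeconds, long now) {
        return Math.exp(LOG2 * (createdAt - now) / halfLifeSeconds);
    }

    /** Applies the decay factor to a base relevance score. */
    public static double decayedScore(double score, long createdAt, long halfLifeSeconds, long now) {
        return score * decayFactor(createdAt, halfLifeSeconds, now);
    }

    public static void main(String[] args) {
        long now = 1_200_000_000L; // hypothetical "current" UNIX time in seconds
        long halfLife = 86_400L;   // hypothetical half-life of one day

        System.out.println(decayFactor(now, halfLife, now));                // age 0
        System.out.println(decayFactor(now - halfLife, halfLife, now));     // age = 1 half-life
        System.out.println(decayFactor(now - 2 * halfLife, halfLife, now)); // age = 2 half-lives
    }
}
```

The three calls in `main` should come out at approximately 1, 0.5 and 0.25. Since the factor depends on two per-document fields in a non-linear way, it is not a simple product of two independently computed values.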
          
          Hoss Man added a comment -

          This class seems to assume that the ValueSourceQueries should be multiplied ... but it would be just as easy to assume they should be added, or averaged.

          It seems like it might make more sense if, instead of a CustomMultiScoreQuery, there was just a "ProductValueSource" class that took in a ValueSource[] and multiplied them.
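To make the point about multiplying versus adding versus averaging concrete, here is a minimal sketch in plain Java. It does not use Lucene; the class `ValueCombiners` and its methods are hypothetical, and plain `float[]` arrays stand in for the per-document values that a `ValueSource[]` would supply.

```java
public class ValueCombiners {

    /** Product of all values; what a "ProductValueSource" would compute per document. */
    public static float product(float[] values) {
        float result = 1f;
        for (float v : values) {
            result *= v;
        }
        return result;
    }

    /** Sum of all values. */
    public static float sum(float[] values) {
        float result = 0f;
        for (float v : values) {
            result += v;
        }
        return result;
    }

    /** Arithmetic mean; returns 0 for an empty array (a choice made for this sketch). */
    public static float average(float[] values) {
        return values.length == 0 ? 0f : sum(values) / values.length;
    }

    public static void main(String[] args) {
        float[] perDocValues = {2f, 3f, 4f}; // hypothetical values from three sources
        System.out.println(product(perDocValues));
        System.out.println(sum(perDocValues));
        System.out.println(average(perDocValues));
    }
}
```

Each combiner is equally easy to write, which is why hard-wiring one of them into the query class is a debatable default.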

          Kyle Maxwell added a comment - - edited

          Here's the patch! BTW, I'll edit the docs as soon as someone signs off that this is a good idea!


            People

            • Assignee:
              Doron Cohen
              Reporter:
              Kyle Maxwell
            • Votes:
              0
              Watchers:
              1
