Lucene - Core
  1. Lucene - Core
  2. LUCENE-1790

Add Boosting Function Term Query and Some Payload Query refactorings

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9
    • Component/s: None
    • Labels:
      None

      Description

      Similar to the BoostingTermQuery, the BoostingFunctionTermQuery is a SpanTermQuery, but the difference is the payload score for a doc is not the average of all the payloads, but applies a function to them instead. BoostingTermQuery becomes a BoostingFunctionTermQuery with an AveragePayloadFunction applied to it.

      Also add marker interface to indicate PayloadQuery types. Refactor Similarity.scorePayload to also take in the doc id.

      1. LUCENE-1790.patch
        10 kB
        Mark Miller
      2. LUCENE-1790.patch
        42 kB
        Grant Ingersoll
      3. LUCENE-1790.patch
        37 kB
        Grant Ingersoll
      4. LUCENE-1790.patch
        17 kB
        Grant Ingersoll
      5. LUCENE-1790-position.patch
        12 kB
        Grant Ingersoll

        Issue Links

          Activity

          Hide
          Mark Miller added a comment -

          What about a common class with chooseable aggregation method?

          Show
          Mark Miller added a comment - What about a common class with chooseable aggregation method?
          Hide
          Grant Ingersoll added a comment -

          Will commit tomorrow or Saturday, as it is a pretty minor variant of the BoostingTermQuery

          Show
          Grant Ingersoll added a comment - Will commit tomorrow or Saturday, as it is a pretty minor variant of the BoostingTermQuery
          Hide
          Grant Ingersoll added a comment -

          Was actually just thinking we could have a simple Function Interface:

          public interface PayloadFunction{
             float currentScore(currentScore, currentPayloadScore);
          
             float finalScore(numPayloadsSeen, payloadScore);
          }
          

          and it could be passed into the constructor. Then, in processPayload, you could just have

                    payloadScore = function.score(payloadScore, similarity.scorePayload(term.field(), payload, 0, positions.getPayloadLength()));
          

          instead of

                    payloadScore = Math.max(payloadScore, similarity.scorePayload(term.field(), payload, 0, positions.getPayloadLength()));
          

          or whatever is there.

          Then, the getPayloadScore() method (see my patch) would be:

          return (payloadsSeen > 0 ? (function.finalScore(payloadsSeen, payloadScore)) : 1);
          

          What did you have in mind? I'm pretty tired, so the above may be a bit whacked.

          Show
          Grant Ingersoll added a comment - Was actually just thinking we could have a simple Function Interface: public interface PayloadFunction{ float currentScore(currentScore, currentPayloadScore); float finalScore(numPayloadsSeen, payloadScore); } and it could be passed into the constructor. Then, in processPayload, you could just have payloadScore = function.score(payloadScore, similarity.scorePayload(term.field(), payload, 0, positions.getPayloadLength())); instead of payloadScore = Math .max(payloadScore, similarity.scorePayload(term.field(), payload, 0, positions.getPayloadLength())); or whatever is there. Then, the getPayloadScore() method (see my patch) would be: return (payloadsSeen > 0 ? (function.finalScore(payloadsSeen, payloadScore)) : 1); What did you have in mind? I'm pretty tired, so the above may be a bit whacked.
          Hide
          Mark Miller added a comment -

          Yeah, thats basically what I was thinking - then you could do min score or whatever as well - .

          Does it really needs all of those args though? I guess you could possibly do more that way, but it almost seems you just need:

          public interface PayloadAggregationFunction{
             void aggregate(score);
             float score();
          }
          
          
          Show
          Mark Miller added a comment - Yeah, thats basically what I was thinking - then you could do min score or whatever as well - . Does it really needs all of those args though? I guess you could possibly do more that way, but it almost seems you just need: public interface PayloadAggregationFunction{ void aggregate(score); float score(); }
          Hide
          Grant Ingersoll added a comment -

          Refactors BoostingTermQuery to be a BoostingFunctionQuery. Adds in several PayloadFunction implementations. All tests pass

          Will commit today or tomorrow.

          Show
          Grant Ingersoll added a comment - Refactors BoostingTermQuery to be a BoostingFunctionQuery. Adds in several PayloadFunction implementations. All tests pass Will commit today or tomorrow.
          Hide
          Grant Ingersoll added a comment -

          Next take on this:

          1. Added includeSpanScore flag, which allows you to ignore the score from the TermQuery part of the score and only count the payload.

          2. Deprecated Similarity.scorePayload(String fieldName, ...) to a similar method that also takes in the Doc id. Now, in theory, you could have different scoring for payloads based on different documents, fields, etc. The old method just calls the new one and passes in a NO_DOC_ID_PROVIDED value (-1).

          3. Added a Marker Interface named PayloadQuery and marked the various PayloadQueries. This could be useful for Queries that work with other PayloadQueries (more exclusive than the fact that they are SpanQueries.

          I really do intend to commit this

          Show
          Grant Ingersoll added a comment - Next take on this: 1. Added includeSpanScore flag, which allows you to ignore the score from the TermQuery part of the score and only count the payload. 2. Deprecated Similarity.scorePayload(String fieldName, ...) to a similar method that also takes in the Doc id. Now, in theory, you could have different scoring for payloads based on different documents, fields, etc. The old method just calls the new one and passes in a NO_DOC_ID_PROVIDED value (-1). 3. Added a Marker Interface named PayloadQuery and marked the various PayloadQueries. This could be useful for Queries that work with other PayloadQueries (more exclusive than the fact that they are SpanQueries. I really do intend to commit this
          Hide
          Michael McCandless added a comment -

          Is this done?

          Show
          Michael McCandless added a comment - Is this done?
          Hide
          Michael McCandless added a comment -

          Should we deprecate BoostingTermQuery, with this?

          Show
          Michael McCandless added a comment - Should we deprecate BoostingTermQuery, with this?
          Hide
          Grant Ingersoll added a comment -

          Deprecated BoostingTermQuery and committed

          Show
          Grant Ingersoll added a comment - Deprecated BoostingTermQuery and committed
          Hide
          Grant Ingersoll added a comment - - edited

          Going to reopen and see about passing along the position information into both the new scorePayload() method and into the PayloadFunction, as it may be useful to know this information when scoring payloads.

          Show
          Grant Ingersoll added a comment - - edited Going to reopen and see about passing along the position information into both the new scorePayload() method and into the PayloadFunction, as it may be useful to know this information when scoring payloads.
          Hide
          Grant Ingersoll added a comment - - edited

          Pass in position information as well for scoring. Will commit tomorrow.

          Show
          Grant Ingersoll added a comment - - edited Pass in position information as well for scoring. Will commit tomorrow.
          Hide
          Grant Ingersoll added a comment -

          Committed revision 804178.

          Committed the position patch

          Show
          Grant Ingersoll added a comment - Committed revision 804178. Committed the position patch
          Hide
          Mark Miller added a comment -

          BoostingFunctionTermQuery implements equals but not hashcode - important for a query class I think.

          Show
          Mark Miller added a comment - BoostingFunctionTermQuery implements equals but not hashcode - important for a query class I think.
          Hide
          Mark Miller added a comment -

          remove some unused imports
          added missing license header

          Added hashCode to BoostingFunctionTermQuery

          Added hashCode/equals to PayloadFunction classes

          added hashcode/equals to query - really it should be handling the equals/hashcode for boost, not subclasses (which will be likely to forget it - you should check super classes for these things anyway as well).

          BoostingFunctionTermQuery is a subclass of SpanTermQuery, but both of them use a weak equals method (using instanceof)
          so while BoostingFunctionTermQuery.equals(SpanTermQuery) should equal SpanTermQuery.equals(BoostFunctionTermQuery), it doesn't.

          Added new hashCode/equals for both classes that work properly.

          Added a couple tests for these fixes

          Show
          Mark Miller added a comment - remove some unused imports added missing license header Added hashCode to BoostingFunctionTermQuery Added hashCode/equals to PayloadFunction classes added hashcode/equals to query - really it should be handling the equals/hashcode for boost, not subclasses (which will be likely to forget it - you should check super classes for these things anyway as well). BoostingFunctionTermQuery is a subclass of SpanTermQuery, but both of them use a weak equals method (using instanceof) so while BoostingFunctionTermQuery.equals(SpanTermQuery) should equal SpanTermQuery.equals(BoostFunctionTermQuery), it doesn't. Added new hashCode/equals for both classes that work properly. Added a couple tests for these fixes
          Hide
          Mark Miller added a comment -

          reopen to fix hashCode/equals

          Show
          Mark Miller added a comment - reopen to fix hashCode/equals
          Hide
          Grant Ingersoll added a comment -

          Looks good, Mark. +1 for you to commit. Good catch.

          Show
          Grant Ingersoll added a comment - Looks good, Mark. +1 for you to commit. Good catch.
          Hide
          Mark Miller added a comment -

          thanks for the review,

          r804994

          Show
          Mark Miller added a comment - thanks for the review, r804994

            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Grant Ingersoll
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development