Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.2
    • Component/s: core/query/scoring
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      In Lucene 5, PayloadTermQuery used a hardcoded default of 1.0 for terms without a payload. Its replacement in Lucene 6, PayloadScoreQuery, just ignores those terms. This is inflexible and wrong for many use cases (for example using Payloads to deemphasize some terms, where terms without payload should result in maximum score instead of being ignored).

      In my pull request I defer the decision on what to do with missing payloads to the scorePayload method of the similarity, which has to check the given payload for null and handle that case. I believe this breaks backwards compatibility?

          Activity

          xabbu42 Nathan Gass added a comment -

          With LUCENE-8038 you can provide your own default using the new PayloadDecoder interface.
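
As a rough illustration of the idea (a decoder that picks its own default for missing payloads), here is a minimal plain-Java sketch. It does not use the Lucene API: the `Decoder` interface, `withDefault` and `encode` are hypothetical stand-ins, and the payload is assumed to be a 4-byte big-endian float, with `null` meaning the token has no payload.

```java
import java.nio.ByteBuffer;

class DefaultingDecoderSketch {

    /** Hypothetical stand-in for a payload decoder; this is NOT the Lucene interface. */
    interface Decoder {
        /** @param payload raw payload bytes, or null when the token has no payload */
        float factor(byte[] payload);
    }

    /**
     * Builds a decoder that returns defaultFactor for missing payloads and
     * otherwise reads the payload as a 4-byte big-endian float (an assumption).
     */
    static Decoder withDefault(float defaultFactor) {
        return payload -> payload == null
                ? defaultFactor
                : ByteBuffer.wrap(payload).getFloat();
    }

    /** Encodes a float the same way the decoder above expects to read it. */
    static byte[] encode(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    public static void main(String[] args) {
        Decoder decoder = withDefault(1.0f);
        System.out.println(decoder.factor(encode(0.5f))); // token with payload 0.5
        System.out.println(decoder.factor(null));         // token without payload
    }
}
```

With a default of 1.0 this mimics the Lucene 5 behaviour described in this issue; a different default can be chosen per use case.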

          githubbot ASF GitHub Bot added a comment -

          GitHub user xabbu42 closed the pull request at:

          https://github.com/apache/lucene-solr/pull/167

          xabbu42 Nathan Gass added a comment - - edited

          "Couldn't this be done by returning a payload score that is less than 1 for terms that have a payload?"

          The problem is not the downgraded token, but mixing downgraded tokens with normal tokens that have no payload. In Lucene 5, the normal tokens got a value of 1.0. In Lucene 6 they are ignored. So when using delimited_payload_filter in Elasticsearch 5 and indexing 'foo|0.5 foo', a PayloadScoreQuery will use 0.5 as the weight. In this use case, depending on the PayloadFunction, 1.0 or 0.75 would be more appropriate.
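
To make the arithmetic concrete, here is a small plain-Java sketch (not Lucene code). The class and method names are made up for illustration, and it assumes the payload function is either a maximum or an average over the per-token factors. A `null` entry stands for a token without a payload, and a `null` default means such tokens are skipped.

```java
import java.util.ArrayList;
import java.util.List;

class MissingPayloadExample {

    /**
     * Turns per-token payloads into factors. A null payload is replaced by
     * defaultForMissing, or skipped entirely when defaultForMissing is null.
     */
    static List<Float> factors(Float[] payloads, Float defaultForMissing) {
        List<Float> result = new ArrayList<>();
        for (Float payload : payloads) {
            if (payload != null) {
                result.add(payload);
            } else if (defaultForMissing != null) {
                result.add(defaultForMissing);
            }
        }
        return result;
    }

    /** Average of the factors; 0 for an empty list. */
    static float average(List<Float> values) {
        float sum = 0f;
        for (float v : values) {
            sum += v;
        }
        return values.isEmpty() ? 0f : sum / values.size();
    }

    /** Maximum of the factors; 0 for an empty list. */
    static float max(List<Float> values) {
        if (values.isEmpty()) {
            return 0f;
        }
        float best = values.get(0);
        for (float v : values) {
            best = Math.max(best, v);
        }
        return best;
    }

    public static void main(String[] args) {
        Float[] tokens = {0.5f, null}; // 'foo|0.5 foo'

        List<Float> ignoring = factors(tokens, null);    // missing payloads skipped
        List<Float> defaulted = factors(tokens, 1.0f);   // missing payloads count as 1.0

        System.out.println("ignored:   avg=" + average(ignoring) + " max=" + max(ignoring));
        System.out.println("defaulted: avg=" + average(defaulted) + " max=" + max(defaulted));
    }
}
```

When the token without a payload is skipped, both functions see only 0.5. With a default of 1.0, the average becomes 0.75 and the maximum becomes 1.0, which are the two values mentioned above.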

          jpountz Adrien Grand added a comment -

          "for example using Payloads to deemphasize some terms, where terms without payload should result in maximum score instead of being ignored"

          Couldn't this be done by returning a payload score that is less than 1 for terms that have a payload?

          dsmiley David Smiley added a comment -

          I haven't dug into this at all but from what you're saying, this change makes sense to me FWIW.

          xabbu42 Nathan Gass added a comment -

          Ping

          I still think adding some flexibility in how tokens without a payload are handled would be helpful. I'm also willing to try implementing a different approach, but I need some input on what the correct approach would be.

          One possibility is to let the PayloadFunction also handle tokens without a payload. This approach could be completely backwards compatible and even more flexible.

          On the other hand, I have seen code examples for scorePayload that test for null. So perhaps this was possible at one time, or there are other ways this function could be called with null even now.

          githubbot ASF GitHub Bot added a comment -

          GitHub user xabbu42 opened a pull request:

          https://github.com/apache/lucene-solr/pull/167

          LUCENE-7744

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/xabbu42/lucene-solr defaultpayload

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/167.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #167


          commit fa0b3902b0cfe370bbbb259dca9b47723981bdb1
          Author: Nathan Gass <gass@search.ch>
          Date: 2017-03-13T13:17:34Z

          let scorePayload provide a default for terms without payload



            People

            • Assignee:
              ehatcher Erik Hatcher
              Reporter:
              xabbu42 Nathan Gass
            • Votes:
              0
              Watchers:
              4
