Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.2
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      PayloadScoreQuery is the only place that currently uses SimScorer.computePayloadFactor(), and as discussed on LUCENE-8014, this seems like the wrong place for it. We should instead add a PayloadDecoder abstraction that is passed to PayloadScoreQuery.

      1. LUCENE-8038.patch
        27 kB
        Alan Woodward
      2. LUCENE-8038-master.patch
        25 kB
        Alan Woodward

        Issue Links

          Activity

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit a744654bcae0b71232f009297d590a06574ce434 in lucene-solr's branch refs/heads/master from Alan Woodward
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a744654 ]

          LUCENE-8038: Remove deprecated PayloadScoreQuery methods

          Show
          jira-bot ASF subversion and git services added a comment - Commit a744654bcae0b71232f009297d590a06574ce434 in lucene-solr's branch refs/heads/master from Alan Woodward [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a744654 ] LUCENE-8038 : Remove deprecated PayloadScoreQuery methods
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 5c9bcc9e900de027931a86704a8ab5fd4c525d9f in lucene-solr's branch refs/heads/master from Alan Woodward
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5c9bcc9 ]

          LUCENE-8038: Add PayloadDecoder

          Show
          jira-bot ASF subversion and git services added a comment - Commit 5c9bcc9e900de027931a86704a8ab5fd4c525d9f in lucene-solr's branch refs/heads/master from Alan Woodward [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5c9bcc9 ] LUCENE-8038 : Add PayloadDecoder
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 44bd8e4d7922b3233c7db6cc435e95959a0bc1ee in lucene-solr's branch refs/heads/branch_7x from Alan Woodward
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=44bd8e4 ]

          LUCENE-8038: Add PayloadDecoder

          Show
          jira-bot ASF subversion and git services added a comment - Commit 44bd8e4d7922b3233c7db6cc435e95959a0bc1ee in lucene-solr's branch refs/heads/branch_7x from Alan Woodward [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=44bd8e4 ] LUCENE-8038 : Add PayloadDecoder
          Hide
          romseygeek Alan Woodward added a comment -

          PayloadScoreQuery just multiplies the score returned by the Similarity by a payload factor, so it shouldn't interfere with any of the restrictions above. For more flexibility, it might be worth investigating a DoubleValuesSource that exposes payload factors that you could then use in e.g. an Expression. You'd then use the Expression as a sort field, which would bypass LUCENE-4100 entirely. That's for later though, I think the patch here is ready?

          Show
          romseygeek Alan Woodward added a comment - PayloadScoreQuery just multiplies the score returned by the Similarity by a payload factor, so it shouldn't interfere with any of the restrictions above. For more flexibility, it might be worth investigating a DoubleValuesSource that exposes payload factors that you could then use in e.g. an Expression. You'd then use the Expression as a sort field, which would bypass LUCENE-4100 entirely. That's for later though, I think the patch here is ready?
          Hide
          rcmuir Robert Muir added a comment -

          They need to be restricted though, thats why we opened this issue, because we are working on fixing similarity impls to have well-defined behavior, so that we can have these optimizations.

          For example a similarity should never be asked to compute tf(0), it will result in divide by zero with most scoring systems. Its not something we should have to guard against with conditionals everywhere, it should instead be avoided.

          Show
          rcmuir Robert Muir added a comment - They need to be restricted though, thats why we opened this issue, because we are working on fixing similarity impls to have well-defined behavior, so that we can have these optimizations. For example a similarity should never be asked to compute tf(0), it will result in divide by zero with most scoring systems. Its not something we should have to guard against with conditionals everywhere, it should instead be avoided.
          Hide
          xabbu42 Nathan Gass added a comment -

          I want to urge caution about adding more flexibility. There is a huge benefit in restricting what this can do: it can allow us to potentially implement stuff like LUCENE-4100 and LUCENE-7993 for span queries.

          Just for the record: As the calculated floats are not restricted anyway I don't think optimizations like this are possible/impossible depending on that sort of flexibility. You have/can add the exact same rules and constraints both ways.

          Show
          xabbu42 Nathan Gass added a comment - I want to urge caution about adding more flexibility. There is a huge benefit in restricting what this can do: it can allow us to potentially implement stuff like LUCENE-4100 and LUCENE-7993 for span queries. Just for the record: As the calculated floats are not restricted anyway I don't think optimizations like this are possible/impossible depending on that sort of flexibility. You have/can add the exact same rules and constraints both ways.
          Hide
          romseygeek Alan Woodward added a comment -

          I think the extra flexibility could be limited to just PayloadScoreQuery, but I agree, let's just fix the API on this issue.

          Show
          romseygeek Alan Woodward added a comment - I think the extra flexibility could be limited to just PayloadScoreQuery, but I agree, let's just fix the API on this issue.
          Hide
          rcmuir Robert Muir added a comment -

          I want to urge caution about adding more flexibility. There is a huge benefit in restricting what this can do: it can allow us to potentially implement stuff like LUCENE-4100 and LUCENE-7993 for span queries.

          But if we make them "ultra-flexible crap" with no rules or constraints then these queries will just remain slow compared to e.g. phrase queries.

          Can we just fix the API to be better on this issue without messing around with any scoring of null or flexibility or anything else?

          Show
          rcmuir Robert Muir added a comment - I want to urge caution about adding more flexibility. There is a huge benefit in restricting what this can do: it can allow us to potentially implement stuff like LUCENE-4100 and LUCENE-7993 for span queries. But if we make them "ultra-flexible crap" with no rules or constraints then these queries will just remain slow compared to e.g. phrase queries. Can we just fix the API to be better on this issue without messing around with any scoring of null or flexibility or anything else?
          Hide
          romseygeek Alan Woodward added a comment -

          It might be useful to let PayloadDecoder return null for missing payloads

          I think I'd rather go back to the lucene 5.0 behaviour of returning 1.0 - this is all being done within the scoring loop, so we want to keep boxing and object creation to an absolute minimum.

          But what is the story on more complex custom payload functions?

          There's definitely more to do here! But yes, let's defer that to a later issue. This one is mainly to enable LUCENE-8014.

          Show
          romseygeek Alan Woodward added a comment - It might be useful to let PayloadDecoder return null for missing payloads I think I'd rather go back to the lucene 5.0 behaviour of returning 1.0 - this is all being done within the scoring loop, so we want to keep boxing and object creation to an absolute minimum. But what is the story on more complex custom payload functions? There's definitely more to do here! But yes, let's defer that to a later issue. This one is mainly to enable LUCENE-8014 .
          Hide
          xabbu42 Nathan Gass added a comment -

          It might be feature creep for the current issue. But what is the story on more complex custom payload functions?

          To use PayloadScoreQuery, I necessarily have to decode the payload to a single float and use that in the custom PayloadFunction. Lets say I have not only floats as payloads but also for example token types. And my final scoring factor could be for example the average of the most important type for which the token in question exists. Currently I would need to do some dirty hacks encoding type and score in a single float. But PayloadScoreQuery never actually uses the value and just passes it on to the PayloadFunction. So if we deprecate the old way anyway, the new one could be even more flexible?

          That said that patch completely solves my current use case and the described more complex one is a fabrication for the sake of an example.

          Show
          xabbu42 Nathan Gass added a comment - It might be feature creep for the current issue. But what is the story on more complex custom payload functions? To use PayloadScoreQuery, I necessarily have to decode the payload to a single float and use that in the custom PayloadFunction. Lets say I have not only floats as payloads but also for example token types. And my final scoring factor could be for example the average of the most important type for which the token in question exists. Currently I would need to do some dirty hacks encoding type and score in a single float. But PayloadScoreQuery never actually uses the value and just passes it on to the PayloadFunction. So if we deprecate the old way anyway, the new one could be even more flexible? That said that patch completely solves my current use case and the described more complex one is a fabrication for the sake of an example.
          Hide
          xabbu42 Nathan Gass added a comment -

          Yes this also fixes LUCENE-7744.

          As I read the code however, SimilarityPayloadDecoder is not fully backwards compatible. The old code ignored tokens without payload for scoring, the new one treats them as having payload 0 (So a MinPayloadFunction for the document "foo|0.5 foo" would newly return 0 instead of the current 0.5 (or 1.0 in lucene 5).

          It might be useful to let PayloadDecoder return null for missing payloads. This way its possible to get any of the three semantics with a custom PayloadDecoder and default PayloadFunctions.

          Show
          xabbu42 Nathan Gass added a comment - Yes this also fixes LUCENE-7744 . As I read the code however, SimilarityPayloadDecoder is not fully backwards compatible. The old code ignored tokens without payload for scoring, the new one treats them as having payload 0 (So a MinPayloadFunction for the document "foo|0.5 foo" would newly return 0 instead of the current 0.5 (or 1.0 in lucene 5). It might be useful to let PayloadDecoder return null for missing payloads. This way its possible to get any of the three semantics with a custom PayloadDecoder and default PayloadFunctions.
          Hide
          romseygeek Alan Woodward added a comment -

          And here's a patch for master, removing the deprecated methods.

          Christine PoerschkeErik Hatcher This has knock-on effects on the xml queryparser and Solr's handling PayloadScoreQParserPlugin that you might want to comment on/fix

          Show
          romseygeek Alan Woodward added a comment - And here's a patch for master, removing the deprecated methods. Christine Poerschke Erik Hatcher This has knock-on effects on the xml queryparser and Solr's handling PayloadScoreQParserPlugin that you might want to comment on/fix
          Hide
          romseygeek Alan Woodward added a comment -

          Here is a patch for master/7x.

          If no PayloadDecoder is passed to PayloadScoreQuery, then it delegates on to SimScorer as before. The constructors with no PayloadDecoder are deprecated.

          Nathan Gass this should fix LUCENE-7744 as well, I think?

          Show
          romseygeek Alan Woodward added a comment - Here is a patch for master/7x. If no PayloadDecoder is passed to PayloadScoreQuery, then it delegates on to SimScorer as before. The constructors with no PayloadDecoder are deprecated. Nathan Gass this should fix LUCENE-7744 as well, I think?

            People

            • Assignee:
              romseygeek Alan Woodward
              Reporter:
              romseygeek Alan Woodward
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development