Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Trunk
    • Component/s: search
    • Labels:
      None

      Description

      Solr currently has no support for Lucene's PayloadTermQuery, yet it has support for indexing payloads.

        Issue Links

          Activity

          Erik Hatcher created issue -
          Erik Hatcher added a comment -

          This class adds a QParserPlugin to support creating PayloadTermQuery instances.

          This can be registered in solrconfig.xml like this:

          <queryParser name="payload" class="org.apache.solr.search.PayloadTermQueryPlugin"/>

          A custom Similarity is needed to score payloads (not provided with this issue).

          Once everything is lined up right (payloads indexed, a Similarity with scorePayload implemented), a query like this can be used:
          http://localhost:8983/solr/select?q={!payload%20f=payloads%20func=avg}foo&debugQuery=true

          As can be seen with this explanation:
          1.4450715 = (MATCH) fieldWeight(payloads:foo in 0), product of:
            4.709331 = (MATCH) btq, product of:
              0.70710677 = tf(phraseFreq=0.5)
              6.66 = scorePayload(...)
            0.30685282 = idf(payloads: foo=1)
            1.0 = fieldNorm(field=payloads, doc=0)
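
          The arithmetic behind this explanation can be retraced with a small sketch. It assumes the classic DefaultSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)), and a single-document index (numDocs = 1 is inferred from the idf value, not stated in the output). The class and method names are made up for illustration.

```java
// Hypothetical helper, not part of Solr or Lucene: recomputes the explain tree above.
final class PayloadExplainSketch {

    // Classic DefaultSimilarity term frequency: sqrt(freq).
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Classic DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // fieldWeight = (tf * payloadScore) * idf * fieldNorm, following the explain output.
    static double fieldWeight(double freq, double payloadScore,
                              int docFreq, int numDocs, double fieldNorm) {
        double btq = tf(freq) * payloadScore;
        return btq * idf(docFreq, numDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // Values taken from the explanation; numDocs = 1 is an assumption.
        System.out.println(fieldWeight(0.5, 6.66, 1, 1, 1.0));
    }
}
```

          Run with the values from the explanation, this yields roughly 1.44507, which agrees with the reported score up to float rounding.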

          Erik Hatcher made changes -
          Field Original Value New Value
          Attachment PayloadTermQueryPlugin.java [ 12421139 ]
          Bill Au added a comment -

          Erik, have you started on this? I recently wrote a QParserPlugin that supports PayloadTermQuery. It is very bare-bones but could be a good starting point. I can attach my code here to get things started.

          Bill Au added a comment -

          Never mind. I just saw your update. Your code looks good.

          Bill Au added a comment -

          Erik, do you think we should support a default field and a default operator in the QParser used?

          Yonik Seeley added a comment -

          Moving out of 1.4 since this is a new feature that isn't ready to commit.
          As written, it looks more like "rawpayload" or something since no analysis is done on the input.

          Yonik Seeley made changes -
          Fix Version/s 1.4 [ 12313351 ]
          Bill Au added a comment -

          I am +0 on including/excluding this from 1.4. FYI, Solr 1.4 already has a DelimitedPayloadTokenFilterFactory which uses the DelimitedPayloadTokenFilter in Lucene. If we include this, I think we should also include a Similarity class for payloads, either as part of this JIRA or a separate one.

          There is also a similar JIRA on query support:

          https://issues.apache.org/jira/browse/SOLR-1337

          Grant Ingersoll made changes -
          Link This issue relates to SOLR-1337 [ SOLR-1337 ]
          david added a comment -

          Hi,
          What if I want to do a boolean query?
          like: payloadField:steve OR NonPayloadField:George?

          Won't the payload plugin be used for all the query parts?

          Lance Norskog added a comment -

          Julien Noche posted last August that he had to create a new query parser variant of dismax. I cannot find an example of a query string in his post.

          Using Payloads with DisMaxQParser in SOLR

          Use cases for a payload-based query:

          • a raw byte stream
          • a serialized Java String
          • a number
          • a boolean value in the payload
          • "is there a payload?"
          • boosting a document if the search term has a payload
            • the payload is a number (packed float) created by

          Most of these can be encoded into a payload. But there are no matching decoders.
          There is no code that pulls the payload and uses the data.
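
          As a rough illustration of what such decoders could look like, here is a minimal sketch with one method per use case listed above. None of this is an existing Solr API: the class and method names are hypothetical, and the byte layouts (UTF-8 strings, 4-byte big-endian floats, a single non-zero byte for true) are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical decoders, one per use case; not an existing Solr or Lucene API.
final class PayloadDecoders {

    // "Is there a payload?"
    static boolean hasPayload(byte[] payload) {
        return payload != null && payload.length > 0;
    }

    // A string, assuming it was stored as UTF-8 bytes.
    static String asString(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    // A number, assuming a 4-byte big-endian IEEE 754 float.
    static float asFloat(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // A boolean, assuming a single byte where non-zero means true.
    static boolean asBoolean(byte[] payload) {
        return payload[0] != 0;
    }

    // Boost a document only when the matching term carries a payload.
    static float boost(byte[] payload) {
        return hasPayload(payload) ? asFloat(payload) : 1.0f;
    }

    public static void main(String[] args) {
        byte[] packed = ByteBuffer.allocate(4).putFloat(6.66f).array();
        System.out.println(asFloat(packed)); // the packed float, decoded again
        System.out.println(boost(null));     // neutral boost when there is no payload
    }
}
```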

          Erik Hatcher added a comment -

          Is there interest in rejuvenating this to get some form of SpanTermQuery support into Solr? I'll take a stab at updating this to work like the {!term} query parser, factoring in the field type and any needed analysis. Anything else?

          Perhaps for the dismax+payloads situation Lance mentioned, which will be a different issue altogether, we could make the SolrQueryParser implementation used by (e)dismax pluggable, so that there can be a span-aware one?

          Roland Deck added a comment -

          Hi
          I tried the PayloadTermQueryPlugin today.
          To get the scores as mentioned above I had to change the code a little.

          Here is the relevant code fragment:

          @Override
          public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
            return new QParser(qstr, localParams, params, req) {
              public Query parse() throws ParseException {
                // rdeck: hint: let's try to set includeSpanScore to true.
                // => Yes, it works (after having re-indexed all documents)!
                return new PayloadTermQuery(
                    new Term(localParams.get(QueryParsing.F), localParams.get(QueryParsing.V)),
                    createPayloadFunction(localParams.get("func")),
                    true); // was originally false instead of true
              }
            };
          }

          With includeSpanScore = false, I get score = payload value.
          With includeSpanScore = true, the payload takes part in the score calculation.
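
          The difference between the two settings can be sketched in plain Java. This assumes the final score is either the payload score alone or the span score multiplied by the payload score, which matches the behavior described above. The class, the method names, and the sample numbers are hypothetical.

```java
// Hypothetical sketch of how the third constructor argument changes the score.
final class SpanScoreSketch {

    // "avg" payload function: mean of the payload values seen for the term in a document.
    static double averagePayload(double[] payloads) {
        if (payloads.length == 0) {
            return 1.0; // assumption: neutral factor when no payload was seen
        }
        double sum = 0.0;
        for (double p : payloads) {
            sum += p;
        }
        return sum / payloads.length;
    }

    // Flag off: the payload score alone. Flag on: span score times payload score.
    static double score(double spanScore, double payloadScore, boolean includeSpanScore) {
        return includeSpanScore ? spanScore * payloadScore : payloadScore;
    }

    public static void main(String[] args) {
        double payloadScore = averagePayload(new double[] {6.0, 8.0});
        System.out.println(score(0.5, payloadScore, false)); // prints 7.0
        System.out.println(score(0.5, payloadScore, true));  // prints 3.5
    }
}
```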

          I have some questions left:

          1) Why is the PayloadTermQuery limited to just one field? Or will this change?
          2) How can I combine query parts that depend on payloads with parts that don't?

          Otis Gospodnetic added a comment -

          Erik Hatcher - not sure if you are watching SOLR-1337, so I'll write the same comment/Q here:

          My impression was that Span queries and Payloads are kind of passé in Luceneland... no?
          If yes, should we resolve this as Won't Fix?

          Grant Ingersoll added a comment -

          I would say it would be good to support payloads, unless there is a better solution.

          Erik Hatcher made changes -
          Assignee Erik Hatcher [ ehatcher ]
          Erik Hatcher made changes -
          Fix Version/s 5.0 [ 12321664 ]
          Erik Hatcher added a comment -

          Anyone have thoughts on how best to implement the scorePayload() method in Solr? Should Solr have its own DefaultSimilarity subclass that implements it?

          It'd be great to at least get support for PayloadTermQuery in, such that it supports DelimitedPayloadTokenFilter created payloads. I suppose that means that scorePayload() will need to support at least float and integer decoding, based on introspecting the field type definition. What about "identity" encoding (throw an unsupported exception?)? Using a custom encoder would require a custom scorePayload(), and by default throw an exception on that too I presume.
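
          To make the idea concrete, here is a minimal sketch of a decoder chosen by encoder name. It is not Solr code: the class and method are hypothetical, the encoder name is assumed to come from the field type definition, and float and integer payloads are assumed to be 4 bytes in big-endian order.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: choose a payload decoder from the field type's encoder name.
final class EncoderAwarePayloadScorer {

    static float scorePayload(String encoder, byte[] payload) {
        switch (encoder) {
            case "float":
                return ByteBuffer.wrap(payload).getFloat();
            case "integer":
                return ByteBuffer.wrap(payload).getInt(); // widened to float
            case "identity":
                // Raw bytes have no obvious numeric meaning, so refuse to score them.
                throw new UnsupportedOperationException(
                        "identity payloads need a custom scorePayload()");
            default:
                throw new UnsupportedOperationException(
                        "unknown payload encoder: " + encoder);
        }
    }

    public static void main(String[] args) {
        byte[] asFloat = ByteBuffer.allocate(4).putFloat(2.5f).array();
        byte[] asInt = ByteBuffer.allocate(4).putInt(3).array();
        System.out.println(scorePayload("float", asFloat));
        System.out.println(scorePayload("integer", asInt));
    }
}
```

          Throwing for "identity" and for unknown encoders keeps a misconfigured field from silently scoring with meaningless numbers, at the cost of failing the query.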

          And what about other Similarity implementations and if/how to support those?

          I'm seeing that it's tough to put this into Solr in a general purpose way, but maybe we can at least get out of the box support for integer and float using the default similarity.

          Erick Erickson added a comment -

          Happens that I put together an end-to-end example here: http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/, and part of that is a discussion I had with Hossman about whether a query parser approach or a fieldType approach would be better. Turns out each supports different capabilities.

          Personally, I think a fieldType would be a good thing since it should "just work".

          FWIW

          Erik Hatcher added a comment -

          Is there a reason not to use SchemaSimilarityFactory as the default Similarity moving forward? Relying on that would be nice, it seems.

          Erik Hatcher made changes -
          Priority Major [ 3 ] Minor [ 4 ]

            People

            • Assignee:
              Erik Hatcher
              Reporter:
              Erik Hatcher
            • Votes:
              7
              Watchers:
              16
