SOLR-1093

A RequestHandler to run multiple queries in a batch

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: search
    • Labels: None

      Description

      It is a common requirement for a single page to fire multiple queries. In cases where these queries are independent of each other, a handler that can take in multiple queries, run them in parallel, and send the response back as one big chunk would be useful.
      Let us say the handler is MultiRequestHandler:

      <requestHandler name="/multi" class="solr.MultiRequestHandler"/>
      

      Query Syntax

      The request must specify the number of queries as count=n.

      Each request parameter must be prefixed with a number that denotes the query index. Optionally, it may also specify the handler name.

      Example:

      /multi?count=2&1.handler=/select&1.q=a:b&2.handler=/select&2.q=a:c
      

      The default handler can be '/select', so the equivalent request can be:

      /multi?count=2&1.q=a:b&2.q=a:c
      

      The response

      The response will be a List<NamedList>, where each NamedList is the response to one query.
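
      A rough sketch of what such a handler could look like (purely illustrative, not the attached patch; based on the RequestHandlerBase and LocalSolrQueryRequest APIs, with error handling and parallel execution omitted):

      // Sketch only. Imports are approximate; package locations vary across Solr versions,
      // and older versions require a couple of extra abstract methods (getVersion(), getSourceId()).
      import java.util.ArrayList;
      import java.util.Iterator;
      import java.util.List;
      import org.apache.solr.common.params.ModifiableSolrParams;
      import org.apache.solr.common.params.SolrParams;
      import org.apache.solr.common.util.NamedList;
      import org.apache.solr.handler.RequestHandlerBase;
      import org.apache.solr.request.LocalSolrQueryRequest;
      import org.apache.solr.request.SolrQueryRequest;
      import org.apache.solr.request.SolrRequestHandler;
      import org.apache.solr.response.SolrQueryResponse;

      public class MultiRequestHandler extends RequestHandlerBase {

        @Override
        public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
          SolrParams params = req.getParams();
          int count = params.getInt("count", 0);
          List<NamedList<Object>> responses = new ArrayList<NamedList<Object>>();

          // Serial loop for brevity; the point of the issue is to run these in parallel.
          for (int i = 1; i <= count; i++) {
            String prefix = i + ".";
            ModifiableSolrParams sub = new ModifiableSolrParams();
            for (Iterator<String> it = params.getParameterNamesIterator(); it.hasNext(); ) {
              String name = it.next();
              if (name.startsWith(prefix)) {
                sub.set(name.substring(prefix.length()), params.getParams(name));
              }
            }
            String handlerName = sub.get("handler", "/select");
            SolrRequestHandler handler = req.getCore().getRequestHandler(handlerName);
            SolrQueryRequest subReq = new LocalSolrQueryRequest(req.getCore(), sub);
            try {
              SolrQueryResponse subRsp = new SolrQueryResponse();
              handler.handleRequest(subReq, subRsp);
              responses.add(subRsp.getValues());
            } finally {
              subReq.close();
            }
          }
          rsp.add("responses", responses);
        }

        @Override
        public String getDescription() { return "Runs multiple independent queries in one request"; }
        @Override
        public String getSource() { return null; }
      }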

      Attachments

      1. SOLR-1093.patch (10 kB) - Karthick Duraisamy Soundararaj


          Activity

          Grant Ingersoll added a comment -

          Might also be useful if it handled "fallback queries" too.

          Noble Paul added a comment -

          fallback queries?

          Lance Norskog added a comment -

          When the query servers are saturated, or when doing data mining, doing multiple simultaneous queries just makes things worse. If you go with this design, please add an option to do things serially.

          The most general solution is to add a scripting request handler. You give any code you want as the script. This allows, for example, a follow-up query based on previous results.

          Noble Paul added a comment -

          a scripting request handler is beyond the scope of this issue.

          Jan Høydahl added a comment -

          Parallel execution of multiple queries is just one use case in a family of many others, and I agree with Lance's post in the list that it would be better to make an extensible component.

          Other similar use cases often requested: multi source federation, factor in ad service, select sources based on query analysis, select sources based on results, non-solr sources, result modification based on content in result, query abstraction layer/templating

          The common goal is to make an abstraction layer on top of search sources which can handle search-related functionality close to the engine, so that it need not be implemented in all the front-ends. Other products which try to fill this role are FAST Unity, Comperio Front, and Sesat (sesat.no).

          Perhaps the /multi req.handler could be the start of such a framework, where the first plugin to implement is the parallel queries use-case.

          To be able to handle a high count for "n" without hitting HTTP GET limitations, and to get a cleaner syntax for complex cases, the handler could accept the request as a POST. Pseudo POST content, which could be JSON or custom:
          <steps>
          <branch type="list">
          <src name="web">qt=dismax&q=$q&rows=10&facet=true&facet.fl=mimetype</src>
          <src name="google">q=$q</src>
          <src name="yp">q=category:$q^10 OR company:$q&rows=3</src>
          <src name="wp">q=$q&rows=3</src>
          <src name="ads">q=$q</src>
          </branch>
          </steps>

          The result list would then consist of five entries named web, yp, google, wp and ads.
          Each "branch" and "src" would be pre-defined in config, specifying the implementing class and any defaults. indeed, the whole POST could be pre-configured, only needing to supply a &steps= param to identify which "template" to choose, using $variables for q etc.
          The class implmenting "steps" simply calls each sub step in sequence, passing the request and response objects. This provides a simple framework for future extensions, pre- or post-processing.
          The class implementing "branch" of type "list" would spawn all sub queries as threads and include each source result in a list.
          Another implementation type of "branch" could merge (federate) results instead of stacking them.
          The class implementing a "src" would be a thin wrapper which simply dispatches the query to the Search RequestHandler. Other implementations of "src" could be wrappers for external engines like Google or ad servers.

          My intention is not to suggest a huge component, but to consider whether a smart interface design could enable very powerful extension possibilities that would be useful in almost all portal-type applications.
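
          A minimal illustration of how the "branch"-of-type-"list" piece might look (these types are purely made up for the example, not existing Solr classes; a sequential "steps" wrapper would simply call each sub-step in order):

          import java.util.LinkedHashMap;
          import java.util.Map;
          import java.util.concurrent.Callable;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.Future;
          import org.apache.solr.common.params.SolrParams;
          import org.apache.solr.common.util.NamedList;

          // Hypothetical plugin contract: a step takes the query params and returns a named result.
          interface Step {
            NamedList<Object> process(SolrParams params) throws Exception;
          }

          // "branch" of type "list": run each named "src" in parallel and stack the results.
          class ListBranch implements Step {
            private final Map<String, Step> sources;
            private final ExecutorService pool = Executors.newCachedThreadPool();

            ListBranch(Map<String, Step> sources) { this.sources = sources; }

            public NamedList<Object> process(final SolrParams params) throws Exception {
              Map<String, Future<NamedList<Object>>> futures =
                  new LinkedHashMap<String, Future<NamedList<Object>>>();
              for (Map.Entry<String, Step> e : sources.entrySet()) {
                final Step src = e.getValue();
                futures.put(e.getKey(), pool.submit(new Callable<NamedList<Object>>() {
                  public NamedList<Object> call() throws Exception { return src.process(params); }
                }));
              }
              NamedList<Object> out = new NamedList<Object>();
              for (Map.Entry<String, Future<NamedList<Object>>> e : futures.entrySet()) {
                out.add(e.getKey(), e.getValue().get());   // one entry per source: web, google, yp, wp, ads
              }
              return out;
            }
          }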

          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria were "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. Email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          syed abdul kather added a comment -

          Please add this feature.

          Simon Willnauer added a comment -

          We got lots of votes on this issue; seems like we should take some action here! I will assign it and make sure it is resolved sooner rather than later.

          Please add this feature

          To get expectations right, we are working on releasing 3.2 soon and this one should not block it. I will work towards 3.3 here.

          Martijn van Groningen added a comment -

          Currently with grouping one might be able to achieve something similar. Queries are not executed in parallel, but it is something you can already use in Solr 4.0.

          Just specify the group.query parameter multiple times. E.g.
          group=true&group.query=brand:samsung&group.query=category:phones
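
          For reference, a minimal SolrJ equivalent of that request (sketch only; the URL and field names are just the ones from the example, and SolrJ class names vary a bit between versions):

          import java.util.List;
          import org.apache.solr.client.solrj.SolrQuery;
          import org.apache.solr.client.solrj.impl.HttpSolrServer;
          import org.apache.solr.client.solrj.response.GroupCommand;
          import org.apache.solr.client.solrj.response.QueryResponse;

          public class GroupQueryExample {
            public static void main(String[] args) throws Exception {
              HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
              SolrQuery q = new SolrQuery("*:*");
              q.set("group", true);
              q.set("group.query", "brand:samsung", "category:phones");
              QueryResponse rsp = server.query(q);
              // one GroupCommand per group.query, each with its own match count and top documents
              List<GroupCommand> commands = rsp.getGroupResponse().getValues();
              for (GroupCommand cmd : commands) {
                System.out.println(cmd.getName() + ": " + cmd.getMatches() + " matches");
              }
            }
          }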

          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Robert Muir added a comment -

          3.4 -> 3.5

          Santthosh added a comment -

          Yup, I am also looking for something like this...

          Jan Høydahl added a comment -

          Anyone who wants to work on this?

          Mikhail Khludnev added a comment -

          1. Is there a way to dispatch the separate queries on the web container's threads?
          2. Otherwise it requires a separate thread pool, which makes operations support more complicated and less predictable. I suppose the web container admin wisely configures the number of threads and the JVM heap size; then this surprisingly blows up the number of threads, which can lead to failures.

          And even if item 1 is possible, there is a chance of saturating the web container thread pool with multi-queries, which will be blocked by their "sub-queries", while the saturated thread pool in turn blocks those "sub-queries" from making progress.

          I propose implementing this feature on the client side, in SolrJ. That also allows load to be distributed evenly across a cluster via http://wiki.apache.org/solr/LBHttpSolrServer underneath, instead of overloading a single node with such a multi-query.
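
          A rough sketch of that client-side approach (URLs and query strings below are just placeholders):

          import java.util.ArrayList;
          import java.util.Arrays;
          import java.util.List;
          import java.util.concurrent.Callable;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.Future;
          import org.apache.solr.client.solrj.SolrQuery;
          import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
          import org.apache.solr.client.solrj.response.QueryResponse;

          public class ParallelQueries {
            public static void main(String[] args) throws Exception {
              // the load balancer spreads the sub-queries across nodes instead of hammering one
              final LBHttpSolrServer lb =
                  new LBHttpSolrServer("http://solr1:8983/solr", "http://solr2:8983/solr");
              ExecutorService pool = Executors.newFixedThreadPool(2);
              List<Future<QueryResponse>> futures = new ArrayList<Future<QueryResponse>>();
              for (final String qs : Arrays.asList("a:b", "a:c")) {
                futures.add(pool.submit(new Callable<QueryResponse>() {
                  public QueryResponse call() throws Exception { return lb.query(new SolrQuery(qs)); }
                }));
              }
              for (Future<QueryResponse> f : futures) {
                System.out.println(f.get().getResults().getNumFound());
              }
              pool.shutdown();
            }
          }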

          Noble Paul added a comment -

          If you use distributed search, Solr uses its own thread pool.

          If you implement it on the client side, Java clients can benefit.

          LI Geng added a comment -

          What's the current state of this issue? I'm interested in co-working on it.

          Hoss Man added a comment -

          Bulk fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in a comment.

          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Karthick Duraisamy Soundararaj added a comment - edited

          I have created a new class MultiSearchHandler which is an extension of SearchHandler. It takes all the parameters that a SearchHandler can take and parses them into sub-queries (LocalSolrQueryRequests). It then executes each of these sub-queries serially using the SearchHandler. It doesn't enforce IndexSearcher consistency amongst multiple queries within the same request (this doesn't harm us and is in fact good for our use case).

          Usage
          To pass a parameter to an individual query, prefix it with the query number.
          E.g. 1.q="query1"&2.q="query2"...

          To pass a parameter to all queries, the prefix should not be specified.
          E.g. count=2&query="common_query"&1.mm=3&2.mm=2...

          New query parameters specific to MultiSearchHandler
          In addition to all the parameters that a SearchHandler can accept, the following query parameters can be passed to the MultiSearchHandler (an example request combining them is shown after the list).

          Query parameter that can be used both as common and specific to each individual query
          threshold - The minimum number of matches (numFound) for a query. Default value is -1.

          Query parameters common to all the sub-queries
          count - The number of queries in the URL. This parameter is mandatory.
          skiponfailure - Boolean parameter that specifies whether or not to include the results of queries whose numFound is less than the threshold. This parameter is optional.
          stoponpass - Boolean parameter that specifies whether or not to stop executing the remaining queries if the first subquery has a result count greater than the threshold. This parameter is optional.
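
          An example request combining these parameters (the values are purely illustrative) could look like:

          /multi?count=2&threshold=1&stoponpass=true&1.q=a:b&2.q=a:c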

          Hoss Man added a comment -

          I'm removing the fixVersion=4.0 since this feature request doesn't seem like it should hold up the (hopefully imminent) 4.0 release.

          B Karunakar Reddy added a comment - edited

          Hi everyone. Can anyone tell which version this patch should be applied against? I am currently using Solr 3.4.

          Erick Erickson added a comment -

          Well, the patch is from August, so it was probably made against the (then current) 4.x branch. I'd start against a current 4.x branch, but wouldn't be surprised if it didn't apply cleanly.

          It'd be cool if you can update it to apply cleanly against the current 4.x trunk...

          Karthick Duraisamy Soundararaj added a comment -

          B Karunakar Reddy This patch was for the then trunk code. I was able to apply the patch to 4.0 when I tested it last week.

          J Mohamed Zahoor added a comment -

          Integration with SolrJ would be a nice addition.

          Ariel Lieberman added a comment -

          I've applied it to 4.2 and it works like a charm

          David Smiley added a comment -

          -1 If this issue is strictly about a feature in which a batch of queries are fully known at the time of submission, then I don't think this should be accepted for inclusion in Solr; sorry. Simply submit them in parallel.

          Instead, I am highly in favor of a scripting request handler in which a script runs that submits the searches to Solr (in-VM) and can react to the results of one request before making another that is formulated dynamically, and can assemble the response data, potentially reducing both the latency and data that would move over the wire if this feature didn't exist. And if you really were bent on submitting a batch of queries that are returned in a batch, then you could implement that with the script.
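
          Purely as an illustration of that idea (not SOLR-5005 and not any existing handler), such a scripting handler could be little more than a JVM ScriptEngine wired into a request handler:

          // Sketch only; imports are approximate, and error handling and script caching are omitted.
          import javax.script.ScriptEngine;
          import javax.script.ScriptEngineManager;
          import org.apache.solr.handler.RequestHandlerBase;
          import org.apache.solr.request.SolrQueryRequest;
          import org.apache.solr.response.SolrQueryResponse;

          public class ScriptRequestHandler extends RequestHandlerBase {
            @Override
            public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
              ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
              engine.put("req", req);   // the script can reach req.getCore() to run in-VM searches
              engine.put("rsp", rsp);   // ...and assemble whatever response data it wants
              engine.eval(req.getParams().get("script"));
            }

            @Override
            public String getDescription() { return "Runs a script that issues in-VM searches"; }
            @Override
            public String getSource() { return null; }
          }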

          Gordon Mohr added a comment -

          I have a possibly related use case: I'd like to issue M queries, get the top N results for each query, and for each result across all queries, get its score for each of the queries. (And of course I'd prefer to do this in one parallel index pass rather than M serial passes.)

          yuanyun.cn added a comment -

          Just found one small issue in the code.
          If we pass several fq parameters for one query, only one will be used, for example: 1.fq=datatype:4&1.fq=filetype:pdf.

          To fix this, the following code in
          org.apache.solr.handler.component.MultiSearchHandler.initRequestParams(SolrQueryRequest, Vector<SimpleOrderedMap<Object>>, SimpleOrderedMap<Object>):

          int startPos = paramName.indexOf('.') + 1;
          localRequestParams.elementAt(queryId - 1).add(
              paramName.substring(startPos), reqParams.get(paramName));

          should be changed to:

          int startPos = paramName.indexOf('.') + 1;
          String[] paramValues = reqParams.getParams(paramName);
          if (paramValues != null) {
            for (String value : paramValues) {
              localRequestParams.elementAt(queryId - 1).add(
                  paramName.substring(startPos), value);
            }
          }
          Thomas Scheffler added a comment - edited

          As said by David Smiley in his comment

          I am highly in favor of a scripting request handler in which a script runs that submits the searches to Solr (in-VM) and can react to the results of one request before making another that is formulated dynamically, and can assemble the response data, potentially reducing both the latency and data that would move over the wire if this feature didn't exist.

          I have a use case where every search I do currently requires two more searches that depend on the result of the first search. Doing this on the client side also requires two more network round trips and the overhead of preparing the searches. An efficient way to specify a script (external file or CDATA section) could cut the time for doing this and may even allow better caching.

          Originally I came across this issue looking to combine the two later requests into one. A scriptable RequestHandler would even save that request by moving some simple logic to Solr.

          Noble Paul added a comment -

          I'm in favor of a scripting request handler. Why don't we open a separate issue and track it there?

          David Smiley added a comment -

          I opened it: SOLR-5005


            People

            • Assignee: Simon Willnauer
            • Reporter: Noble Paul
            • Votes: 30
            • Watchers: 34
