Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3
    • Component/s: search
    • Labels: None

      Description

      Searching over multiple shards and aggregating results.
      Motivated by http://wiki.apache.org/solr/DistributedSearch

      Attachments

      1. solr-dist-faceting-non-ascii-all.patch (6 kB, Lars Kotthoff)
      2. shards.start_rows.patch (3 kB, Brian Whitman)
      3. shards_qt.patch (0.8 kB, Yonik Seeley)
      4. fedsearch.stu.patch (92 kB, Stu Hood)
      5. fedsearch.stu.patch (94 kB, Stu Hood)
      6. fedsearch.patch (57 kB, Sharad Agarwal)
      7. fedsearch.patch (72 kB, Sharad Agarwal)
      8. fedsearch.patch (86 kB, Sharad Agarwal)
      9. fedsearch.patch (162 kB, Sharad Agarwal)
      10. fedsearch.patch (109 kB, Sabyasachi Dalal)
      11. fedsearch.patch (135 kB, Sabyasachi Dalal)
      12. fedsearch.patch (135 kB, Sabyasachi Dalal)
      13. distributed.patch (122 kB, Yonik Seeley)
      14. distributed.patch (123 kB, Yonik Seeley)
      15. distributed.patch (135 kB, Yonik Seeley)
      16. distributed.patch (83 kB, Yonik Seeley)
      17. distributed.patch (87 kB, Yonik Seeley)
      18. distributed.patch (87 kB, Yonik Seeley)
      19. distributed.patch (98 kB, Yonik Seeley)
      20. distributed.patch (100 kB, Yonik Seeley)
      21. distributed.patch (113 kB, Yonik Seeley)
      22. distributed.patch (113 kB, Yonik Seeley)
      23. distributed.patch (114 kB, Yonik Seeley)
      24. distributed.patch (129 kB, Yonik Seeley)
      25. distributed_pjaol.patch (93 kB, patrick o'leary)
      26. distributed_facet_count_bugfix.patch (1 kB, Jayson Minard)
      27. distributed_add_tests_for_intended_behavior.patch (3 kB, Jayson Minard)

        Issue Links

          • depends on: SOLR-443

          Activity

          Yonik Seeley added a comment -

          Closing this issue (finally!). Specific bugs or improvements can get their own new issues.
          Thanks to everyone who contributed to this!

          Brian Whitman added a comment -

          Lars- I'm using the jetty that comes with solr-trunk, jetty-6.1.3.

          I found this: http://webteam.archive.org/jira/browse/HER-1173#action_14736

          That page indicates the corresponding Jetty 6 property is org.mortbay.jetty.Request.maxFormContentSize.

          I set that to 1000000, restarted my shards, and queries with &rows=40000 now work. So for those who have this problem, start Jetty with:

          java -Dorg.mortbay.jetty.Request.maxFormContentSize=1000000 -jar start.jar

          My only suggestion is that the jetty.xml included in the Solr example somehow get this parameter hardcoded (I don't know how, personally). I understand this is not a Solr issue, but it does cause a non-obvious result for an obvious query.

          Lars Kotthoff added a comment -

          Which version of Jetty are you using? The org.mortbay.http.HttpRequest.maxFormContentSize system property seems to be specific to Jetty 5 – I didn't find any information on how to set the limit with Jetty 6 (or indeed if it exists at all).

          Brian Whitman added a comment -

          My ids are 32-character MD5s, and the break happens around 23000 rows. The maxFormContentSize doesn't seem to make any difference whether I set it or not: with it set to 0, -1, 10000000, or not set at all, I can query &rows=22300 but not &rows=22400. Obviously this is an edge case, but I'm posting this here for the next person who runs into it; since I can work around it, I'll stop messing with it.

          Lars Kotthoff added a comment -

          > I think we should probably handle the case better than a 500 error. maybe a solr warning about per-shard row limits?

          That's specific to the configuration of your container; I think there's nothing Solr can do about it.

          As for the form content size, I must admit I haven't actually tried that myself; I'm running Tomcat and just got that parameter from the Jetty documentation. I'd take a wire dump with something like tcpdump to see what the actual size of the request is. Maybe it's even larger than 1000000 bytes?
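
          (For example, something like tcpdump -A -s 0 port 8983 on the receiving shard would print the request headers and body in ASCII, including the Content-Length. The exact command line is illustrative; adjust the port and interface to your setup.)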

          Yonik Seeley added a comment -

          > but I think we should probably handle the case better than a 500 error. maybe a solr warning about per-shard row limits?

          That's a Jetty limit you hit; the exception was understandable, and an unknown exception like that (from Solr's perspective) seems like it should map to a 500 error code.

          Brian Whitman added a comment -

          Yonik, sure-- but I think we should probably handle the case better than a 500 error. maybe a solr warning about per-shard row limits?

          Lars – I am having trouble getting that maxFormContentSize property set. I am running jetty like:

          /usr/local/java/bin/java -Dorg.mortbay.http.HttpRequest.maxFormContentSize=1000000 -Xmx7000m -Xms1024m -jar start.jar

          (I've also tried 0 and -1, which per the Jetty docs mean "unlimited.")

          but the same distributed query gives the same error. How are you setting that property?

          Yonik Seeley added a comment -

          > I'm not sure why Solr is trying to send such large amounts of data to the shards though

          Specifying 40,000 ids to be retrieved I imagine. The average id length must be over 50 bytes.

          Brian: if ordering isn't important for some of these big bulk queries, you might want to consider directly querying the shards.

          Lars Kotthoff added a comment -

          The default limit for form submissions is 200000 bytes with Jetty. I'm not sure why Solr is trying to send such large amounts of data to the shards though, the only case I've seen this happening is with faceting – Solr has to request facet counts for specific values from the shards to get exact counts. Maybe because of the sorting?

          Anyway, you can change the limit by setting the org.mortbay.http.HttpRequest.maxFormContentSize system property.

          Brian Whitman added a comment - - edited

          Getting "Form too large" from jetty while doing normal but large rows= (40000) shards requests. Is this related to SOLR-612 ?

          Query was : http://x.x.x.x/solr/search?q=*:*&sort=indexed%20desc&fl=indexed&rows=40000 , where x.x.x.x is a single shard and /search has the shards ivars mapped to it in solrconfig.

          java.lang.IllegalStateException: Form too large
                  at org.mortbay.jetty.Request.extractParameters(Request.java:1273)
                  at org.mortbay.jetty.Request.getParameterMap(Request.java:650)
                  at org.apache.solr.request.ServletSolrParams.<init>(ServletSolrParams.java:29)
                  at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:392)
                  at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:113)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
                  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
                  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
                  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
                  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
                  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
                  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
                  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
                  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
                  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
                  at org.mortbay.jetty.Server.handle(Server.java:285)
                  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
                  at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
                  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
                  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
                  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
                  at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
                  at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

          request: http://x.x.x.x.y/solr/select (ed: this was a different shard than the one I called)

          request: http://x.x.x.y/solr/select
          at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:343)
          at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
          at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:371)
          at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:345)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
          at java.lang.Thread.run(Thread.java:619)

          Sean Timm added a comment -

          Another option is to pass state on the number of documents and positions retrieved from each shard. I have a client layer that can do that, so it works, but it is complicated and maintaining state is messy. The vast majority of requests are first-page requests, so in practice we almost never use that feature; instead we do exactly as is implemented here and request the full document count from each shard.

          Brian Whitman added a comment -

          Attaching a patch to add optional &shards.start and &shards.rows parameters. If set, they override distributed search's logic for setting start and rows per shard. If you set &shards.start=10 and &shards.rows=10, each shard will be queried with &start=10 and &rows=10 and you'll get back N*10 results (set &rows on the main query to get them all).

          [Not a java developer, my patch works but may violate good taste/style]
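
          For illustration (hypothetical hosts, two shards), a request using these parameters might look like:

          http://localhost:8983/solr/select?shards=host1/solr,host2/solr&q=*:*&shards.start=10&shards.rows=10&rows=20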

          Brian Whitman added a comment -

          Understood. Can I suggest a third alternative?

          Two new params, &d.rows and &d.start, with the implication that these get sent unchanged to each of the shards. You may get back up to N*d.rows results, where N is the number of shards. That leaves the paging management up to the client.

          Our use case is millions of documents across many shards, and we often do queries like "get all documents of type X." There may be 5M type X documents. Doing &rows=5000000 is unpredictable, so we've previously done a loop, incrementing start by 1000 and getting 1000 rows each time. But with this distributed setup each successive batch query takes slightly longer, and by the time we've gotten to document 5,001,000 the batch queries are timing out and breaking anyway.

          Yonik Seeley added a comment -

          > http://localhost:8983/solr/select?shards=[4 shards]&q=*:*&start=5000&rows=1000
          > Seems to request &rows=6000 from all the shards?

          It's a feature.

          To retrieve documents 5000-6000, one must find the first 6000 documents then take the last 1000.
          Since it's possible that all top 6000 documents could come from a single shard, the top 6000 documents must be collected from each and merged.

          There are alternatives:
          1) Optimistically request less than 6000 documents per shard and re-query if we are wrong
          2) Add an optional mode that treats documents across shards in the same position as equal, so if you had 10 shards, you would simply get the top 100 docs starting at 500. This might be OK for some applications.

          In general, search engines are optimized at retrieving the top 10 of something, and bad at retrieving the top 10 starting at a big number. Limit the depth people can page, or restructure queries to avoid the latter case.
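
          To make the arithmetic concrete, here is a minimal merge sketch in Java (class and field names are illustrative, not Solr's actual code): every shard returns its top start+rows docs, and the coordinator keeps only the requested page.

          import java.util.ArrayList;
          import java.util.List;
          import java.util.PriorityQueue;

          class ShardMergeSketch {
            static class Doc {
              final String id;
              final float score;
              Doc(String id, float score) { this.id = id; this.score = score; }
            }

            // To serve start=5000&rows=1000 over N shards, fetch the top
            // (start + rows) = 6000 docs from every shard, merge them all,
            // and keep the last 1000 -- all 6000 could live on one shard.
            static List<Doc> mergePage(List<List<Doc>> perShardTopDocs, int start, int rows) {
              // Max-heap on score: highest-scoring docs come out first.
              PriorityQueue<Doc> heap =
                  new PriorityQueue<>((a, b) -> Float.compare(b.score, a.score));
              for (List<Doc> shardDocs : perShardTopDocs) {
                heap.addAll(shardDocs); // each shard contributed its top (start + rows)
              }
              List<Doc> page = new ArrayList<>(rows);
              for (int i = 0; i < start + rows && !heap.isEmpty(); i++) {
                Doc d = heap.poll();
                if (i >= start) page.add(d); // discard the first 'start' docs
              }
              return page;
            }
          }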

          Brian Whitman added a comment -

          Anyone notice something like this:

          http://localhost:8983/solr/select?shards={4 shards}&q=*:*&start=5000&rows=1000

          Seems to request &rows=6000 from all the shards? (likewise, start=10000&rows=1000 sends rows=11000 to all the shards?)

          The shards all say:
          INFO: webapp=/solr path=/select params={fl=id,score&start=0&q=*:*&isShard=true&wt=javabin&fsv=true&rows=6000&version=2.2} hits=6000 status=0 QTime=175

          And the host I called select on says:
          INFO: webapp=/solr path=/search params={start=5000&q=*:*&rows=1000} status=0 QTime=1192

          And the QTime goes up the higher &start goes. (QTime for start=5000 was 200, QTime for start=50000 was 4500, start=500000 had 35000!)

          Bug or feature?

          Yonik Seeley added a comment -

          Fixed "debugQuery on a query with shards that returns 0 results will NPE".
          There are still some issues with debugQuery=true, but it's not critical since it is just debugging. I'll open another issue for that.

          Brian Whitman added a comment - - edited

          Putting &debugQuery on a query with shards that returns 0 results will NPE:

          (removing NPE code block so it stops wrapping the page)

          Brian Whitman added a comment -

          If the user is going to be splitting their index over N shards, it's going to be crucial to have the distributed search (optionally) return the docid->shard map in the response. Is that tricky to add as part of this issue?

          Lars Kotthoff added a comment -

          Making this issue depend on SOLR-443 as distributed faceting of non-ascii values won't work properly without it. Please also see my comment on that issue.

          Yonik Seeley added a comment -

          I forgot we've already gone a few rounds on charset in POST bodies: https://issues.apache.org/jira/browse/SOLR-443 http://markmail.org/message/gtzbtwzqa6zranur?q=POST+body+charset#query:POST%20body%20charset+page:1+mid:fkragfatbox5fff5+state:results

          Lars Kotthoff added a comment -

          Yonik, thanks for taking a look at it.

          I've investigated this issue further and I believe I know what the root cause is now. The line

          o.a.s.client.solrj.impl.CommonsHttpSolrServer.java
          ...
          post.getParams().setContentCharset("UTF-8");
          ...
          

          tells the sender to encode the data as UTF-8. The way the receiver decodes the data depends on whatever is set as the charset in the Content-Type header. This header is currently added automatically by httpclient and, as you can see in the netcat log, is "application/x-www-form-urlencoded", i.e. without a charset. The default charset is ISO-8859-1 (cf. http://hc.apache.org/httpclient-3.x/charencodings.html). So the data is encoded as UTF-8 but decoded as ISO-8859-1, which causes the effect I described earlier.
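
          (A tiny standalone snippet, not from the patch, reproduces the effect: bytes written as UTF-8 but read back as ISO-8859-1 garble every non-ASCII character.)

          import java.nio.charset.StandardCharsets;

          public class CharsetMismatch {
            public static void main(String[] args) {
              String original = "héllo";
              // Sender encodes as UTF-8: 'é' becomes the two bytes 0xC3 0xA9...
              byte[] wire = original.getBytes(StandardCharsets.UTF_8);
              // ...receiver decodes as ISO-8859-1: each byte becomes one char.
              String received = new String(wire, StandardCharsets.ISO_8859_1);
              System.out.println(received); // prints "hÃ©llo", not "héllo"
            }
          }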

          I tried to reproduce this with TestDistributedSearch myself, but for some reason it seems to be fine. Perhaps the Jetty configuration is different from my Tomcat configuration. I didn't find any parameter to tell Tomcat which default encoding to use when the Content-Type header doesn't specify one, though.

          The minimal change I had to make to make it work was add a line to set the Content-Type header explicitly, i.e.

          o.a.s.client.solrj.impl.CommonsHttpSolrServer.java
          ...
          post.getParams().setContentCharset("UTF-8");
          post.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
          ...
          

          This probably won't work with multi-part requests though. I'm not sure what the right way to handle this would be. The stub Content-Type header is set by httpclient when the method is executed, i.e. there's no way to let httpclient figure out the first part and then append the charset in CommonsHttpSolrServer.

          Some other things I've noticed:

          • Just before the content charset is set, the parameters of the POST request are populated. If the value for a parameter is null, the code attempts to add a null parameter. This will cause an IllegalArgumentException from httpclient (cf. http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/methods/PostMethod.html#addParameter(java.lang.String, java.lang.String)).
          • TestDistributedSearch does not exercise the code to refine facet counts. Adding another facet request with facet.limit=1 redresses this.

          Yonik Seeley added a comment -

          Lars: I'm not yet able to reproduce an issue with SolrJ not encoding the parameters properly.

          The following code finds the sample solr document:

              SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
              ModifiableSolrParams params = new ModifiableSolrParams();
              params.set("echoParams","all");
              params.set("q","+h\u00E9llo");
              QueryRequest req = new QueryRequest(params);
              req.setMethod(SolrRequest.METHOD.POST);
               System.out.println(server.request(req));
          

          And netcat confirms the encoding looks good, and that the request is in fact using POST:

          $ nc -l -p 8983
          POST /solr/select HTTP/1.1
          User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0
          Host: localhost:8983
          Content-Length: 53
          Content-Type: application/x-www-form-urlencoded
          
          echoParams=all&q=%2Bh%C3%A9llo&wt=javabin&version=2.2
          

          I'll see if I can reproduce anything with TestDistributedSearch

          Yonik Seeley added a comment -

          Lars: I committed your fix to the facet.limit value sent to shards, and instead of changing ntop when facet.limit<=0, I simply short-circuited checking if refinement is needed at all.

          Next up: investigate this URL encoding (or lack of it) in the POST body.

          Sean Timm added a comment -

          In SOLR-502, there is the notion of partialResults. It seems that the same flag could be used in this case. Perhaps a string should also be added indicating why not all results could be returned.

          Yonik Seeley added a comment -

          > But shouldn't there be an option to skip over servers that aren't responding or time out?

          That does sound like it would be a useful option (though I think it should be false by default).

          FYI, I'm currently looking into Lars' facet changes.

          Otis Gospodnetic added a comment -

          Ah, yes, I agree with Brian. I did see this too, but forgot to report it as a problem that needs a fix.

          Brian Whitman added a comment -

          When I give the following request:

          http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:8984/solr&q=woof

          With no server running on 8984, I get an error 500 (naturally).

          But shouldn't there be an option to skip over servers that aren't responding or that time out? I'm envisioning a scenario in which this is used to search across possibly redundant uniqueIDs, and a server being down is not cause for an exception.

          Hoss Man added a comment -

          marking as intended for 1.3 ... i'm not overly familiar with the state of this issue, but i do know that large chunks of functionality have already been committed, so i want to make sure that before 1.3 is released someone consciously decides between:

          • "DONE" ...resolving this issue
          • "NOT DONE BUT OK" ... leaving the issue unresolved and removing the 1.3 designation
          • "NOT DONE AND NOT OK" ... rolling back any/all committed code that is considered detrimental for the 1.3 release.
          Lars Kotthoff added a comment -

          On closer inspection of the code, are the fields "sort" and "prefix" of FieldFacet used anywhere at all? They don't seem to be referenced anywhere in the code and just removing them doesn't seem to have any obvious effect.

          Lars Kotthoff added a comment -

          I've had a couple of issues with the current version. First, the facet queries which are sent to the other shards are posted in the URL, but aren't URL encoded, i.e. during the refine stage anything non-ascii results in facet counts for "new" values (i.e. the garbled version) coming back and causing NPEs when trying to update the counts.

          Furthermore, facet.limit=<negative value> isn't working as expected, i.e. instead of all facets it returns none. Also facet.sort is not automatically enabled for negative values.

          I've attached "solr-dist-faceting-non-ascii-all.patch" which fixes the above issues. Somebody who understands what everything is supposed to do should have a look over it though
          For example I've found two linked hash maps in FacetInfo, topFacets and listFacets, which seem to serve the same purpose. Therefore I replaced them by a single hash map. It seems to work just fine this way.

          Yonik Seeley added a comment -

          I just committed shards_qt.patch

          Yonik Seeley added a comment -

          Attaching shards_qt.patch, which uses "shards.qt" as "qt" for sub-requests to avoid infinite recursion when setting "shards" as a default in the request handler.
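
          (For illustration, with hypothetical handler names: if a /distrib handler is configured with shards as a default, a request carrying shards.qt=standard makes the sub-requests use qt=standard, so they hit the standard handler, which has no shards default and therefore cannot recurse back into /distrib.)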

          Thomas Peuss added a comment -

          > they are limited in the number of documents they can request during the second phase by the maximum length of the query string.

          For Tomcat you can increase the allowed length of the query string by adding, for example, maxHttpHeaderSize="65536" to the Connector entries in server.xml. This increases the maximum allowed GET request size to 64 kB (the default is 4 kB).

          Stu Hood added a comment -

          Because the subqueries to Solr shards use GET requests (via SolrJ), they are limited in the number of documents they can request during the second phase by the maximum length of the query string.

          One (API preserving) solution would be to modify SolrJ to use a POST request for queries if the query string is longer than some constant value.
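
          A minimal sketch of that idea (the constant, names, and threshold are made up, not actual SolrJ code):

          class MethodSelectionSketch {
            // Illustrative threshold, chosen well under common container limits
            // (Tomcat's default maxHttpHeaderSize is 4 kB, for example).
            static final int MAX_GET_QUERY_LENGTH = 2048;

            enum Method { GET, POST }

            // Use POST whenever the encoded query string would risk exceeding
            // the container's GET limits; otherwise keep the existing GET behavior.
            static Method chooseMethod(String encodedQueryString) {
              return encodedQueryString.length() > MAX_GET_QUERY_LENGTH ? Method.POST : Method.GET;
            }
          }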

          Jayson Minard added a comment -

          I'll see if I can work up a patch tonight on the extended response...

          Sean Timm added a comment -

          Jayson--

          I agree. I've been meaning to recommend that be added. We've found it invaluable in the past (mostly with debugging) when doing federated and distributed search. I would like to see a "shard" field added which would contain the base URI of the shard where the result originated as provided in the request. The index of each result is less important to me, but I can see how that would be useful.

          Jayson Minard added a comment -

          Would it be interesting to others to have an extended response format for distributed queries that would bring back the list of shards numbered, and then code each element of the response with the source list of shards that contributed to the element appearing in the results? For example, which shard was the source of a document? Or which shards had the facet value present? And so on.

          In really high shard counts it is more efficient if you can trim follow-on queries and pivots to only shards that matter. This information would help that effort.

          Regardless, it is useful for debugging.

          Yonik Seeley added a comment -

          committed additional tests... thanks!

          Jayson Minard added a comment -

          A few more tests to show intended behavior when facets differ between shards, which is likely in the wild (missing from all shards but valid in the schema, missing from some, and an invalid field not in the schema). The last test is just to ensure error behavior matches non-distributed searches.

          Yonik Seeley added a comment -

          I just committed this bugfix... thanks Jayson!

          Jayson Minard added a comment -

          Attached patch to fix issue with distributed search. If you specified a facet.field that was valid for the schema but not contained in a shard, an unintentional exception (array index out of bounds) would be thrown instead of returning the facet as empty.

          Henri Biestro added a comment -

          Nothing functional, just noticed reading the code that Shard{Doc,Request} are missing the Apache license header.

          Yonik Seeley added a comment -

          OK, I've committed this! Thanks everyone!
          I'll leave this bug open for now as a place to accumulate patches.

          Some things that are missing (but optional and not currently high on my TODO list):

          • field faceting when facet.sort=false
          • distributed idf... this has a performance cost, and should matter little in a well-mixed index.
          Ryan McKinley added a comment -

          > Given that most of this is new functionality, I think things are in good enough shape to commit now (making it much easier for others to generate patches against it).

          +1 (but I have only checked that it does not break anything I'm working with) – I think this should get committed soon. Since it is large and mostly discrete from existing functionality, it will be much easier to refine with smaller patches.

          Yonik Seeley added a comment -

          New patch:

          • test framework using multiple embedded jetty servers that adds documents to multiple servers, and also to a control server, then executes both distributed and non-distributed queries and compares the results.
          • fixed merging for non-string uniqueKeyFields
          • fixed issue when id field was not selected by client
          • break facet count ties by label
          • added rudimentary duplicate detection in case one accidentally adds the same doc to different shards
          • add code to handle index changes between query phases (docs may no longer exist)

          Given that most of this is new functionality, I think things are in good enough shape to commit now (making it much easier for others to generate patches against it).

          Yonik Seeley added a comment -

          Patrick, I've reproduced your null pointer exception on accidental duplicates (I've been working on tests). I'll look into a fix along the lines of what you suggested.

          Yonik Seeley added a comment -

          fixed test cases that relied on parsing the previous explain format

          Yonik Seeley added a comment -

          New patch attached... last one had an unfinished change that prevented compilation (using the generic SolrResponse instead of SolrQueryResponse).

          Yonik Seeley added a comment -

          > I really need the ShardDoc classes to be split up into public classes

          ShardDoc is public already... can you elaborate?

          > It would also be fantastic to open up QueryComponent, my component only needs to override a few functions

          What is yours trying to accomplish?

          > A solution would be to maintain map of unique fields as adding the ShardDocs to the priority queue, and continue on duplicates.

          Agree. It should fall into the category of robustness though, rather than a duplicates detection feature (since it will mean that facets will be off, and it will be possible to get fewer docs than requested if duplicates do exist).

          We also need to be robust in the face of a commit on a shard happening between phases of a request (a doc that we request info for may no longer exist, etc). That would probably cause us to blow up currently.

          Hopefully this can be committed after some basic tests are added, and that will make it much easier for others to contribute patches. In the future maybe we should try a branch for changes this large.

          Yonik Seeley added a comment -

          updated patch:

          • refactored some distributed search code to make things easier (added modifyRequest, etc.)
          • added merging of debugging info (including timing info, via generic recursive merging)
          • merge explain info; drops internal id from explain key for easier merging
          • many small changes: don't return scores if they aren't requested (even if needed for shard requests to merge), return maxScore if scores are requested, enable escaping for the shards parameter.
          patrick o'leary added a comment -

          It looks pretty good. I really need the ShardDoc classes to be split up into public classes so I can use them.
          It would also be fantastic to open up QueryComponent; my component only needs to override a few functions, and it would be so much cleaner to just extend QueryComponent rather than duplicate the code.

          Also, through testing it might be worthwhile to apply a few negative edge cases,
          e.g. duplicate documents in different shards. As systems get larger this is a strong possibility. Only fixed-hash indexing could ensure you don't get duplicates, but if you try to have an extendable environment that might not be an option.

          It took me a while to realize I had duplicated documents during indexing, but they cause NPEs in the query response writers, so the problem is not obvious or easy to figure out.

          A solution would be to maintain a map of unique fields while adding the ShardDocs to the priority queue, and continue on duplicates. You might also want to put some logic in there to ensure the same shard doc is used for each duplicate doc, simply because the scores for identical docs will be different across shards and could change based on which shard responds first. This should eliminate that.

          So something like
          QueryComponent.mergeIds

          
          Map<Object, String> uniqueDoc = new HashMap<Object, String>();

          for (ShardResponse srsp : sreq.responses) {
            SolrDocumentList docs = srsp.rsp.getResults();
            // ... (response handling elided in the original comment)

            // go through every doc in this response, construct a ShardDoc, and
            // put it in the priority queue so it can be ordered.
            for (int i = 0; i < docs.size(); i++) {
              SolrDocument doc = docs.get(i);
              // ... (ShardDoc construction elided in the original comment)

              Object uniqueField = doc.getFieldValue(uniqueKeyField.getName());

              if (!uniqueDoc.containsKey(uniqueField)) {
                // first time we've seen this id: remember which shard supplied it
                shardDoc.setId(uniqueField);
                uniqueDoc.put(uniqueField, shardDoc.shard);
              } else {
                // duplicate id from another shard: correct the aggregate count and
                // keep the doc from a deterministically chosen shard
                numFound--;
                if (uniqueDoc.get(uniqueField).compareTo(shardDoc.shard) > 0) {
                  continue;
                }
              }

              // ...
              queue.insert(shardDoc);
            } // end for-each-doc-in-response
          } // end for-each-response
          
          Yonik Seeley added a comment -

          Updated patch:

          • facet refinement requests piggyback on the requests to retrieve stored fields where possible.
          • fixed a bug when requesting scores... don't include scores, even if requested, if they are not in the given DocList
          • fixed HTTP error codes for query parse errors
          • added double/long support in sorting since we've upgraded to lucene 2.3, and changed aggregate numFound to handle long
          • escape & unescape the comma-separated "ids" string using backslash escaping (used to specify which docs to retrieve from each shard)
          • other misc cleanups
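
          Since one of the bullets above mentions backslash escaping of the comma-separated "ids" string, here is a minimal, hypothetical illustration of that kind of scheme (a sketch only, not the patch's actual helper):

          public class IdListEscaping {
            // Escape backslashes first, then commas, so the joined string is unambiguous.
            static String escape(String id) {
              return id.replace("\\", "\\\\").replace(",", "\\,");
            }

            // Split on unescaped commas and unescape each piece.
            static java.util.List<String> split(String joined) {
              java.util.List<String> out = new java.util.ArrayList<>();
              StringBuilder cur = new StringBuilder();
              for (int i = 0; i < joined.length(); i++) {
                char ch = joined.charAt(i);
                if (ch == '\\' && i + 1 < joined.length()) {
                  cur.append(joined.charAt(++i));  // take the escaped char literally
                } else if (ch == ',') {
                  out.add(cur.toString());
                  cur.setLength(0);
                } else {
                  cur.append(ch);
                }
              }
              out.add(cur.toString());
              return out;
            }

            public static void main(String[] args) {
              String joined = escape("a,1") + "," + escape("b");  // yields a\,1,b
              System.out.println(split(joined));                  // prints [a,1, b]
            }
          }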
          Yonik Seeley added a comment -

          This update adds parallel requests.

          • a singleton communications thread pool (executor) is added... currently static, but it should be per core and have a way of shutting down.
          • a singleton HttpClient for use by all SolrServer instances, currently static, probably fine to remain so (unless there needs to be core specific config?)
          • an exception causes everything to be aborted
          • all requests in a phase are sent out in parallel
          • a completion service is used for grabbing completed requests, so the first requests back can start being processed.
          • while receiving responses, if any new requests are put on the outgoing queue, they are immediately sent out before waiting for any further responses.
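
          As a rough illustration of the completion-service pattern described in these bullets, here is a minimal, self-contained sketch (a Thread.sleep stands in for the HTTP round trip; this is not the patch's actual code):

          import java.util.*;
          import java.util.concurrent.*;

          public class CompletionSketch {
            // Hypothetical stand-in for a SolrJ request to one shard.
            static String queryShard(String shard) throws InterruptedException {
              Thread.sleep(ThreadLocalRandom.current().nextInt(200));
              return "response from " + shard;
            }

            public static void main(String[] args) throws Exception {
              ExecutorService executor = Executors.newCachedThreadPool();
              CompletionService<String> completion = new ExecutorCompletionService<>(executor);

              List<String> shards = Arrays.asList("host1:8983/solr", "host2:8983/solr", "host3:8983/solr");
              for (String shard : shards) {
                completion.submit(() -> queryShard(shard));  // all requests go out in parallel
              }
              for (int pending = shards.size(); pending > 0; pending--) {
                // take() hands back whichever request finished first, so the earliest
                // responses can be processed while slower shards are still in flight
                System.out.println(completion.take().get());
              }
              executor.shutdown();
            }
          }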
          patrick o'leary added a comment -

          Hey Yonik
          Needed to make a couple of updates to ShardDoc, as the nested outer classes were preventing me from using the patch.
          Also included SOLR-457 in this patch, with a multi-threaded implementation of solrj to query the shards.

          P

          Dima Brodsky added a comment -

          Hey,

          Quick question from a solr newbie. I'd love to be able to play with and test out the distributed functionality of this patch. Are there any user-level instructions on how to configure and run it?

          Thanks!
          ttyl
          Dima

          Yonik Seeley added a comment -

          New patch attached... this one implements count tiebreaking by index order (to match the non-distributed faceting).

          Yonik Seeley added a comment -

          New patch attached...

          I just discovered that refinement queries weren't working because filter.query doesn't accept the new query syntax I was using to avoid having to escape field values: <!field f=myfield>value
          (this should probably be committed separately, but it's in this patch for now).

          I put in code to over-request the facet.field limit, but then commented it out for now since it too easily covers up bugs because it often prevents any refinement query logic from being exercised.

          Also corrected the code that always used the last element as the max possible missing count. If we requested 10 terms and only got 6, then we know that the max possible missing count is zero.

          Yonik Seeley added a comment -

          > I would think it would be n * number of shards.

          That would make the number of terms to transfer over the network and to merge O(n_shards**2)... not great for scalability

          Yonik Seeley added a comment -

          > one solution i've seen to mitigate problems like this in the past is to compute a higher "limit" when querying the individual shards

          Yep. Eventually should be configurable too. We should definitely do some "over requesting" for very small limits. Expanding the limit too much can be expensive though (CPU cost partially depends on the algorithm). I think users should even be able to disable refinement queries if they just want an estimate.

          Note that it's possible to tell if there even could be stealth terms out there... we maintain the smallest count we get from each shard, so that serves as the largest count any unknown term could have. Add all those together to see if it's possible an unknown term could make it to the top terms. This means you could do a request with a smaller limit, and then re-request with a larger limit if necessary.

          Beyond that, it becomes unclear what the best strategy is. Worst case scenario: If the top N facets get down to a count of 1, then any unknown term could bump another higher. Requesting all terms with count>=1 from each shard isn't something I want to ponder.

          Anyway, a colleague informs me that this is the way at least one other major search vendor does things (counts are exact for terms shown, but it is theoretically possible to miss a term).
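
          For illustration, the bound described above is simple arithmetic (the numbers are hypothetical, not from the patch):

          public class StealthTermBound {
            public static void main(String[] args) {
              // Smallest facet count each shard returned in phase 1: a term a shard
              // stayed silent about can have at most that shard's smallest returned count.
              int[] smallestReturnedCount = {12, 9, 15};
              int bound = 0;
              for (int c : smallestReturnedCount) {
                bound += c;
              }
              // bound == 36: if the lowest count currently in the top terms exceeds 36,
              // no unknown term can break in, and no wider re-request is needed.
              System.out.println("max possible count of an unreturned term: " + bound);
            }
          }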

          Ian Holsman added a comment -

          Hoss..

          I'm not sure about n**2.

          I would think it would be n * number of shards.

          Hoss Man added a comment -

          OK, this version patches cleanly and includes some distributed faceting code.

          I haven't looked at it ... but holy freaking cow that's cool.

          > Note that it is theoretically possible to miss terms. A term could be just below the threshold of each shard (and thus not returned by any shard), but the total count could boost it in the top. This could be rectified by retrieving all terms above a specified count, but it could be expensive. The counts that are currently returned are exact.

          One solution I've seen to mitigate problems like this in the past is to compute a higher "limit" when querying the individual shards. Someone somewhere suggested that n**2 is a good approach (but they may have been talking out of their ass), so if the initial request says facet.limit=5, the individual shards would be queried with facet.limit=25 ... but you'd also still want to use refinement requests.

          Yonik Seeley added a comment -

          Note that for a normal facet query, this could result in 3 waves of requests.
          1) query + facet
          2) facet refinements
          3) retrieve stored fields + highlight

          We probably want to allow #2 to piggyback on #3 requests, provided that nothing needs final facet values before retrieving the stored fields.

          Yonik Seeley added a comment -

          OK, this version patches cleanly and includes some distributed faceting code.

          • facet.query and facet.field sorted by count is mostly handled
          • breaking ties by natural (index) sort order is not yet implemented
          • date faceting and unsorted (index order) facet.field is not implemented

          Assuming the user asks for the top 10 terms of a field:
          1) The first facet queries piggyback on the queries to get the top ids and sort field values.
          2) counts are merged, and new "refinement" requests are sent out for those terms in the top 10 where a count was not received from some shards. Also, for terms below the top 10, we calculate the maximum count each could have based on the shards we have not heard from, and if that would boost it into the top 10, we include that term for "refinement".
          3) refinement responses are used to adjust the counts, and we are done.

          Note that it is theoretically possible to miss terms. A term could be just below the threshold of each shard (and thus not returned by any shard), but the total count could boost it in the top. This could be rectified by retrieving all terms above a specified count, but it could be expensive. The counts that are currently returned are exact.
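
          To make step 2 concrete, here is a toy sketch of the refinement decision (hypothetical data, not the patch's implementation):

          import java.util.*;

          public class RefinementSketch {
            public static void main(String[] args) {
              // Facet counts each shard returned in phase 1 (hypothetical).
              List<Map<String, Integer>> perShard = Arrays.asList(
                  Map.of("java", 50, "solr", 40),
                  Map.of("solr", 45, "lucene", 44));
              // The smallest count each shard returned bounds any term it was silent on.
              int[] shardMin = {40, 44};

              Map<String, Integer> merged = new TreeMap<>();
              perShard.forEach(m -> m.forEach((term, c) -> merged.merge(term, c, Integer::sum)));

              for (Map.Entry<String, Integer> e : merged.entrySet()) {
                int maxPossible = e.getValue();
                List<Integer> askShards = new ArrayList<>();
                for (int s = 0; s < perShard.size(); s++) {
                  if (!perShard.get(s).containsKey(e.getKey())) {
                    maxPossible += shardMin[s];  // upper bound for the silent shard
                    askShards.add(s);            // its exact count must be refined
                  }
                }
                System.out.println(e.getKey() + ": known=" + e.getValue()
                    + " maxPossible=" + maxPossible + " refine from shards " + askShards);
              }
              // e.g. "java" is known=50 but could be as high as 94, so shard 1 must be
              // asked for its exact count before the final top terms are reported.
            }
          }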

          Yonik Seeley added a comment -

          WRT a switch, I left room for other components to insert stages between the well-defined ones.
          I'm not sure if this will be useful in the future or not. Much of that seems like it would depend on the contracts between the components and the ResponseBuilder, and thus how other unknown custom components would be able to change things. That's still very immature, as I've really just been focusing on getting things working.

          patrick o'leary added a comment -

          Small thing but if you update org.apache.solr.handler.component.ResponseBuilder
          and set the stages to final, you can use a switch statement in the distributedProcess phase.

          public class ResponseBuilder 
          {
            public static final int STAGE_START           = 0;
            public static final int STAGE_PARSE_QUERY     = 1000;
            public static final int STAGE_EXECUTE_QUERY   = 2000;
            public static final int STAGE_GET_FIELDS      = 3000;
            public static final int STAGE_DONE            = Integer.MAX_VALUE;
            // ...
          }
          
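          For illustration, with compile-time constants a component's distributedProcess could dispatch like this (a hypothetical component sketch; the rb.stage field name is an assumption):

          // Hypothetical custom component method; the constants are the ones shown above.
          // Java requires switch labels to be compile-time constants, which is why
          // declaring the stages final matters here.
          public int distributedProcess(ResponseBuilder rb) {
            switch (rb.stage) {
              case ResponseBuilder.STAGE_PARSE_QUERY:
                return ResponseBuilder.STAGE_EXECUTE_QUERY;  // nothing to do yet
              case ResponseBuilder.STAGE_EXECUTE_QUERY:
                // queue shard requests here, then advance
                return ResponseBuilder.STAGE_GET_FIELDS;
              case ResponseBuilder.STAGE_GET_FIELDS:
                return ResponseBuilder.STAGE_DONE;
              default:
                return ResponseBuilder.STAGE_DONE;
            }
          }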
          Ryan McKinley added a comment -

          Yonik, if you say "go", I'll add SOLR-446

          patrick o'leary added a comment -

          I was missing a file from an svn add, so the patch I put up is missing SolrFieldSortedHitQueue.
          I'll remove it to reduce confusion.

          Yonik Seeley added a comment -

          I'm in the middle of implementing some distributed faceting... but I'll try to get a better patch the next time around.
          I think some of Ryan's suggestions are good (a separate patch to move SearchHandler, put solrj in core, implement ResponseWriter support for SolrJ objects).

          patrick o'leary added a comment -

          This might help: I merged the distributed & federated patches with trunk last night and fixed the rejects. It appears to work.
          The only things not included are the distributed searcher unit tests from the previous patch. Only the deltas were in the patch, so I had no way to rebuild them.

          Hope this helps
          P

          Gereon Steffens added a comment -

          Yonik - thanks, that's what caused it.

          Patrick - as far as I can tell, you can ignore the error messages from patch.

          patrick o'leary added a comment -

          Hey Yonik

          Are you applying the federated search patch before the distributed search one? The patch itself won't apply cleanly against trunk.

          Thanks
          P

          Yonik Seeley added a comment -

          There is currently no "local" shard... is that causing your problem?
          Use something like shards=localhost:8983/solr,localhost:8080/solr

          Gereon Steffens added a comment -

          Yonik, no matter what I try, I keep getting exceptions when querying anything that uses shards.
          Is the correct query URL still what I've used in my previous comment?

          Excerpt from my logs:

          SEVERE: org.apache.solr.client.solrj.SolrServerException: Error executing query
                  at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86)
          [...]
          Caused by: org.apache.solr.common.SolrException: /select
          
          /select
          
          request: http://localhost:8090/select?echoParams=explicit&q=id:1527426&start=0&rows=10&fsv=true&fl=id,score&isShard=true&wt=xml&version=2.2
                  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
          
          Yonik Seeley added a comment -

          Small update, mostly to sorting

          • This changes sorting to get values from the Sort comparators (thus supporting custom sorts)
          • uses external values that can be supported by XML, also nicer for debugging
          • returns sort field values in an array per-field {price=[10,20,30,40,50]}
          • merging should be faster... lookup of sort values is by index number instead of searching
            for the field name.
          • merging short-circuits comparisons for docs in the same shard
          • sorting null values now works & respects sortMissingFirst/Last, etc
          • if a shard request, don't pre-fetch docs for highlighter
          Otis Gospodnetic added a comment -

          Shard is what you call a small(er) index that is a part of a large(r) cluster of indices. These smaller shards together form one large logical index.

          See http://www.scribd.com/doc/312186/THE-GOOGLE-CLUSTER-ARCHITECTURE

          I wish Nutch used the same (shard) nomenclature instead of using "segments", so there is no confusion with Lucene index segments.... but that's another issue.

          Ryan McKinley added a comment -


          > For #3, are there SolrJ parts (or future parts) that we wouldn't want automatically bundled with Solr?

          I don't think so. The thing I want to make sure is still possible is that solrj can be distributed independently (without the lucene dependencies)

          The existing artifact topology makes sense as is: common, solrj, core.

          Currently we have:

          + common
            + solrj
            + core
          

          we need

          + common
            + solrj  
              + core
          

          or

          + common & solrj  
            + core
          

          This issue is essentially independent of SOLR-303, but we should try to make our source directory structures consistent with standard practice.


          > Saving an if() doesn't seem too compelling (the current code could certainly be refactored to be cleaner anyway). Are there other benefits to having a separate DistributedSearchHandler though?

          If there is a good reason to keep it the same handler then that is a reason enough.

          I just looked at it (without really grokking how it works) and it seemed a bit bloated with distribution lifecycle stuff. As long as the non-distributed request cycle isn't tied to the distributed stuff, I'm sure it is fine.


          BTW, where does the term "shard" come from? What specifically does it refer to?

          Yonik Seeley added a comment -

          > We should extract out a few simple things and commit them quickly to make this go more smoothly:
          >
          > 1. move SearchHandler to o.a.s.handler.component - I vote you go ahead and commit that change.
          > 2. Create a separate issue for adding SolrDocument to XMLWriter
          > 3. Move solrj into the main source tree. I'm not sure the best way to do this, but I don't think solrj should sit in its own source folder if the core depends on it.

          Definitely agree on #1 and #2.
          For #3, are there SolrJ parts (or future parts) that we wouldn't want automatically bundled with Solr?

          > Is there a good reason to use the same handler for distributed search?

          It seems like a single search component should be able to handle distributed search.
          If that's the case, what separates a handler that is distributed and one that isn't?
          The first thing that occurred to me was to just detect the presence of shards[] after the prepare phase.
          There is a side benefit in that a component can control whether a request is distributed or not (all solrconfig could be the same for systems in a cluster, with some sort of external system controlling topology).
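
          A minimal sketch of that detection idea (a hypothetical handler fragment; processLocally and processDistributed are made-up names):

          // Hypothetical: one handler branches on the presence of the shards parameter
          // after the prepare phase, instead of using a dedicated subclass.
          String shards = req.getParams().get("shards");
          if (shards == null) {
            processLocally(rb);                         // plain single-index request cycle
          } else {
            processDistributed(rb, shards.split(","));  // staged distributed cycle
          }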

          One could have a distributed handler that could delegate or handle non-distributed requests, but it seems to amount to the same thing (a single handler that can do both on the fly).

          Saving an if() doesn't seem too compelling (the current code could certainly be refactored to be cleaner anyway). Are there other benefits to having a separate DistributedSearchHandler though?

          Ryan McKinley added a comment -

          I just took a quick look... a few observations:

          We should extract out a few simple things and commit them quickly to make this go more smoothly:

          1. move SearchHandler to o.a.s.handler.component – I vote you go ahead and commit that change.
          2. Create a separate issue for adding SolrDocument to XMLWriter
          3. Move solrj into the main source tree. I'm not sure the best way to do this, but I don't think solrj should sit in its own source folder if the core depends on it.

          Is there a good reason to use the same handler for distributed search? Why not have a DistributedSearchHandler that extends SearchHandler and skip the if {} else {} checking? Likewise, I wonder if a DistributedResponseBuilder could/should extend ResponseBuilder and add the necessary logic.

          Stu Hood added a comment -

          Thanks for the new patch Yonik! It doesn't apply cleanly because of the way you generated the test files, but after those have been removed, it looks good. It seems you figured out the sorting issue that I had mentioned: thanks.

          Yonik Seeley added a comment -

          attaching updated patch (distributed.patch) that fixes some sorting issues.

          Yonik Seeley added a comment -

          OK, here is a draft that mostly works for searches and highlighting.

          There are stages in the request:

            public static int STAGE_START           = 0;
            public static int STAGE_PARSE_QUERY     = 1000;
            public static int STAGE_EXECUTE_QUERY   = 2000;
            public static int STAGE_GET_FIELDS      = 3000;
            public static int STAGE_DONE            = Integer.MAX_VALUE;
          

          When a component wants to send a request, it adds it to "outgoing" queue.
          Other components can inspect and modify these shard requests.
          All components get a callback when the shard response is received.

          All shard requests are tagged with purposes (to aid in both correlation and inspection/modification by other components).
          This is what a ShardRequest looks like:

          public class ShardRequest {
            public final static String[] ALL_SHARDS = null;
          
            public final static int PURPOSE_PRIVATE         = 0x01;
            public final static int PURPOSE_GET_TERM_DFS    = 0x02;
            public final static int PURPOSE_GET_TOP_IDS     = 0x04;
            public final static int PURPOSE_REFINE_TOP_IDS  = 0x08;
            public final static int PURPOSE_GET_FACETS      = 0x10;
            public final static int PURPOSE_REFINE_FACETS   = 0x20;
            public final static int PURPOSE_GET_FIELDS      = 0x40;
            public final static int PURPOSE_GET_HIGHLIGHTS  = 0x80;
          
            public int purpose;  // the purpose of this request
          
            public String[] shards;  // the shards this request should be sent to
            // TODO: how to request a specific shard address?
          
            public ModifiableSolrParams params;
          
            public List<ShardResponse> responses = new ArrayList<ShardResponse>();
          }
          

          Components are responsible for themselves... the highlighting component is responsible for turning itself on/off at the appropriate time; the query component has no knowledge of the highlight component. This makes it possible to develop custom components that work in a distributed environment without explicit support for them baked into the other components.

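          For illustration, a component might queue such a request during one of the stages roughly like this (the addRequest enqueue hook and the parameters chosen here are assumptions, not confirmed by the patch):

          // Hypothetical component code using the ShardRequest fields shown above.
          ShardRequest sreq = new ShardRequest();
          sreq.purpose = ShardRequest.PURPOSE_GET_FACETS;
          sreq.shards = ShardRequest.ALL_SHARDS;  // null: send to every shard
          sreq.params = new ModifiableSolrParams();
          sreq.params.set("q", "solr");           // hypothetical query
          sreq.params.set("facet", "true");
          rb.addRequest(this, sreq);              // assumed hook that adds to the outgoing queue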
          Yonik Seeley added a comment -

          > I'm not quite sure about GlobalCollectionStat. Is its purpose just to normalize weights from the shards?

          It's to make a distributed search score the same as it would if everything was in a single index.
          idf (inverse document frequency) is part of the scoring, so that component essentially does a distributed idf.
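
          As a minimal illustration using the classic Lucene DefaultSimilarity idf formula (the Shard accessors here are hypothetical):

          // Sum term statistics across shards, then compute one global idf that every
          // shard uses, so scores match what a single combined index would produce.
          long docFreqTotal = 0, numDocsTotal = 0;
          for (Shard s : shards) {
            docFreqTotal += s.docFreq(term);  // hypothetical accessors
            numDocsTotal += s.numDocs();
          }
          float idf = (float) (Math.log(numDocsTotal / (double) (docFreqTotal + 1)) + 1.0);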

          I still use the PriorityQueue, but it's been modified since SolrJ returns objects rather than strings.
          I'll try to post a draft soon... if you understood the old code, it will be great for you to look at the new stuff to see what I'm missing.

          Stu Hood added a comment -

          I recognize the advantage of the AuxiliaryQPhase, but I'm not quite sure about GlobalCollectionStat. Is its purpose just to normalize weights from the shards?

          I had to make some changes to the MainQPhase parameter building, and to the PriorityQueue that SearchResponseMerger uses to get sorting working properly. Yonik, if you aren't planning on re-writing those from scratch, would you prefer a patch, or an explanation of what I needed to change?

          Yonik Seeley added a comment -

          Bear with me... I'm working on this from a bit of a different angle.

          • multiple stages, defined by components themselves, and a stage doesn't end until an outgoing request queue is empty.
          • making components responsible for turning on/off their own options in the query phases, rather than having the distributed search component have to know all the different options.
          • using SolrJ/HttpClient for communication
          • organizational: moved SearchHandler into the component package, along with distributed search stuff. It's all related and allows us to keep things private that should be kept private.

          I understand the original author is no longer involved with this issue, so I'm basing things on his code in some places, but not others. Hopefully I'll have something to post soon.

          Gereon Steffens added a comment -

          I started experimenting with this patch and have a couple of issues.

          First, the patch did not apply cleanly to the latest trunk (603869), so I reverted to 600419 - no big deal.

          I then set up two separate tomcat/solr instances using identical schemas (on ports 8080 and 8090) and tried querying both using solr/search requests, but I can't get any of my queries to work.

          For example, there is a document with field "id" = 1527426 in the database on port 8090. "id" is defined as a "sint" field. The 8080 instance has no such id.
          When querying "http://localhost:8080/solr/search?q=id:1527426&shards=local,localhost:8090/solr", I get the following in the tomcat logs:

          catalina.out on the 8080 instance:
          
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.SimpleOrderedMap
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.component.ResponseBuilder <init>
          INFO: ### *** shards len 2
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.federated.component.GlobalCollectionStatComponent extractTerms
          INFO: --------Extract terms starting----------- :
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.federated.component.GlobalCollectionStatComponent extractTerms
          INFO: ### *** is shards null false
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.federated.component.GlobalCollectionStatComponent extractTerms
          INFO: ### *** SHARDS len 2
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.federated.XMLResponseParser parse
          INFO: ->Request http://localhost:8090/solr/select?q=id%3A1527426&shards=local%2Clocalhost%3A8090%2Fsolr&eqt=true&
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.NamedList
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.federated.component.GlobalCollectionStatComponent execute
          WARNING: Exception while querying shard localhost:8090/solr :java.lang.NullPointerException
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.federated.component.GlobalCollectionStatComponent calcuateGlobalCollectionStat
          INFO: --------getGlobalCollectionStat starting----------- :
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.federated.XMLResponseParser parse
          INFO: ->Request http://localhost:8090/solr/federated/collectionstats?terms=id%3A%C2%80%C5%B4%E0%BA%82%2C&
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.NamedList
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values nd : java.lang.Integer
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values tdf : org.apache.solr.common.util.NamedList
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.federated.component.MainQPhaseComponent process
          INFO: --------MainQPhaseComponent starting----------- :
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.federated.component.FedSearchComponent executeOnLocal
          INFO: ->Local request params: {fl=id,score,,q=id:1527426,nd=74621,tdf=id:Ŵຂ@1,,fsv=true}
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.SimpleOrderedMap
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values response : org.apache.solr.search.DocSlice
          Dec 13, 2007 10:55:04 AM org.apache.solr.core.SolrCore execute
          INFO: null nd=74621&fsv=true&tdf=id:Ŵຂ@1,&q=id:1527426&fl=id,score, 0 1
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.federated.XMLResponseParser parse
          INFO: ->Request http://localhost:8090/solr/select?fl=id%2Cscore%2C&q=id%3A1527426&nd=74621&tdf=id%3A%C2%80%C5%B4%E0%BA%82%401%2C&fsv=true&
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.NamedList
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.federated.component.MainQPhaseComponent process
          WARNING: Exception while querying shard localhost:8090/solr :java.lang.ClassCastException: java.lang.Integer
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values response : org.apache.solr.handler.federated.ResponseDocs
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.NamedList
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values response : org.apache.solr.handler.federated.ResponseDocs
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.NamedList
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.federated.component.AuxiliaryQPhaseComponent process
          INFO: --------AuxiliaryQPhaseComponent starting----------- :
          Dec 13, 2007 10:55:04 AM org.apache.solr.core.SolrCore execute
          INFO: /search q=id:1527426&shards=local,localhost:8090/solr 0 60
          
          catalina.out on the 8090 instance
          
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.SimpleOrderedMap
          Dec 13, 2007 10:55:04 AM org.apache.solr.handler.component.ResponseBuilder <init>
          INFO: ### *** shards len 2
          Dec 13, 2007 10:55:04 AM org.apache.solr.core.SolrCore execute
          INFO: /select q=id:1527426&eqt=true&shards=local,localhost:8090/solr 0 3
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.SimpleOrderedMap
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values nd : java.lang.Integer
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values tdf : org.apache.solr.common.util.NamedList
          Dec 13, 2007 10:55:04 AM org.apache.solr.core.SolrCore execute
          INFO: /federated/collectionstats terms=id:Ŵຂ, 0 3
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.SimpleOrderedMap
          Dec 13, 2007 10:55:04 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values response : org.apache.solr.search.DocSlice
          Dec 13, 2007 10:55:04 AM org.apache.solr.core.SolrCore execute
          INFO: /select nd=74621&fsv=true&fl=id,score,&q=id:1527426&tdf=id:Ŵຂ@1, 0 1
          

          So the request does reach the 8090 instance, but triggers a ClassCastException on the 8080 instance. The XML output is

          <?xml version="1.0" encoding="UTF-8"?>
          <response>
          <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">135</int>
            <lst name="params">
              <str name="q">id:1527426</str>
              <str name="shards">local,localhost:8090/solr</str>
            </lst>
          </lst>
          <result name="response" numFound="0" start="0"/>
            <lst name="responseHeader">
              <lst name="local">
                <int name="status">0</int>
                <int name="QTime">4</int>
                <lst name="params">
                <str name="nd">74621</str>
                <str name="fsv">true</str>
                <str name="tdf">id:€Ŵຂ@1,</str>
                <str name="q">id:1527426</str>
                <str name="fl">id,score,</str>
              </lst>
            </lst>
          </lst>
          </response>
          

          The "reverse" request for "http://localhost:8090/solr/search?q=id:1527426&shards=local,localhost:8080/solr" produces an HTTP Status 500 - null java.lang.NullPointerException response; the logs are:

          catalina.out on the 8080 instance
          
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.SimpleOrderedMap
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.component.ResponseBuilder <init>
          INFO: ### *** shards len 2
          Dec 13, 2007 11:07:33 AM org.apache.solr.core.SolrCore execute
          INFO: /select q=id:1527426&eqt=true&shards=local,localhost:8080/solr 0 2
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.SimpleOrderedMap
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values nd : java.lang.Integer
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values tdf : org.apache.solr.common.util.NamedList
          Dec 13, 2007 11:07:33 AM org.apache.solr.core.SolrCore execute
          INFO: /federated/collectionstats terms=id:Ŵຂ, 0 5
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.SimpleOrderedMap
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values response : org.apache.solr.search.DocSlice
          Dec 13, 2007 11:07:33 AM org.apache.solr.core.SolrCore execute
          INFO: /select nd=74621&fsv=true&fl=id,score,&q=id:1527426&tdf=id:Ŵຂ@1, 0 1
          
          catalina.out on the 8090 instance
          
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.SimpleOrderedMap
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.component.ResponseBuilder <init>
          INFO: ### *** shards len 2
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.federated.component.GlobalCollectionStatComponent extractTerms
          INFO: --------Extract terms starting----------- :
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.federated.component.GlobalCollectionStatComponent extractTerms
          INFO: ### *** is shards null false
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.federated.component.GlobalCollectionStatComponent extractTerms
          INFO: ### *** SHARDS len 2
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.federated.XMLResponseParser parse
          INFO: ->Request http://localhost:8080/solr/select?q=id%3A1527426&shards=local%2Clocalhost%3A8080%2Fsolr&eqt=true&
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.NamedList
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.federated.component.GlobalCollectionStatComponent execute
          WARNING: Exception while querying shard localhost:8080/solr :java.lang.NullPointerException
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.federated.component.GlobalCollectionStatComponent calcuateGlobalCollectionStat
          INFO: --------getGlobalCollectionStat starting----------- :
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.federated.XMLResponseParser parse
          INFO: ->Request http://localhost:8080/solr/federated/collectionstats?terms=id%3A%C2%80%C5%B4%E0%BA%82%2C&
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.NamedList
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values nd : java.lang.Integer
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values tdf : org.apache.solr.common.util.NamedList
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.federated.component.MainQPhaseComponent process
          INFO: --------MainQPhaseComponent starting----------- :
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.federated.XMLResponseParser parse
          INFO: ->Request http://localhost:8080/solr/select?fl=id%2Cscore%2C&q=id%3A1527426&nd=74621&tdf=id%3A%C2%80%C5%B4%E0%BA%82%401%2C&fsv=true&
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.federated.component.FedSearchComponent executeOnLocal
          INFO: ->Local request params: {fl=id,score,,q=id:1527426,nd=74621,tdf=id:Ŵຂ@1,,fsv=true}
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.SimpleOrderedMap
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values response : org.apache.solr.search.DocSlice
          Dec 13, 2007 11:07:33 AM org.apache.solr.core.SolrCore execute
          INFO: null nd=74621&fsv=true&tdf=id:Ŵຂ@1,&q=id:1527426&fl=id,score, 0 4
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.NamedList
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values response : org.apache.solr.handler.federated.ResponseDocs
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values response : org.apache.solr.handler.federated.ResponseDocs
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.NamedList
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values response : org.apache.solr.handler.federated.ResponseDocs
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.NamedList
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.federated.component.AuxiliaryQPhaseComponent process
          INFO: --------AuxiliaryQPhaseComponent starting----------- :
          Dec 13, 2007 11:07:33 AM org.apache.solr.handler.federated.component.FedSearchComponent executeOnLocal
          INFO: ->Local request params: {dq=id:"Ŵຂ" ,q=id:1527426}
          Dec 13, 2007 11:07:33 AM org.apache.solr.request.SolrQueryResponse add
          INFO: adding into values responseHeader : org.apache.solr.common.util.SimpleOrderedMap
          Dec 13, 2007 11:07:33 AM org.apache.solr.common.SolrException log
          SEVERE: java.lang.NumberFormatException: For input string: "Ŵຂ"
                  at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
                  at java.lang.Integer.parseInt(Integer.java:447)
                  at java.lang.Integer.parseInt(Integer.java:497)
                  at org.apache.solr.util.NumberUtils.int2sortableStr(NumberUtils.java:36)
                  at org.apache.solr.schema.SortableIntField.toInternal(SortableIntField.java:52)
                  at org.apache.solr.schema.FieldType$DefaultAnalyzer$1.next(FieldType.java:315)
                  at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:437)
                  at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:97)
                  at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:515)
                  at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1227)
                  at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:979)
                  at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:907)
                  at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:896)
                  at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:146)
                  at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:101)
                  at org.apache.solr.handler.federated.component.AuxiliaryQPhaseComponent.prepare(AuxiliaryQPhaseComponent.java:71)
                  at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:152)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:866)
                  at org.apache.solr.handler.federated.component.FedSearchComponent.executeOnLocal(FedSearchComponent.java:87)
                  at org.apache.solr.handler.federated.component.AuxiliaryQPhaseComponent$1.call(AuxiliaryQPhaseComponent.java:115)
                  at org.apache.solr.handler.federated.component.AuxiliaryQPhaseComponent$1.call(AuxiliaryQPhaseComponent.java:114)
                  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:123)
                  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
                  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:123)
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:65)
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:168)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
                  at java.lang.Thread.run(Thread.java:595)
          
          Dec 13, 2007 11:07:33 AM org.apache.solr.core.SolrCore execute
          INFO: null q=id:1527426&dq=id:"Ŵຂ"+ 0 2
          Dec 13, 2007 11:07:33 AM org.apache.solr.common.SolrException log
          SEVERE: java.lang.NullPointerException
                  at org.apache.solr.handler.federated.SearchResponseMerger.mergeResponseDocs_NoSort(SearchResponseMerger.java:215)
                  at org.apache.solr.handler.federated.SearchResponseMerger.merge(SearchResponseMerger.java:83)
                  at org.apache.solr.handler.federated.component.AuxiliaryQPhaseComponent.process(AuxiliaryQPhaseComponent.java:156)
                  at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:158)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:866)
                  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:206)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
                  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
                  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
                  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
                  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
                  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
                  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
                  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
                  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
                  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
                  at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
                  at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
                  at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
                  at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
                  at java.lang.Thread.run(Thread.java:595)
          
          Dec 13, 2007 11:07:33 AM org.apache.solr.core.SolrCore execute
          INFO: /search q=id:1527426&shards=local,localhost:8080/solr 0 95
          Dec 13, 2007 11:07:33 AM org.apache.solr.common.SolrException log
          SEVERE: java.lang.NullPointerException
                  at org.apache.solr.handler.federated.SearchResponseMerger.mergeResponseDocs_NoSort(SearchResponseMerger.java:215)
                  at org.apache.solr.handler.federated.SearchResponseMerger.merge(SearchResponseMerger.java:83)
                  at org.apache.solr.handler.federated.component.AuxiliaryQPhaseComponent.process(AuxiliaryQPhaseComponent.java:156)
                  at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:158)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:866)
                  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:206)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
                  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
                  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
                  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
                  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
                  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
                  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
                  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
                  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
                  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
                  at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
                  at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
                  at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
                  at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
                  at java.lang.Thread.run(Thread.java:595)
          
          
          Sabyasachi Dalal added a comment -

Removed the commented line from SolrCore.loadSearchComponents and a couple of debug statements.

Hide
          Sabyasachi Dalal added a comment -

          I made a mistake and uploaded the wrong patch file. Now uploading the correct file.

I have fixed and updated the patch against trunk version 600419. It is integrated with the re-opened SOLR-281 patch.
I have added the configuration for the three distributed-search components in solrconfig.xml, under the "/search" request handler, so distributed search currently works with /search requests only (see the sketch after the list below).

A couple of issues:
1. The distributed-search components need a reference to the SearchHandler, so for now I have hard-coded the "/search" pattern in FedSearchComponent.
2. We need a clean way to load common init params for the distributed-search components, such as timeout, thread pool size, and the search handler pattern.
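
For reference, the wiring described above might look roughly like the following in solrconfig.xml. This is a sketch, not the patch's actual configuration: the component names are invented here, and only the class names are taken from the logs in this thread.

<searchComponent name="globalCollectionStat" class="org.apache.solr.handler.federated.component.GlobalCollectionStatComponent"/>
<searchComponent name="mainQPhase" class="org.apache.solr.handler.federated.component.MainQPhaseComponent"/>
<searchComponent name="auxiliaryQPhase" class="org.apache.solr.handler.federated.component.AuxiliaryQPhaseComponent"/>

<requestHandler name="/search" class="solr.SearchHandler">
  <arr name="components">
    <str>globalCollectionStat</str>
    <str>mainQPhase</str>
    <str>auxiliaryQPhase</str>
  </arr>
</requestHandler>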

Hide
          Sabyasachi Dalal added a comment -

I fixed the issue with the patch, and it works with version 594268.
Now I am trying to make it work with the latest trunk, and I am facing a problem: the FedSearchComponent needs a handle to the "handler" in order to execute on the local shard, and I am trying to figure out how to pass the handler in during component initialization.

Hide
          Yonik Seeley added a comment -

Yes, I'm suggesting changing the main control loop.
Normal non-distributed requests don't necessarily need stages (but they could be added to be more consistent with the distributed methods... with stages, I don't think there would be a "prepare" method).
Right now, my private copy of SearchComponent looks like this:

public abstract class SearchComponent implements SolrInfoMBean
{
  public abstract void prepare( SolrQueryRequest req, SolrQueryResponse rsp ) throws IOException, ParseException;
  public abstract void process( SolrQueryRequest req, SolrQueryResponse rsp ) throws IOException;

  // Called once per stage of a distributed request; the default says
  // "no further stages needed".
  public int distributedProcess(ResponseBuilder rb) throws IOException {
    return ResponseBuilder.STAGE_END;
  }

  // Callback invoked when the responses to a shard request have been received.
  public void handleResponses(ResponseBuilder rb, ShardRequest sreq) {
  }
}
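
For context, here is a minimal sketch of the kind of stage-driven loop that could drive these two methods. The loop body, rb.stage, STAGE_START, and sendAndWait() are assumptions for illustration; only distributedProcess and handleResponses come from the snippet above.

// Hypothetical driver: run stages until every component reports STAGE_END.
// Each component returns the next stage it needs; the minimum across all
// components becomes the stage executed next.
int stage = ResponseBuilder.STAGE_START;
do {
  rb.stage = stage;                             // components read the current stage from rb
  int nextStage = ResponseBuilder.STAGE_END;
  for (SearchComponent c : components) {
    nextStage = Math.min(nextStage, c.distributedProcess(rb));
  }
  // Send the shard requests the components queued, wait for the responses,
  // then let every component see each completed request.
  for (ShardRequest sreq : sendAndWait(rb)) {   // sendAndWait() is hypothetical
    for (SearchComponent c : components) {
      c.handleResponses(rb, sreq);
    }
  }
  stage = nextStage;
} while (stage != ResponseBuilder.STAGE_END);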
          
Hide
          Ryan McKinley added a comment -

          Are you suggesting changing the main control loop from:

                for( SearchComponent c : components ) {
                  c.process( req, rsp );
                }
          

          to something that knows "stages"?

Or are you discussing something that would happen within a single 'c.process( req, rsp )' call?

Hide
          Yonik Seeley added a comment -

I've been prototyping distributed search in Python...
The current methods I have for a component are something like:

            // returns the current stage this component is at... stage starts at -1 and the next stage is the minimum returned
            // by all components on the previous calls to process()
  int process(ResponseBuilder rb, int stage);
          
             // callback for a single response received (optional... this could be left out)
             // all components have this called, regardless of who queued the request
             void singleResponse(ResponseBuilder rb, int stage, Request req, Response rsp);
          
             // callback when all responses (from all shards) to a request have been received
             void allResponses(ResponseBuilder rb, int stage, Request req);
          

          Any of these methods can add another request to the outgoing queue. The current stage is only over after all
          requests have been sent, responses received, and the outgoing queue is empty.
          When all components return maxint from process(), we are done.
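
As a toy illustration of this contract (everything here apart from the three method signatures above is made up):

// Hypothetical component needing two stages: a query stage to gather counts
// and a refinement stage to correct inexact ones. Returning Integer.MAX_VALUE
// from process() signals that the component is done.
public class RefiningComponent {
  private static final int STAGE_QUERY = 0;
  private static final int STAGE_REFINE = 1;
  private boolean needsRefinement;

  public int process(ResponseBuilder rb, int stage) {
    if (stage < 0) return STAGE_QUERY;     // stage starts at -1; ask for the query stage
    if (stage == STAGE_QUERY) {
      // queue the initial per-shard request on the outgoing queue here
      return STAGE_REFINE;                 // might need one more stage
    }
    if (stage == STAGE_REFINE && needsRefinement) {
      // queue follow-up requests to the shards whose counts were inexact
    }
    return Integer.MAX_VALUE;              // nothing further needed
  }

  public void allResponses(ResponseBuilder rb, int stage, Request req) {
    // merge the shard responses; set needsRefinement if any counts were inexact
  }
}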

Hide
          Yonik Seeley added a comment -

          It doesn't seem like there is any request handler set up that references the distributed search components.

Hide
          Sabyasachi Dalal added a comment -

Can you please provide some more details about the error? Are you seeing
any exceptions? How are your partitions set up, and what request are you
sending?

Hide
          zhang.zuxin added a comment -

to Sabyasachi Dalal:
I updated the Solr trunk to version 597284, and the patch applied cleanly. But it doesn't work; it behaves as if distributed search isn't supported.
On the other hand, it works when I use Sharad Agarwal's patch. I don't know what's wrong. Maybe you changed something?

Hide
          Yonik Seeley added a comment -

          Original description by Sharad, moved to this comment because a JIRA "Description" is sent to the email list every time there is an update to the issue.

          Motivated by http://wiki.apache.org/solr/DistributedSearch
          "Index view consistency between multiple requests" requirement is relaxed in this implementation.

This does the query side of federated search; updates are not yet done.

          Tries to achieve:-
          ------------------------

• The client applications are totally agnostic to federated search. The federated search and the merging of results happen entirely behind the scenes, in Solr's request handler. The response format remains the same after merging.
  The response from each individual shard is deserialized into a SolrQueryResponse object, and the collection of SolrQueryResponse objects is merged to produce a single SolrQueryResponse object. This makes it possible to use the response writers as-is, or with minimal change.
• Efficient query processing, with highlighting and fields generated only for the merged documents. The query is executed in two phases: the first phase gets the doc unique keys with sort criteria; the second phase brings in all requested fields and highlighting information. This saves a lot of CPU when there is a good number of shards and highlighting info is requested.
  It should be easy to customize the query execution. For example, a user can specify that the query execute in just one phase (when highlighting info is not required and the number of requested fields is small, this can be more efficient).
• Ability to easily override the default federated capability via appropriate plugins and request parameters. As federated search is performed by the RequestHandler itself, multiple request handlers can easily be pre-configured with different federated search settings in solrconfig.xml.
• Global weight calculation is done by querying the terms' doc frequencies from all shards.
• Federated search works over HTTP transport, so an individual shard's VIP can be queried. Load-balancing and fail-over are handled by the VIP as usual.

- Sub-searcher response parsing is a plugin interface; different implementations could be written based on JSON, XML SAX, etc. The current one is based on XML DOM.
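
A minimal sketch of what such a parser plugin could look like (the interface name and signature here are assumptions; only XMLResponseParser itself appears in this thread's logs):

import java.io.IOException;
import java.io.InputStream;
import org.apache.solr.request.SolrQueryResponse;

// Hypothetical plugin interface for deserializing a sub-searcher's response.
// XML DOM, XML SAX, and JSON parsers would each provide their own parse().
public interface ShardResponseParser {
  SolrQueryResponse parse(InputStream shardResponse) throws IOException;
}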

          HOW:
          -------
A new RequestHandler called MultiSearchRequestHandler does the federated search over multiple sub-searchers (referred to as "shards" from here on). It extends RequestHandlerBase. The handleRequestBody method in RequestHandlerBase has been split into query-building and execute methods; this was done to calculate global numDocs and docFreqs, and to execute the query efficiently on multiple shards.
All the "search" request handlers are expected to extend the MultiSearchRequestHandler class in order to enable federated capability for the handler. StandardRequestHandler and DisMaxRequestHandler have been changed to extend this class.

Federated search kicks in if "shards" is present as a request parameter; otherwise the search is performed as usual on the local index. E.g., shards=local,host1:port1,host2:port2 will search the local index and two remote indexes. The search responses from all three shards are merged and served back to the client.
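
A concrete request against such a three-shard setup might look like this (hostnames and ports are hypothetical; the shard-list syntax follows the description above and the logs in this thread):

http://host1:8983/solr/search?q=apache&shards=local,host2:8983/solr,host3:8983/solr

The instance at host1 searches its own index ("local") plus the two remote shards, merges the results, and returns a single response.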

          The search request processing on the set of shards is performed as follows:

          STEP 1: The query is built, terms are extracted. Global numDocs and docFreqs are calculated by requesting all the shards and adding up numDocs and docFreqs from each shard.

          STEP 2: (FirstQueryPhase) All shards are queried. Global numDocs and docFreqs are passed as request parameters. All document fields are NOT requested, only document uniqFields and sort fields are requested. MoreLikeThis and Highlighting information are NOT requested.

          STEP 3: Responses from FirstQueryPhase are merged based on "sort", "start" and "rows" params. Merged doc uniqField and sort fields are collected. Other information like facet and debug is also merged.

          STEP 4: (SecondQueryPhase) Merged doc uniqFields and sort fields are grouped based on shards. All shards in the grouping are queried for the merged doc uniqFields (from FirstQueryPhase), highlighting and moreLikeThis info.

          STEP 5: Responses from all shards from SecondQueryPhase are merged.

STEP 6: Document fields, highlighting, and moreLikeThis info from the SecondQueryPhase are merged into the FirstQueryPhase response.
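
To make the two phases concrete, here is roughly what the per-shard requests look like, reconstructed from the log excerpts earlier in this thread (parameter values are illustrative; in the logs the SecondQueryPhase appears to be served by the AuxiliaryQPhaseComponent):

FirstQueryPhase:   /select?q=<original query>&fl=id,score&fsv=true&nd=<global numDocs>&tdf=<global docFreqs>
SecondQueryPhase:  /select?q=<original query>&dq=<uniqKey query for the merged docs>

Only the second request fetches stored fields, highlighting, and moreLikeThis info, and only for the documents that survived the merge.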

          TODO:
          -Support sort field other than default score
          -Support ResponseDocs in writers other than XMLWriter
          -Http connection timeouts

OPEN ISSUES:
          -Merging of facets by "top n terms of field f"

          Scope for Performance optimization:-
          -Search shards in parallel threads
          -Http connection Keep-Alive ?
          -Cache global numDocs and docFreqs
          -Cache Query objects in handlers ??

I would appreciate feedback on my approach. I understand there are probably a lot of things I have overlooked.

Hide
          Hoss Man added a comment -

Note: there has been discussion recently about the terminology distinction between "federated search" and "distributed search" (which Ken recently updated on the wiki)... this issue is tracking "distributed search" and not "federated search", correct?

If so, the issue summary should be updated:

          http://wiki.apache.org/solr/FederatedSearch
          http://wiki.apache.org/solr/DistributedSearch

Hide
          Yonik Seeley added a comment -

          I'm really just starting to dig into this again, but here are a couple of thoughts:

          It looks like there is a monolithic main federated query component that does all the work... It would be nice if there were a way to turn this around so that a user could write a query component that could participate in a distributed search call. It seems like query info should be able to be gathered from multiple components and then a single request to a shard could be made. This entails multiple methods on QueryComponent for use in a distributed request.

Another observation is that the number of "phases" may be unpredictable. For example, when faceting, if one wants "exact" results, more information may be required from certain nodes. This means that components need a way to say whether they are done or not, and a way to send different requests to different shards. Then, when responses are received, it should be possible to handle them one by one as they come in, or alternately all at once to merge the results.

Hide
          Sabyasachi Dalal added a comment -

I mean I removed the files pertaining to 281. If you follow the development above, the files pertaining to 281 had been added to this patch to make it easier to apply.

Hide
          Sabyasachi Dalal added a comment -

          I have updated the patch to remove the code pertaining to SOLR-281, because 281 has been committed.

          Show
          Sabyasachi Dalal added a comment - I have updated the patch to remove the code pertaining to SOLR-281 , because 281 has been committed.
          Hide
          Stu Hood added a comment -

          I'm still working on wrapping my head around the fedsearch phases, but I noticed the following stacktrace showing up in the logs every now and then:

          SEVERE: java.lang.NullPointerException
                  at org.apache.solr.handler.federated.component.GlobalCollectionStatComponent.prepare(GlobalCollectionStatComponent.java:81)
                  at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:116)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:78)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:807)
                  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:206)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
                  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
                  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
                  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
                  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
                  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
                  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
                  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
                  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
                  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
                  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
                  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
                  at java.lang.Thread.run(Thread.java:619)
          

          ... that is probably caused by the following statements around line 81 in GlobalCollectionStatComponent.prepare. We only enter the if statement if terms is null, and then we dereference it...

              String terms = req.getParams().get(ResponseBuilder.DOCFREQS);
              if (numDocs != null && terms == null) {
                // the build query has to be over-written to take into
                //account global numDocs and docFreqs
          
                //extract the numDocs and docFreqs from request params
                Map<Term, Integer> dfMap = new HashMap<Term, Integer>();
                String[] strTerms = terms.split(",");
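
A minimal fix, assuming the intent is to run this block only when both parameters are present, would be to flip the null check (a sketch against the quoted code, not a tested patch):

    String terms = req.getParams().get(ResponseBuilder.DOCFREQS);
    // was: terms == null, which guaranteed the NullPointerException above
    if (numDocs != null && terms != null) {
      // extract the numDocs and docFreqs from the request params
      Map<Term, Integer> dfMap = new HashMap<Term, Integer>();
      String[] strTerms = terms.split(",");
      // ... rest of the original block unchanged
    }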
          
Hide
          Stu Hood added a comment -

>> Yeah, maybe I have missed those scenarios. If you have the fix, please feel free to update the patch.

Unfortunately, my fix was more of a workaround: I allow any field that is not the unique key to be added multiple times. But the local shard always returns all the fields of the document, so if the local shard is queried directly, some fields are duplicated. So I don't query the local shard directly =/

          Sharad Agarwal added a comment -

          >>But: I'm still having the problem where multi-valued fields only get one value returned. During AuxiliaryQPhaseComponent.merge(SolrQueryResponse rsp, SolrQueryResponse auxPhaseRes), you check whether the field already exists before adding it, but multi-value fields can exist multiple times.

Yeah, maybe I have missed those scenarios. If you have the fix, please feel free to update the patch.

          >>Also, I'm considering disabling the AuxiliaryQPhase and just letting the MainQPhase fetch the document fields. All of my documents are small ( < 1k on average with 10ish fields), so I think making another call across the network to fetch the remaining fields is probably a waste for our indexes. What do you think?
Having AuxiliaryQPhase primarily saves on the following, for only the merged docs:
1) fetching doc fields
2) generating snippets
3) more-like-this queries, etc.

In my experience, generating snippets is very CPU intensive, and if the number of shards is large there would be a lot of CPU waste if snippets were generated in MainQPhase: the wasted work is proportional to (n-1)/n, where n is the number of shards (e.g. with 10 shards, roughly 90% of the snippets generated would be discarded).
So the extra network calls save CPU; there is a trade-off between the two.

          Stu Hood added a comment -

I really like where you are headed with the 'componentized' version of the patch: it is much more elegant.

          But: I'm still having the problem where multi-valued fields only get one value returned. During AuxiliaryQPhaseComponent.merge(SolrQueryResponse rsp, SolrQueryResponse auxPhaseRes), you check whether the field already exists before adding it, but multi-value fields can exist multiple times.

          Also, I'm considering disabling the AuxiliaryQPhase and just letting the MainQPhase fetch the document fields. All of my documents are small ( < 1k on average with 10ish fields), so I think making another call across the network to fetch the remaining fields is probably a waste for our indexes. What do you think?

          Thanks!

          Sharad Agarwal added a comment -

          >> Does this mean that this patch requires SOLR-281 to be applied first?
No, the current patch has all the files. When SOLR-281 gets into the trunk, this patch will need to be reworked.

          Stu Hood added a comment -

-> Based the solution on SOLR-281 and did away with the MultiSearchRequestHandler base class.

          Does this mean that this patch requires SOLR-281 to be applied first? Also, what revision should it be applied to, or will HEAD work?

-> Doing URL encoding for the request params in XMLResponseParser

Ah yeah, I ran into that one a few days ago as well. Additionally, I had XMLResponseParser strip the 'wt' parameter off its queries: 'extractterms' was passing through the user's wt, which caused the XML parsing to fail (obviously =) ).

          Can't wait to try it out... Thanks a lot!

          Sharad Agarwal added a comment -

Hi Stu, I have merged the issues fixed by you into my version of the patch.

          Also the following changes:

-> Based the solution on SOLR-281 and did away with the MultiSearchRequestHandler base class. Federated features are now just pure components that can be plugged in alongside other regular components like QueryComponent, HighlightComponent, etc.
          This way it would be very easy to override the core federated functionality.

-> Renamed the federated components to:
          GlobalCollectionStatComponent
          MainQPhaseComponent
          AuxiliaryQPhaseComponent

-> Doing URL encoding for the request params in XMLResponseParser

          Stu Hood added a comment -

          Here is another revision of the latest patch (I've still only tried it with r574785: I'm a bit crunched for time).

          Resolved issues:

          • We were forgetting to increment a counter during the last step in SecondQPhaseComponent.process, and so we weren't getting results from all shards.
          • SecondQPhaseComponent.merge was throwing away any fields that already existed in a document, and so it was throwing away parts of multi-value fields. Fixing this exposed the first issue listed below.
• MultiSearchRequestHandler was creating non-daemon threads (the default) for the thread pool, which meant the worker threads kept the JVM alive after shutdown. I added a ThreadFactory that creates daemonized threads (sketched after this list).
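
Roughly what such a factory looks like (a sketch, not necessarily the exact code in the patch):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadFactory;

    // Wrap the default factory and mark every thread as a daemon, so the
    // pool can no longer keep the JVM alive once the main threads exit.
    ThreadFactory daemonFactory = new ThreadFactory() {
      private final ThreadFactory delegate = Executors.defaultThreadFactory();
      public Thread newThread(Runnable r) {
        Thread t = delegate.newThread(r);
        t.setDaemon(true);
        return t;
      }
    };

It is then passed to the pool constructor, e.g. Executors.newFixedThreadPool(size, daemonFactory).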

          Open issues:

• The 'local' shard is ignoring the 'fl' parameter during the FirstQueryPhase, and returning the entire document. We then try to merge the document into itself in SecondQPhaseComponent.merge, causing a ConcurrentModificationException. For now, I put in a check for "newDoc != oldDoc", but I think we need to figure out why the local query is returning full documents.
          • Range queries are broken (probably due to the extract terms phase failing)
          • 'start' and 'numfound' are incorrect when returned to the user
            • start is getting wiped out somewhere
            • numfound is counting all copies of matches for a uniqKey towards the total
          • MultiSearchRequestHandler.THREAD_POOL_SIZE and MultiSearchRequestHandler.REQUEST_TIME_OUT_IN_MS should be configuration parameters in solrconfig.xml.

          Thanks a lot!

          Stu Hood added a comment -

          I got the rest of the DF issues resolved: please refer to the attached and ignore my earlier comments (some of them were faulty).

          Here is a patch that is very similar to your last patch, but with my fixes included. If you `diff fedsearch.stu.patch fedsearch.patch` you should be able to see what I did.

          The final (minor) issue I've found, is that when I strip the 'start' parameter in SecondQPhaseComponent.createSecondPhaseParams, it gets stripped from the response that is returned to the user as well (although it is honored in the results).

          Thanks again!

          Sharad Agarwal added a comment -

          Thanks much Stu for pointing the issues. Will take care of these in next update.

          Stu Hood added a comment -

          I've been working with the most recent version of the patch some more, and have run into some more issues. Since I'm sure that you have been working on the patch on your own, I don't want you to have to dig through my changes as a diff. Instead I'll just try and point them out for your revision.

We have a few fields that are indexed as strings containing characters like '@' and ':'. There are still a few places having to do with the 'df' parameter where these need to be escaped or worked around, but here is what I've found so far (the split handling is sketched after this list):

          • During the iteration over the document's uniqFields in SecondQPhaseComponent.createSecondPhaseParams
            • Surrounded the value in "quotes"
          • During the iteration over strTerms in MultiSearchRequestHandler.buildQuery
            • Modified the split on '@' to only split on the last '@' in the string.
            • Modified the split on ':' to split into a maximum of 2 pieces.
          • During the iteration over extractedTerms in GlobalCollectionStatComponent.calcuateGlobalCollectionStat
            • Modified the split on ':' to split into a maximum of 2 pieces.
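
For reference, the split changes amount to something like this (illustrative only; strTerm and the term@df encoding are my reading of the patch):

    // Split "field:value@df" safely when the value itself contains ':' or '@':
    // use the last '@' as the df separator, and split on at most one ':' for the field.
    int at = strTerm.lastIndexOf('@');
    String term = strTerm.substring(0, at);
    String df = strTerm.substring(at + 1);
    String[] fieldAndValue = term.split(":", 2); // limit 2 keeps any ':' inside the value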

          I also ran into some problems in other areas:

          • XMLResponseParser.parse(url, params) fails to parse a response if it is indented using the 'indent=on' parameter, which gets passed through to the subqueries
            • Stripped out 'indent' during the iteration over the params (but there is probably a better solution to this issue)
          • SecondQPhaseComponent.createSecondPhaseParams passes the 'start' parameter through to the subqueries, which leads to a null pointer when we are querying for specific unique ids.
            • Stripped out 'start' during the iteration over the params

          I'll keep looking for the last few 'df' issues. Thanks a lot for the patch!

          Hoss Man added a comment -

FWIW: I haven't really been able to follow this issue much (it's way out of my area of expertise), but seeing some comments go by in email I wanted to mention two things...

> ResponseDocs are based on document unique key while DocList is based on internal doc id.
> The purpose of ResponseDocs is to represent documents lying in remote index while DocList are
> meant for local internal doc id.

One thing to keep in mind is the way MultiReader deals with this in Lucene ... if you know the maxDoc of each of your sub-indexes, then you can compute internal docIds (see the sketch below) ... that may be one way to preserve the DocList abstraction (and allow for supporting schemas without uniqueKey fields) when dealing with federated search (although it may open up new problems if you need to rely on having some form of an identifier that doesn't change .. I'm not sure if the approach being taken makes multiple requests to the shards)
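
A sketch of that MultiReader-style id mapping (hypothetical names; just the offset arithmetic, not patch code):

    // docBase[i] = sum of maxDoc() over shards 0..i-1, as MultiReader does internally.
    int[] docBase = new int[shardMaxDocs.length];
    for (int i = 1; i < shardMaxDocs.length; i++) {
      docBase[i] = docBase[i - 1] + shardMaxDocs[i - 1];
    }
    int globalId = docBase[shard] + localId; // shard-local id -> global id
    // Going back: the owning shard is the largest i with docBase[i] <= globalId.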

          That said...

Federated search is a complex enough concept that if it requires additions to the ResponseWriter API to be done efficiently, I don't think that would be the end of the world. The key thing would be to find ways to minimize the impact on existing clients: if things work for you now, they should keep working for you; if you want to start using federated search, then it's fair to expect that you may have to change a few things, or deal with a few limitations. Off the top of my head, one option may be to add a FederatableResponseWriter subclass such that if a request is federated, then the writer being used must implement that interface, or it's a runtime error.

          Stu Hood added a comment -

Yeah, that is a bit of a problem, isn't it...

          It looks like if you subclassed SolrIndexSearcher and DocList, you could generate fake Lucene document ids that map back to actual unique keys. Unfortunately, SolrIndexSearcher is intimidatingly long, so depending on how people feel about adding to the writers, it might not be necessary to modify it.

          Sharad Agarwal added a comment -

> Is there any way ResponseDocs could extend DocList so that all of the writers don't need to be modified?
ResponseDocs are based on the document unique key, while DocList is based on internal doc ids.
The purpose of ResponseDocs is to represent documents lying in a remote index, while DocList is meant for local internal doc ids.

I don't think there is an easy way to avoid modifying the writers. Currently writers retrieve document data based on the local internal doc id, but for a remote index this has to be done differently.

          Stu Hood added a comment -

I was trying to use the PHP serialized response writer with the federated search patch, and ran into some trouble. Then I noticed that you had made some changes in XMLWriter to support the federated.ResponseDocs class.

Is there any way ResponseDocs could extend DocList so that all of the writers don't need to be modified?

          Stu Hood added a comment -

I'm also seeing the following issue, but I haven't had time to investigate:

          WARNING: Exception while querying shard crc10:8080/solr_postfix09092000-09112000 :java.lang.ClassCastException: com.sun.org.apache.xerces.internal.dom.DeferredTextImpl cannot be cast to org.w3c.dom.Element

          Stu Hood added a comment -

          For the second issue above, I did the following:

• Added 'static String escape(string, field, schema)' to QueryParsing, which uses SolrQueryParser's escape method. I run this across all key values as they are being iterated at the beginning of 'SecondQPhaseComponent.createSecondPhaseParams' (sketched below).
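
Something along these lines (a sketch; the field/schema arguments would allow per-field handling and are shown unused here):

    // Hypothetical helper as described above: backslash-escape Lucene query
    // specials (':', '+', '-', '(', ')', etc.) in a value before it is
    // embedded in a subquery string.
    public static String escape(String value, String field, IndexSchema schema) {
      return org.apache.lucene.queryParser.QueryParser.escape(value);
    }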

          Stu Hood added a comment -

          Thanks Sharad, the last patch applied cleanly as you said.

          I've run into some errors that should be quick fixes for your next revision:

          • I had to modify the code not to assume that shard names end in '/solr' so that I could specify an instance name, like: 'blah.com:8080/instance_name'.
• The parameters for your subqueries are not (always?) getting escaped. My document ids contain some colons (':'), so it's throwing a NullPointerException during the second query phase, and then again in SolrCore.execute.

          Thanks a lot for your work!

          Sharad Agarwal added a comment -

Updated to do the following:
1. The fed search query is now executed via different components:
-GlobalCollectionStatComponent (optional)
-FirstQPhaseComponent
-SecondQPhaseComponent (optional)
The user can use the 'skip' request param to indicate which components to skip.

2. Sub-searcher requests are executed in parallel threads using a thread pool.

3. Works against trunk revision 574785.

I am working on further refactoring the code and making it work with SOLR-281, which should make the code really clean, with pluggable components.

          Stu Hood added a comment -

Sharad, what Solr revision have you applied the latest copy of this patch against? I know that the r573893 commit caused all kinds of havoc in the source tree, but I'd really like to try it out, and I don't mind using an older revision to get it working.

          Also, do you have any newer versions of the patch?

          Thanks a lot!

          Sharad Agarwal added a comment -

Recently I added a feature to make the requests to the shards in parallel using a thread pool (not yet uploaded the patch).
Async IO would be the next thing, but I don't want to bring in its complexity so early.
Perhaps we can benchmark the performance of the thread-pool/parallel-requests implementation and later, based on the numbers, work towards async IO.

          Mike Klaas added a comment -

          Great stuff!

I think asynchronous/parallel requests are a central feature of this kind of result aggregator. In my similar Python implementation, I fire off all the requests and collect the responses in a select() loop. Threads are possible but get somewhat weighty when you have many shards (I've used up to 90). An easier alternative to select() is to simply fire off all the requests and then wait for the responses sequentially (assuming Java has an API that allows this; see the sketch below). This is almost as good as the select() loop but does not have the same complexity.
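
In Java terms, the fire-then-wait pattern is roughly this (a sketch using java.util.concurrent; queryShard, merge and ShardResponse are illustrative names, not patch code):

    import java.util.*;
    import java.util.concurrent.*;

    ExecutorService pool = Executors.newFixedThreadPool(shards.size());
    List<Future<ShardResponse>> futures = new ArrayList<Future<ShardResponse>>();
    for (final String shard : shards) {
      futures.add(pool.submit(new Callable<ShardResponse>() {
        public ShardResponse call() throws Exception {
          return queryShard(shard); // fire the HTTP request to one shard
        }
      }));
    }
    for (Future<ShardResponse> f : futures) {
      merge(f.get()); // block on each response in turn; all requests are already in flight
    }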

          Sharad Agarwal added a comment -

          Added support for sorting.

          Sharad Agarwal added a comment -

          Making this issue blocked by SOLR-281

Sharad Agarwal added a comment (edited) -

          > "Index view consistency between multiple requests" requirement is relaxed in this implementation.
          >>Do you have plans to remedy that? Or do you think that most people are OK with inconsistencies that could arise?
The thing to note here is that currently the multi-phase execution is based on document unique fields, NOT on internal doc ids, so there won't be many inconsistencies between requests; nothing depends on changing internal doc ids.
One possibility is that a particular document has been deleted by the time the second phase executes, which in my opinion is OK to live with.
The other possibility is that the document has changed and the original query terms are no longer present in it. This can be solved by ANDing the original query with the unique-field document query, e.g. +(original query) +id:"<uniqueKey value>".

          If people think it is really crucial to have index view consistency, then it should be easy to implement "Consistency via Retry" as mentioned in http://wiki.apache.org/solr/FederatedSearch

          >>It might also be the case that a custom partitioning function could be implemented (such as improving caching by partitioning queries, etc) or it may >>be more efficient to do the second phase of a query on the same shard copy as the first phase.
          >>In that case it might make sense load balancing across shards from Solr.
For the second phase of a query to execute on the same shard copy, third-party sticky load balancers can be used; I believe Apache already does that. All copies of a single partition can sit behind the Apache load balancer (doing the stickiness). The merger just needs to know the load-balancer ip/port for each partition. Based on the query, the merger can then search only the appropriate partitions.

To improve caching, Solr itself would have to do the load balancing. Another option would be to introduce a query result cache at the merger itself.

>>Where are terms extracted from (some queries require index access)? This should be delegated to the shards, no? It can be the same step that gets the docFreqs from the shards (pass the query, not the terms).
Yes, if that's the case it should be easy to implement as you have suggested.

>>I think we should base the solution on something like https://issues.apache.org/jira/browse/SOLR-281
Cool, I was looking for something like this. It looks like the way to go.

          >>Any thoughts on RMI vs HTTP for the searcher-subsearcher interface?
RMI could be supported as an option by enhancing the ResponseParser (better name??) interface: the remote search server can directly return the SolrQueryResponse object. I understand there would be some performance benefit from native Java marshalling/unmarshalling of the object, instead of writing the Solr response and then parsing it (the HTTP way). The question we need to answer is: is the effort/complexity worth it?

In our organization we made a conscious decision to go with HTTP. The operations folks like HTTP because it is standard stuff: load balancing, monitoring, etc., with lots of tools already available. With RMI, I am not sure external sticky load balancing is possible; the merger itself would have to build that logic.
Moreover, I think HTTP fits more naturally with Solr's request handler model.

          Yonik Seeley added a comment -

          Thanks for kicking this off Sharad!

          > "Index view consistency between multiple requests" requirement is relaxed in this implementation.

          Do you have plans to remedy that? Or do you think that most people are OK with inconsistencies that could arise?

          > Load-balancing and Fail-over taken care by VIP as usual

          In a static configuration, this works OK, but it might be nice to support a more dynamic environment where extra shards could be easily added. It might also be the case that a custom partitioning function could be implemented (such as improving caching by partitioning queries, etc) or it may be more efficient to do the second phase of a query on the same shard copy as the first phase.
In that case it might make sense to load balance across shards from Solr. The VIP solution would map to the simplest case of a single copy of each shard, so an LB could still be used if desired.

          > STEP 1: The query is built, terms are extracted.

          Where are terms extracted from (some queries require index access)? This should be delegated to the shards, no? It can be the same step that gets the docFreqs from the shards (pass the query, not the terms). Step 1 should also be optional for those that can make do with local idf factors.
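
For concreteness, combining the per-shard stats would presumably look something like this (a sketch; numDocsPerShard and dfPerShard are illustrative names):

    // Sum collection size and document frequency across shards, then compute
    // a global idf the way Lucene's DefaultSimilarity does it locally.
    long numDocs = 0, df = 0;
    for (int i = 0; i < numDocsPerShard.length; i++) {
      numDocs += numDocsPerShard[i];
      df += dfPerShard[i];
    }
    float idf = (float) (Math.log(numDocs / (double) (df + 1)) + 1.0);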

In order to facilitate custom logic in a distributed environment, I think we should base the solution on something like https://issues.apache.org/jira/browse/SOLR-281, with additional hooks for distributed search. This should allow relatively independent parts of query processing to piggyback on the same network request (for example, the first steps of querying and faceting can be added to a single request, and highlighting and stored-field retrieval can be done in conjunction).

          Any thoughts on RMI vs HTTP for the searcher-subsearcher interface?

          Sharad Agarwal added a comment -

To do a quick test of the patch, try adding:
shards=local,localhost:8080
as a request parameter to the search URL.
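
For example (host, port and handler path are illustrative):

    http://localhost:8983/solr/select?q=solr&shards=local,localhost:8080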


            People

• Assignee: Yonik Seeley
• Reporter: Sharad Agarwal
• Votes: 14
• Watchers: 18
