SOLR-4816: Add document routing to CloudSolrServer

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5, 5.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      This issue adds the following enhancements to CloudSolrServer's update logic:

      1) Document routing: Updates are routed directly to the correct shard leader, eliminating document routing on the server.

      2) Optional parallel update execution: Updates for each shard are executed in a separate thread so parallel indexing can occur across the cluster.

      These enhancements should allow for near-linear scalability of indexing throughput.

      Usage:

      CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
      cloudClient.setParallelUpdates(true);

      SolrInputDocument doc1 = new SolrInputDocument();
      doc1.addField("id", "0");
      doc1.addField("a_t", "hello1");

      SolrInputDocument doc2 = new SolrInputDocument();
      doc2.addField("id", "2");
      doc2.addField("a_t", "hello2");

      UpdateRequest request = new UpdateRequest();
      request.add(doc1);
      request.add(doc2);
      request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);

      NamedList<Object> response = cloudClient.request(request); // Returns a backwards-compatible condensed response.

      // To get a more detailed response, downcast to RouteResponse:
      CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse) response;
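The client-side routing step can be pictured with a small self-contained sketch (plain Java, no SolrJ on the classpath; `ShardBatcher` and its modulo hash are illustrative stand-ins — Solr's real CompositeIdRouter maps a MurmurHash3 of the id into per-slice hash ranges):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for CloudSolrServer's routing: group documents by
// the shard their id hashes to, so each shard leader receives one batch.
public class ShardBatcher {

    // Toy hash routing; the real router hashes the id's bytes and maps
    // the result into each slice's configured hash range.
    static int shardFor(String id, int numShards) {
        return Math.floorMod(id.hashCode(), numShards);
    }

    static Map<Integer, List<String>> batchByShard(List<String> ids, int numShards) {
        Map<Integer, List<String>> batches = new HashMap<>();
        for (String id : ids) {
            batches.computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>()).add(id);
        }
        return batches; // one update request per entry, sent to that shard's leader
    }
}
```

Each resulting batch would then become one update request sent to that shard's leader, which is what makes the parallel per-shard execution possible.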

      1. SOLR-4816-sriesenberg.patch
        5 kB
        Stephen Riesenberg
      2. SOLR-4816.patch
        4 kB
        Joel Bernstein
      3. SOLR-4816.patch
        4 kB
        Joel Bernstein
      4. SOLR-4816.patch
        5 kB
        Joel Bernstein
      5. SOLR-4816.patch
        5 kB
        Joel Bernstein
      6. SOLR-4816.patch
        5 kB
        Joel Bernstein
      7. SOLR-4816.patch
        6 kB
        Joel Bernstein
      8. SOLR-4816.patch
        11 kB
        Joel Bernstein
      9. SOLR-4816.patch
        11 kB
        Joel Bernstein
      10. SOLR-4816.patch
        14 kB
        Joel Bernstein
      11. SOLR-4816.patch
        14 kB
        Joel Bernstein
      12. SOLR-4816.patch
        14 kB
        Joel Bernstein
      13. SOLR-4816.patch
        17 kB
        Joel Bernstein
      14. SOLR-4816.patch
        18 kB
        Joel Bernstein
      15. SOLR-4816.patch
        20 kB
        Joel Bernstein
      16. SOLR-4816.patch
        21 kB
        Joel Bernstein
      17. SOLR-4816.patch
        22 kB
        Joel Bernstein
      18. SOLR-4816.patch
        22 kB
        Joel Bernstein
      19. SOLR-4816.patch
        29 kB
        Joel Bernstein
      20. SOLR-4816.patch
        27 kB
        Erick Erickson
      21. SOLR-4816.patch
        23 kB
        Joel Bernstein
      22. SOLR-4816.patch
        21 kB
        Joel Bernstein
      23. SOLR-4816.patch
        22 kB
        Joel Bernstein
      24. SOLR-4816.patch
        25 kB
        Joel Bernstein
      25. SOLR-4816.patch
        27 kB
        Joel Bernstein
      26. SOLR-4816.patch
        27 kB
        Joel Bernstein
      27. SOLR-4816.patch
        31 kB
        Joel Bernstein
      28. SOLR-4816.patch
        65 kB
        Mark Miller
      29. SOLR-4816.patch
        65 kB
        Mark Miller
      30. RequestTask-removal.patch
        5 kB
        Shikhar Bhushan

          Activity

          Joel Bernstein added a comment -

          Initial patch with the new "directUpdate" method. This is the initial implementation and has not been tested; testing and implementation iterations to follow. The patch was generated against Solr 4.3.

          Mark Miller added a comment -

          This looks like a dupe of SOLR-3154 - see that issue for more history.

          Mark Miller added a comment -

          This shouldn't really be an extra method - it should just be the default way the CloudSolrServer works.

          Joel Bernstein added a comment -

          OK, reviewing SOLR-3154. Thanks!

          Joel Bernstein added a comment -

          In looking at SOLR-3154, it looks like this was done pre-document routing, so it will have to be changed. It also looks like it won't work with batches because it checks only the first document id.

          The batches issue is why I created another method. We could get batches into the original method but the code would be pretty hairy. I like the idea of a nice clean separate method for this.

          Let me know how you'd like to proceed. I can work on updating the SOLR-3154 implementation or keep working this implementation.

          Thanks

          Mark Miller added a comment - - edited

          It also looks like it won't work with batches because it checks the first document id only.

          Yes, see the comments in the issue:

          "The patch is limited, but a start: right now it's just for String and Integer Id's and it only acts upon the first document or deleteby id (favoring document) - if you use the bulk methods they are all sent along to the leader of the first id."

          I like the idea of a nice clean separate method for this.

          I still think it needs to be the default method, and we simply should support batching. The other limitation around types is no longer a concern now that Yonik changed the hashing.

          I don't know that we need to build on the patch in SOLR-3154, it was a very quick experiment, but I think that is the right implementation direction. The smart client should simply work optimally without having to use alternate methods IMO.

          Joel Bernstein added a comment -

          Ok, I'll integrate the directUpdate logic into the main request flow.

          Joel Bernstein added a comment - - edited

          First working version of the patch. The directUpdate method is still public so it can be called directly to specify a different id field. The directUpdate method is also integrated into the main request method where it will use the field named "id" as the id field.

          Mark Miller added a comment -

          where it will use the field named "id" as the id field.

          It seems there are two easy-ish improvements we might make - either allow setting the 'default' id field with a setter, or look into using the new Schema REST API to actually request the id field from Solr.

          I'm still not sold on the need to expose a new update method.

          That's just a quick response though, I'll review your patch as soon as I get a chance.

          Joel Bernstein added a comment -

          Ok, I like the idea of the default id setter. I'll add this and make the directUpdate method private.

          Stephen Riesenberg added a comment -

          I tried out the patch and it had some bugs in it. Did you want me to post changes we made to it? I won't test it again until tomorrow, so we'll have to let you know whether this works.

          Hoss Man added a comment -

          Side note: for testing purposes it's going to be handy to have a property/setter on CloudSolrServer that forces it to ignore the id and randomly pick a node to send the updates to, and then have the test framework randomly set that property – that way we can future-proof ourselves against accidental server-side bugs where docs aren't sent to the correct shard, or docs aren't sent to a leader, etc...
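A minimal sketch of the kind of test hook Hoss describes (the class and method names here are hypothetical, not the actual CloudSolrServer API): a flag the test framework can flip so the client ignores the document id and picks an arbitrary node, keeping server-side forwarding exercised:

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a randomized-routing test hook: when the flag is
// set, the id is ignored and a random node is chosen, so tests still cover
// the server-side forwarding path.
public class RoutingChooser {
    private boolean randomRouting = false;        // test framework flips this randomly
    private final Random random = new Random(42); // fixed seed for reproducible tests

    public void setRandomRouting(boolean b) { this.randomRouting = b; }

    public String pickNode(String docId, List<String> nodes) {
        if (randomRouting) {
            return nodes.get(random.nextInt(nodes.size()));
        }
        // normal path: hash the id to a node (toy stand-in for Solr's router)
        return nodes.get(Math.floorMod(docId.hashCode(), nodes.size()));
    }
}
```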

          Joel Bernstein added a comment -

          Hoss, sounds good I'll work this into the next patch.

          Stephen, sure go ahead and post the changes. I'm going to make some changes to the patch shortly and I'll work with your changes as well.

          Joel Bernstein added a comment - - edited

          No longer going to the core URL directly with updates; instead sending to the baseUrl+/+collection. Also passing through the original params to each request. Added the getter/setter for the defaultId.

          Stephen, let me know if this addresses the bugs you found.

          Stephen Riesenberg added a comment -

          First attempt. This patch is to fix issues with the directUpdate() method and add defaultIdField. The next attempt would be to integrate it better into the request() method. This is still untested, as I'm going to be deploying it today for load testing.

          Stephen Riesenberg added a comment - - edited

          Oops, I missed your latest. I'll check it out.

          Edit: Main issue is an NPE using params (params = new ...), and missing use of the defaultCollection in params.get().

          Joel Bernstein added a comment -

          Stephen, I see what you did with the defaultCollection. I'll work that into the patch. I made a few other changes so I'll use the patch I'm working with.

          Joel Bernstein added a comment - - edited

          Added Stephen's code fixing NPE when params and/or collection is not set.

          Stephen Riesenberg added a comment -

          Fixed several other NPEs before getting to the code where it uses the DocRouter. It looks like the router that is used is the ImplicitDocRouter, not the HashBasedRouter, so it returns a null Slice. Any idea how to get the correct DocRouter for hash-based routing from the API? Or could we just create a new HashBasedRouter instance and use it?

          Joel Bernstein added a comment -

          Stephen, let's discuss offline. Just sent you an email.

          Mark Miller added a comment -

          Seems we should be batching bulk docs up based on which leader they will go to and propagate the bulk adds where we can.

          Mark Miller added a comment -

          Ah, glanced too quickly - that is what is happening - looks a lot cleaner than your initial comment indicated, nice.

          Mark Miller added a comment -

          Fixed several other NPEs before getting to the code where it uses the DocRouter.

          We should make sure we have tests that would have caught these as well!

          Yonik Seeley added a comment -

          Any idea how to get the correct DocRouter for hash-based from the API?

          It's collection specific.
          See DocCollection.getRouter()

          Stephen Riesenberg added a comment - - edited

          It's collection specific.
          See DocCollection.getRouter()

          No javadoc on that method. In my environment, collections were set up for implicit routing because numShards was not specified at create time. Joel straightened us out. Only thing I found was at http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin which doesn't talk about document routing.

          The NPEs I mentioned before have been fixed, one other fix was made to set the commitWithinMS and pass params to the sub-requests and document router. Now I am running a load test against this to see how it performs. I'll post an updated patch with my changes a bit later.

          We should make sure we have tests that would have caught these as well!

          I didn't run the test suite. Seems like a good idea! I'll get caught up eventually. Having run it now with the earlier patches, seeing the NPEs. Good stuff. Thanks.

          Joel Bernstein added a comment -

          To get the existing tests to run I needed to limit the directUpdates to only UpdateRequests that have document lists.

          All other requests bypass the directUpdates method and follow the main flow.

          Currently UpdateRequestExt also bypasses the directUpdates method and follows the main flow. There is a TODO in the UpdateRequestExt class to bake it into the UpdateRequest class. This would be a cleaner approach to getting UpdateRequestExt to work with directUpdates.

          Joel Bernstein added a comment -

          Existing tests now run with latest patch. Still need to add tests for the new directUpdate functionality.

          Joel Bernstein added a comment -

          Reorganized the conditional tests that lead to direct updates. All conditions must pass or the request will follow the main flow. Added a condition that the docRouter must be a CompositeIdRouter for directUpdates.
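The gating described here can be summarized as a single guard (illustrative booleans standing in for the real checks on the request and cluster state; not the actual patch code):

```java
// Sketch of the direct-update gating: every condition must hold for the
// direct path to be taken; otherwise the request follows the main flow.
public class DirectUpdateGate {
    static boolean eligible(boolean isUpdateRequest,
                            boolean hasDocumentList,
                            boolean routerIsCompositeId) {
        return isUpdateRequest && hasDocumentList && routerIsCompositeId;
    }
}
```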

          Joel Bernstein added a comment -

          Stephen, can you post your latest patch? I'd like to see how you handle the commitWithinMS.

          Joel Bernstein added a comment -

          To support UpdateRequestExt in a clean way I added a new interface called Routable which is implemented by both UpdateRequest and UpdateRequestExt.

          Initial patch for this design meant for review only, has not been tested.

          Joel Bernstein added a comment -

          Runnable version of the new design/implementation. Seems to work well. Since both UpdateRequest and UpdateRequestExt are now supported for direct updates I think this implementation is complete.

          Still needs automated tests and testing under load.

          Stephen Riesenberg added a comment - - edited

          Sorry, I was away yesterday. Your patch is getting beyond my original changes, so I won't post it again. Regarding commitWithin, you need the original request when you create a new sub-request; then basically newRequest.setCommitWithin(originalRequest.getCommitWithin());

          You could only do this in the new getRoutes() method if you have the original request available. Hope that helps!

          Edit: Also, this line:

          Slice slice = router.getTargetSlice(doc.getFieldValue(id).toString(),doc,null,col);

          Would be:

          Slice slice = router.getTargetSlice(doc.getFieldValue(id).toString(),doc,params,col);

          I got an NPE when passing null to the DocRouter.

          Joel Bernstein added a comment -

          Thanks Stephen, I'll make sure the commitWithin is accounted for at the request level.

          Joel Bernstein added a comment - - edited

          Stephen's comments brought up other questions which need to be decided:

          1) How to handle delete by requests.
          2) How to handle various action requests (commit, optimize etc...).

          Probably a good approach to this is to process all the direct updates first and then remove the document list from the original request and let the original request execute through the main flow.

          I'll work on getting this scenario into the patch.

          Joel Bernstein added a comment - - edited

          After spending more time looking at everything that an UpdateRequest can do, I realized that not all parts of a request are routable.

          The latest patch handles this by first sending all the routable updates to the correct shard, then executing a final update request with the non-routable update commands such as OPTIMIZE or deleteByQuery.

          This latest patch has not been tested so is for review purposes only.
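The routable/non-routable split can be sketched as follows (hypothetical `Command`/`Split` types, not SolrJ classes): document adds go out per shard first, and everything else rides in a final request through the normal path:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of partitioning an update request: adds (and delete-by-id) can be
// routed to a specific shard; deleteByQuery, COMMIT, OPTIMIZE etc. cannot,
// and are executed in a final request through the main flow.
public class UpdateSplitter {
    record Command(String type, String payload, boolean routable) {}
    record Split(List<Command> routable, List<Command> nonRoutable) {}

    static Split split(List<Command> request) {
        List<Command> routable = new ArrayList<>();
        List<Command> nonRoutable = new ArrayList<>();
        for (Command c : request) {
            (c.routable() ? routable : nonRoutable).add(c);
        }
        return new Split(routable, nonRoutable);
    }
}
```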

          Joel Bernstein added a comment -

          Latest patch is a working version of CloudSolrServer that handles a mix of routable and non-routable requests.

          Also added a setter and getter for the boolean directUpdates. This controls whether directUpdates are on. The default is off.

          I added this setter/getter because the response for a directUpdate contains multiple responses, one from each of the shards routed to and one for the non-routable request. I figured having a compound response might break existing clients, so it's off by default.

          Joel Bernstein added a comment -

          Added a simple test case.

          Joel Bernstein added a comment -

          Added a much better test case which tests to see if each document was indexed to the shard it was sent to.

          Joel Bernstein added a comment -

          The latest version of the patch sends the list of URLs for each replica in the slice to LBHttpSolrServer, with the leader as the first URL.

          So, if the leader throws an error on the request, LBHttpSolrServer will try another replica in the slice.
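The leader-first failover behavior can be sketched like this (the sender function here is hypothetical; LBHttpSolrServer performs the real retries internally): the leader's URL is tried first, and a failure falls through to the remaining replicas:

```java
import java.util.List;
import java.util.function.Function;

// Sketch of leader-first failover: try each URL in order (leader first);
// if a request throws, remember the error and move to the next replica.
public class LeaderFirstSender {

    static <R> R sendWithFailover(List<String> urlsLeaderFirst,
                                  Function<String, R> send) {
        RuntimeException last = null;
        for (String url : urlsLeaderFirst) {
            try {
                return send.apply(url);   // leader tried first
            } catch (RuntimeException e) {
                last = e;                 // remember and try next replica
            }
        }
        throw last != null ? last : new IllegalArgumentException("no urls");
    }
}
```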

          Joel Bernstein added a comment - - edited

          Requests are now threaded. Documents are batched up and routed to the correct shard. Each routed request is sent in its own thread so indexing on each Solr server can be done in parallel.

          With this scenario load throughput should increase almost linearly with cluster size.

          Shawn Heisey added a comment -

          Joel Bernstein does the threading destroy the ability for the object to send throwables back up the stack, like ConcurrentUpdateSolrServer and SOLR-3284? If it does, then I am -1 to that change in CloudSolrServer. Some people (including me) rely on exceptions being thrown on update requests that do not include optimize or commit, so that behavior cannot be lost. If you've found a way to have multiple threads without swallowing exceptions, then I'm +1 ... and please fix CUSS as well.

          If you find that you can't have threading without a handleError method that eats the exception (like CUSS), then I'd rather you create ConcurrentCloudSolrServer instead and include a note about error handling in its javadoc. I don't see a similar note in the javadoc for CUSS.

          Joel Bernstein added a comment - - edited

          Shawn,

          The way Exceptions are handled in this patch is that each thread makes a separate update request and records an exception if one occurs. When they all complete the main thread checks to see if any exceptions occurred. If it finds an exception occurred it throws that exception.

          If no exceptions occur then the main thread executes the remaining actions such as COMMIT.

          Let me know if that sounds like it's enough control for you.

          Joel
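The exception flow described above can be sketched as follows (structure only; the names are hypothetical, not the patch's actual code): each per-shard task records its exception rather than swallowing it, and the calling thread rethrows the first one after all tasks complete:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the exception handling: per-shard tasks record failures in an
// AtomicReference; after all tasks finish, the main thread rethrows the
// first recorded exception so callers still see update failures.
public class ParallelUpdates {

    static void runAll(List<Runnable> shardRequests) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(shardRequests.size());
        AtomicReference<Exception> firstError = new AtomicReference<>();
        for (Runnable r : shardRequests) {
            pool.execute(() -> {
                try {
                    r.run();
                } catch (Exception e) {
                    firstError.compareAndSet(null, e); // record, don't swallow
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        if (firstError.get() != null) {
            throw firstError.get(); // surfaces in the calling thread
        }
        // only when no exception occurred would the remaining actions
        // such as COMMIT be executed here
    }
}
```

Only when no task recorded an exception does the main thread go on to execute the remaining non-routable actions such as COMMIT, which matches the behavior Joel describes.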

          Mark Miller added a comment -

          Sounds like a fairly major back compat break we have to consider.

          Mark Miller added a comment -

          I have not had time to review the latest work, but my goal for this issue is to have a cloud server that worked as is, but by default hashes and routes on the client side. A simple backward compatible improvement that no one has to turn on. I think anything that breaks back compat heavily or requires turning on options to get this behavior should be a new cloud implementation perhaps.

          Joel Bernstein added a comment -

          The main backwards compatibility issue that I see is the compound response. This patch returns a response that contains the responses from each of the shard requests. This is the main reason that I made the directUpdate functionality optional.

          The exception handling seems to be backwards compatible.

          A separate implementation makes sense too. The CloudSolrServer works fine for less demanding indexing needs. The ConcurrentCloudSolrServer could be used for higher throughput.

          Joel Bernstein added a comment - - edited

          OK, I've created the ConcurrentUpdateCloudSolrServer class and reverted the CloudSolrServer.

          I think trying to get this functionality into CloudSolrServer and not have any back compatibility issues was going to be a blocker.

          I'll update the ticket name and description to reflect this.

          The initial ConcurrentUpdateCloudSolrServer in this patch seems to run fine. More tests are needed though.

          Erick Erickson added a comment -

          Formatted according to the official rules, removed extra imports, but no other changes.

          Mark Miller added a comment -

          The exception handling seems to be backwards compatible.

          By the sound of it, you changed the runtime behavior in an incompat way - with a single thread you have very tight control and knowledge of what docs got accepted in and what docs failed and the exception for every fail. It's not so easy to get that same back compat behavior with multiple threads and by the description, it sounds like a break to me.

          I think a multi threaded version likely cannot be made back compat easily, and so it begs a new class similar to the non cloud servers.

          I'll look at the code when I get a chance though.

          Mark Miller added a comment -

          The way Exceptions are handled in this patch is that each thread

          With a new impl, we should consider how we want to do this carefully.

          If we just go this route, it's really not much better than the state the concurrent solrserver is in. This could be a good time to introduce better handling for concurrent solrservers - error detection and responses - you really still want to know exactly what happened with your updates, and it's currently very difficult to determine that. It's an improvement we have to get to, and it's probably going to be a back compat headache - perhaps we start by introducing something here and eventually this would become the standard client you want to use.

          Mark Miller added a comment -

          I guess some of that presupposes that since this thing is already multi threaded for adding to multiple servers concurrently, we would eventually also want to make it even more concurrent.

          Joel Bernstein added a comment - - edited

          This implementation returns all the exceptions that occur as part of the response. You can see each exception and see which server it came from.

          It also returns the routes that were used so you can see which docs were routed to which server. Very useful for testing.

          Mark Miller added a comment -

          Right, but that's not back compat either anyway - so I think we really want to consider what the ideal solution is for multi update responses - if we are free to do something new, let's make sure it's the right thing going forward. Especially if we end up taking the concurrent* name. There are other issues open about making a concurrent cloud solrserver that works like the concurrent non SolrCloud server. The big sticking point is how responses will work. Someone does have a patch where they have looked at a response plan for this. This is only semi concurrent in comparison, and so it may not be ideal for this impl to take the concurrent name.

          To me, rather than have a hodge podge of impls, it almost makes more sense to fix the cloud solrserver to hash client side in a fully back compat way, and then tackle a concurrent version that is truly concurrent rather than just concurrent over the set of logical shards. Per has opened a couple issues in the past along those lines.

          Just throwing out things for discussion.

          Mark Miller added a comment - - edited

          Related issues:
          SOLR-3018: enhance solr to support per-document results in batch mode
          SOLR-3382: Finegrained error propagation (focus on multi-document updates)
          SOLR-445: Update Handlers abort with bad documents
          SOLR-3384: Custom SolrServer chains - mixing SolrServer-subclass properties as you like to
          SOLR-3383: Async responses in SolrJ

          I think we really want to get CloudSolrServer hashing client side in the short term, but I don't think we want to rush the concurrent impl or a new response format - I think we should do it right and tackle things holistically.

          Joel Bernstein added a comment -

          Mark,

          Why not just add a high performance update method to CloudSolrServer? We can document it clearly and people will find it and use it. This way we don't have to worry so much about getting it perfect.

          Eventually when a new concurrent implementation is ready it can be released.

          Mark Miller added a comment -

          Why not just add a high performance update method to CloudSolrServer.

          Heh - and now we are back to that. I'm less hesitant about it now that we have come full circle.

          I think if the std path does the goodness by default - with none of the multi threading or response changing - I would be fine with another 'high performance' method that did extra stuff and has a new response.

          I really don't like the idea of giving up on a good default for CloudSolrServer in 4x - it's fairly easy to get done. Another method is an easy to miss hack I think - this should work nicely within the SolrServer interface.

          How hard would it be to get a single thread and the same response on the default path - and then add options to enable more threads (and a pretty small runtime change) and the fine grained response? Then we could mark those as experimental options. The default path will hash client side, and options will enable the higher performance / better response?

          Joel Bernstein added a comment -

          New patch with ConcurrentUpdateCloudSolrServer extending CloudSolrServer.

          A new public update() method has been added that has the following features:

          1) Document Routing
          2) Sends requests to shards in parallel
          3) Uses javabin as its transport.

          Joel Bernstein added a comment -

          The problem with CloudSolrServer back compat is having the same response.

          We'll be getting back a response from each shard. So do we combine those responses into a single response?

          If we do a single thread we can just throw an exception and be done with it. So that would be backwards compatible.

          But if we go this route I'd like to have an update method with the full performance enhancements as well. My main goal here is to be able to have true linear scalability on Solr Cloud indexing. The javabin enhancement in the latest patch is also important for maximizing Solr Cloud indexing performance.

          Mark Miller added a comment -

          The problem with CloudSolrServer back compat is having the same response.

          Not if that response is enabled with a setter? Otherwise you can do the same thing as now.

          If we do a single thread we can just throw an exception and be done with it. So that would be backwards compatible.

          Right, that's my point - you could have a single thread by default - it would be back compat - then if you flip a switch, you get a thread per server and an N-sized thread pool or whatever you are currently doing for this.

          But if we go this route I'd like to have an update method with the full performance enhancements as well.

          You shouldn't need this method exposed if options enable the multi threading.

          maximizing Solr Cloud indexing performance.

          I think until you are using more than a single thread per server, you will be very very far from easily maxing performance - which is why the other issues interest me a lot more when considering a concurrency improvement.

          Joel Bernstein added a comment -

          Thinking about the multiple response issue.

          Creating one response from each of the shard responses would be basically the same thing that happens when the routing is done on the Solr servers. I see no reason to not do this.

          Here is what I propose:

          A CloudSolrServer implementation that by default does document routing with a single merged response. If you flip a switch you get document routing with:

          1) Parallel execution, one thread per shard.
          2) A non-backwards compatible response with responses from each shard.
          3) javabin transport.
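
          The "single merged response" default can be pictured with a plain-Java sketch that condenses per-shard status/QTime entries into one summary map. The names here are hypothetical and the maps stand in for SolrJ's NamedList objects, which the patch actually uses:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: condense per-shard responses into one summary,
// mimicking the single response a server-side-routed update returns.
public class ResponseMerge {
    public static Map<String, Object> condense(Map<String, Map<String, Object>> byShard) {
        int maxStatus = 0;
        long maxQTime = 0;
        for (Map<String, Object> shardRsp : byShard.values()) {
            maxStatus = Math.max(maxStatus, (Integer) shardRsp.get("status"));
            maxQTime = Math.max(maxQTime, ((Number) shardRsp.get("QTime")).longValue());
        }
        Map<String, Object> merged = new LinkedHashMap<>();
        merged.put("status", maxStatus); // non-zero if any shard failed
        merged.put("QTime", maxQTime);   // the slowest shard bounds the batch
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> byShard = new LinkedHashMap<>();
        byShard.put("shard1", new LinkedHashMap<>(Map.of("status", 0, "QTime", 12)));
        byShard.put("shard2", new LinkedHashMap<>(Map.of("status", 0, "QTime", 30)));
        System.out.println(condense(byShard)); // one condensed summary
    }
}
```

          Keeping the per-shard maps around alongside the condensed one is what lets a caller opt into the detailed breakdown while the default response stays compact.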

          Joel Bernstein added a comment - - edited

          Latest patch is a version of CloudSolrServer that:

          1) Does document routing
          2) Sends requests to each shard in a separate thread
          3) Uses javabin transport
          4) Is backwards compatible with both the response and exception.

          It does this all by default, no switches needed.

          This is accomplished by returning a response or throwing an exception that condenses the info from each shard into a single response or exception.

          To get the full info for the response or exception you can down cast to either RouteResponse or RouteException, which gives you a detailed breakdown from each of the shards.

          Will update the ticket name and description accordingly.

          Mark Miller added a comment -

          I'll do a review shortly.

          It does this all by default, no switches needed.

          exception that condenses the info from each shard into a single response

          How can that be backward compat if people are parsing the response? I'm not convinced you can do all this by default and be back compat, but I'll look at the latest patch.

          And batch updates will again have the slight change in runtime behavior.

          I still think these extras will need to be off by default until 5.

          Mark Miller added a comment -

          I don't see Routable in the current patch.

          Joel Bernstein added a comment -

          OK, adding it now.

          Joel Bernstein added a comment -

          Added Routable.java

          Shawn Heisey added a comment - - edited

          I still think these extras will need to be off by default until 5.

          +1. Even in version 5, it should still be possible to turn them off. Advanced features (threading in particular) have a tendency to cause subtle bugs, and it's difficult to know if they are bugs in the underlying code or bugs in the advanced feature. Being able to turn them off will greatly help with debugging.

          IMHO, most tests that use CloudSolrServer should randomly turn things like threading on or off, change the writer and parser, etc. Which reminds me, I need to file an issue and work on a patch for Cloud/LBHttpSolrServer implementations that includes many of the getters/setters from HttpSolrServer.

          Joel Bernstein added a comment -

          The initial response behaves very much like a response when document routing is done on the server. On the server the Solr instance sends off the docs to the shards to be indexed and then returns a single unified response.

          This does basically the same thing but lets you down cast to get more info if you want to.

          Shawn Heisey added a comment -

          Joel Bernstein I was looking into how you switched to the binary writer so I could develop a patch for SOLR-4715. You've got it creating new writer and parser objects for every HttpSolrServer. Shouldn't there be one instance of each? The existing LBHttpSolrServer class shares one parser object for all of the inner HttpSolrServer objects.

          I'm struggling a bit on my patch, but if I can find a way to do it, there is some overlap with this issue.

          Joel Bernstein added a comment -

          Shawn, I haven't had a chance to look at this yet but I suspect you're right. I'll make this change when I next update the patch.

          Joel Bernstein added a comment - - edited

          Mark, there were a couple of changes I still wanted to make to this ticket:

          1) Add a switch to turn on/off threading.
          2) Add a thread pool rather than spawning new threads each request.
          3) Add a few more tests.

          But before I dive in I wanted to be sure we're on the same page.

          Does this design satisfy the back compat issue for the response and exceptions? Are there other show stoppers in this design/implementation that need to be addressed?

          Mark Miller added a comment -

          1 - yes, I think we need the option of using just one thread. I also think that should be the default until 5x.
          2 - yes, I think a thread pool is the way to go
          3 - always will get a +1 from me on more tests

          Does this design satisfy the back compat issue for the response and exceptions?

          I have to review closer - it's on my short list. I think if we can use one thread and the responses have the same format by default, I won't have much concern.

          Are there other show stoppers in this design/implementation

          I won't know till I look closer, but I don't doubt we will get this in.

          Shawn Heisey added a comment - - edited

          The work I've been doing (slowly) on SOLR-4715 overlaps with this issue. I've been thinking that I need to divide it into bite-size tasks for my own purposes, but that could possibly help here too. If I concentrate first on giving LBHttpSolrServer an easy way to set the writer to binary, that would reduce the complexity of this patch quite a bit. Thoughts?

          Shawn Heisey added a comment -

          Adding setters to LBHttpSolrServer ready to go in SOLR-4919.

          Joel Bernstein added a comment -

          Shawn, the setter approach is nicer than having to extend LBHttpSolrServer. What are your thoughts on how the two patches could be worked together?

          Shawn Heisey added a comment -

          I asked Mark Miller to review SOLR-4919 for me. I think his objection to setters that only work in certain situations is reasonable, but the patch that does it the right way seems to be causing intermittent test failures. The setter approach seems to not cause any failures.

          I don't want to hold up your work here, but it would be very nice if LBHttpSolrServer was more friendly to your changes.

          Joel Bernstein added a comment -

          New patch, added setter to turn off and on threaded updates, default off.

          Added a thread pool for threaded updates.

          Removed the javabin transport. We can add this when Shawn wraps up his work.

          Mark Miller added a comment -

          I'm going to try and review this over the weekend.

          Mark Miller added a comment -

          Are all tests passing for you with the latest patch Joel? My first attempt has a couple tests failing with: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: missing content stream

          Mark Miller added a comment -

          A couple quick comments:

          • It seems like we don't route delete by id? Why not?
          • It doesn't look like the new update path deals with Aliases? We should be sure to hook this up to run sometimes in the alias integration test.
          • It only works with CompositeIdRouter? What is the limitation there?
          Joel Bernstein added a comment -

          Mark, I'll check on the test error.

          I can add the code to get delete by id routing.

          It looks like aliases should be pretty straightforward to add in as well.

          I don't think CloudSolrServer can do document routing with the implicit router. With the implicit router we don't have any information about where to send the document. I think with the implicit router the strategy would be to use the HttpSolrServer and set the baseUrl manually for each document.

          Mark Miller added a comment -

          I don't think CloudSolrServer can do document routing with the implicit router. With the implicit router we don't have any information about where to send the document.

          Right, but that's a pluggable point - we may add more impls, users can write impls - we should be able to route with any impl the same way the back-end code does, no?

          Joel Bernstein added a comment -

          Ok, I'll change this so it routes all impls but the implicit router.

          Joel Bernstein added a comment - edited
          • Added support for aliases
          • Added routing for delete by id's
          • Changed logic so all router impls but implicit are acceptable for routing.

          The test error occurs because CloudSolrServer clears the documents after it performs the updates. It does this because it then executes the non-routable requests, such as commit. So this error will only occur if you try to issue the same update request again following a request with CloudSolrServer.

          I'll fix this tomorrow by using a different approach to executing the non-routables.

          Joel Bernstein added a comment -

          New patch that passes all existing tests.

          Removed all side effects on the request and also changed how parameters were being passed to routed requests.

          Joel Bernstein added a comment -

          Added tests for UpdateRequestExt document routing and deleteById routing.

          Added test for UpdateRequest deleteById routing

          Markus Jelsma added a comment -

          Joel, it is working perfectly and already runs fine in one production environment. Using routing, it's about 30% more efficient than the current method when sending data from 20 Hadoop reducers to 10 Solr SSD nodes. We didn't implement the routable deletes - we're still using SolrServer.deleteById(). It seems UpdateRequestExt is not going to be the definitive API to talk to, right?

          I assume it won't make it into 4.4, but we should make an effort to get it committed to trunk and/or 4.5 soon.

          Joel Bernstein added a comment -

          Markus, thanks for the info. Glad to hear it's working for you in production.

          Just wondering if you've turned on parallel updates and what batch size you're using?

          I'm thinking that large batch sizes with parallel updates would be very beneficial for performance. That way you would get long stretches of parallel indexing across the cluster.

          I suspect that UpdateRequestExt will eventually get folded into UpdateRequest based on the comments in the source.

          I'll ping Mark and see what he thinks about getting this committed.

          Markus Jelsma added a comment -

          Batch size is about 394 iirc, not very large indeed. I don't think i enabled parallel updates.

          Mark Miller added a comment -

          It's definitely too late for 4.4 (we already branched and the first RC vote is ongoing), but it's high priority for 4.5.

          Joel Bernstein added a comment -

          Thanks Mark.

          Joel Bernstein added a comment -

          Markus,

          cloudClient.setParallelUpdates(true);

          will turn on parallel updates; in theory this should give you much better performance. Depending on the size of your docs you could probably go with a pretty high batch size. With ten servers, a batch size of 5000 would send roughly 500 docs to each server.

          Greg Walters added a comment -

          After being able to consistently deadlock my SolrCloud cluster, I've applied this patch in production and don't have any further issues. +1 on it working.

          Markus Jelsma added a comment -

          +1 from here as well. We're relying on this and parallel updates to keep load down and it works very well.

          Jack Krupansky added a comment -

          I was surprised to see that this "high priority" issue is still not committed for 4.5, although the actual JIRA priority is still listed as "Minor".

          Mark Miller added a comment -

          this "high priority" ... Jira ... is still listed as "Minor".

          My personal priority list has nothing to do with the severity in JIRA for this issue. I'm assigned and working on this - surprising or not.

          I have stated that this is an important issue that is on the road map and that it is high priority for me to get into 4.5. Nothing has changed.

          Mark Miller added a comment -

          Here is my first pass on top of Joel's work. Comments to come.

          Mark Miller added a comment -

          Also, FYI, there are a few remaining issues to smooth out, so a handful of non solrcloud tests in the solrj package are failing. I'll have a second pass up that resolves these remaining issues before long.

          Joel Bernstein added a comment - edited

          Awesome! Looks like javabin transport is part of this as well. My earlier tests showed this provided a large performance increase.

          Also looks like you cleaned up the UpdateRequestExt, which is good.

          Hope to have a chance today to apply the patch and test things out.

          Mark Miller added a comment -

          Here is a cleaned up patch. All tests are passing for me.

          I've made some mostly minor changes as well as:

          • Removed UpdateRequestExt and the Router workaround for it.
          • Randomly enable/disable parallel updates in tests.
          • Adds SolrCloud javabin support, since it was in line with merging UpdateRequestExt into UpdateRequest.
          • Enables parallel updates by default - the more I have thought about this, the more I've started feeling we should change this default. The minor back compat issue around it is not worth the slow default.
          Mark Miller added a comment -

          The patch also has the work from SOLR-3249: "Allow CloudSolrServer and SolrCmdDistributor to use JavaBin", but it does not yet make it the default for CloudSolrServer or switch to it in the SolrCmdDistributor - I have made SOLR-5223 to track that change after this goes in.

          Mark Miller added a comment -

          I'd like to commit this soon so I can more easily noodle around on top of it. It should also make it easier to collaborate on any final tweaks or additions without large dueling patches. I'll leave the issue open for a bit so anyone else has a chance to weigh in.

          Shawn Heisey added a comment -

          Mark Miller: The only reason I thought my issue (SOLR-4715) should come first is that I was planning to incorporate some convenience for bit-changing (solrserver, maybe some httpclient) support (including javabin) in a cleaner way (IMHO) than what Joel had done.

          It's now been long enough since I put any work on my patch that I think I'd have to redo it anyway, so feel free to get this in because it's much more important. I'll revisit my patch afterwards.

          I do have one question, the same one I asked early on: Does the parallel indexing support still allow exceptions to (eventually) bubble up the stack to the application, avoiding a repeat of SOLR-3284?

          Joel Bernstein added a comment -

          Yes, exceptions are thrown with each batch request.

          For example, if a batch of 10,000 docs is sent to 10 servers, the batch is split up into 10 sub-batches, one for each server. Each sub-batch is then sent in its own thread while the main thread waits. When all the servers finish indexing, the main thread gathers up any exceptions that occurred in the request threads and throws a RouteException, which contains a map of the servers and the exception that occurred.
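To make the flow above concrete, here is a minimal, self-contained sketch of the batch splitting and exception gathering (class and method names are hypothetical, and a plain hashCode stands in for the real router's hash-range logic; this is not the actual CloudSolrServer code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RouteSketch {

    // Stand-in for the router: assign a doc id to one of numShards shards.
    public static int shardFor(String id, int numShards) {
        return Math.floorMod(id.hashCode(), numShards);
    }

    // Split one batch of doc ids into per-shard sub-batches.
    public static Map<Integer, List<String>> splitBatch(List<String> ids, int numShards) {
        Map<Integer, List<String>> routes = new HashMap<>(numShards);
        for (String id : ids) {
            routes.computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>()).add(id);
        }
        return routes;
    }

    // Send each sub-batch in its own thread while the main thread waits, then
    // gather any per-shard exceptions, as the RouteException described above does.
    public static Map<Integer, Exception> sendParallel(Map<Integer, List<String>> routes)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, routes.size()));
        Map<Integer, Future<?>> futures = new HashMap<>(routes.size());
        for (Map.Entry<Integer, List<String>> e : routes.entrySet()) {
            futures.put(e.getKey(), pool.submit(() -> {
                // A real client would send an update request here; shard 0
                // fails on purpose to demonstrate exception gathering.
                if (e.getKey() == 0) {
                    throw new RuntimeException("index failure on shard 0");
                }
            }));
        }
        Map<Integer, Exception> failures = new HashMap<>(routes.size());
        for (Map.Entry<Integer, Future<?>> e : futures.entrySet()) {
            try {
                e.getValue().get(); // main thread waits for each shard
            } catch (ExecutionException ex) {
                failures.put(e.getKey(), (Exception) ex.getCause());
            }
        }
        pool.shutdown();
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Integer, List<String>> routes = splitBatch(Arrays.asList("a", "b", "c", "d"), 2);
        System.out.println(sendParallel(routes).keySet()); // the shard that failed
    }
}
```

The real implementation routes by the collection's hash ranges and shard-leader URLs; this sketch only illustrates the split/send/gather shape of the parallel update path.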

          ASF subversion and git services added a comment -

          Commit 1521713 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1521713 ]

          SOLR-4816: CloudSolrServer can now route updates locally and no longer relies on inter-node update forwarding.
          SOLR-3249: Allow CloudSolrServer and SolrCmdDistributor to use JavaBin.
          SOLR-4816: CloudSolrServer now uses multiple threads to send updates by default.

          ASF subversion and git services added a comment -

          Commit 1521726 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1521726 ]

          SOLR-4816: CloudSolrServer can now route updates locally and no longer relies on inter-node update forwarding.
          SOLR-3249: Allow CloudSolrServer and SolrCmdDistributor to use JavaBin.
          SOLR-4816: CloudSolrServer now uses multiple threads to send updates by default.

          Mark Miller added a comment -

          Some recent jenkins failures in org.apache.solr.cloud.SyncSliceTest to look at: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java6/

          Shawn Heisey added a comment -

          As requested via IRC, here is a reminder comment to add sugar methods like setRequestWriter to CloudSolrServer.

          Joel Bernstein added a comment -

          Just did a fresh 4x pull and org.apache.solr.cloud.SyncSliceTest ran clean for me several times.

          Shikhar Bhushan added a comment -

          We've run into some issues with CloudSolrServer leaking loads of LBHttpSolrServer's aliveCheckExecutor thread pools with parallelUpdates = true.

          The root cause here is that the RequestTask inner class is creating a new LBHttpSolrServer for each run() rather than utilizing CloudSolrServer.lbServer which is already available to it.

          Some detail: LBHttpSolrServer lazily initializes a single-threaded ScheduledExecutorService for the "aliveCheckExecutor" when e.g. there is some kind of error talking to a server. So this issue tends to come up when Solr nodes are unavailable and exceptions are thrown. There is also no call to shutdown() on that LBHttpSolrServer which gets created from RequestTask.run(). LBHttpSolrServer does have a finalizer that tries to shutdown the aliveCheckExecutor but there's no guarantee of finalizers executing (or maybe there is some other memory leak preventing that LBHttpSolrServer from being GC'ed at all).

          So the one-liner fix that should definitely go in is to simply have RequestTask use CloudSolrServer.lbServer.

          I have attached a patch that removes RequestTask altogether in favor of simply using Callables and Futures, which is much more idiomatic. (RequestTask-removal.patch)
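As an illustration only (simplified, hypothetical names; not the actual patch), the Callable/Future idiom described here looks roughly like this: one shared client is reused by every task instead of constructing a new LBHttpSolrServer per request, and Future.get() propagates per-route failures back to the caller:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRequests {

    // Stand-in for the single shared lbServer: one instance used by all tasks,
    // so no per-request client (and no leaked aliveCheckExecutor pool) is created.
    static String sendToServer(String url, List<String> docs) {
        return url + ":" + docs.size();
    }

    public static Map<String, String> execute(Map<String, List<String>> routes)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // One Callable per route; size the maps up front since the count is known.
            Map<String, Future<String>> futures = new HashMap<>(routes.size());
            for (Map.Entry<String, List<String>> e : routes.entrySet()) {
                futures.put(e.getKey(),
                        pool.submit(() -> sendToServer(e.getKey(), e.getValue())));
            }
            Map<String, String> responses = new HashMap<>(routes.size());
            for (Map.Entry<String, Future<String>> e : futures.entrySet()) {
                responses.put(e.getKey(), e.getValue().get()); // rethrows task failures
            }
            return responses;
        } finally {
            pool.shutdown(); // a long-lived client would keep one pool for its lifetime
        }
    }
}
```

The try/finally shutdown is only for this self-contained example; the point of the fix is that the real client reuses its existing lbServer and thread pool rather than creating and abandoning them per request.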

          Shikhar Bhushan added a comment -

          This is a separate issue but worth noting: CloudSolrServer.shutdown() does not call lbServer.shutdown()

          In case the lbServer is provided as a constructor arg from outside, that probably makes sense.

          But in case of the constructors where it is created internally, IMO CloudSolrServer should assume ownership and also shut it down.

          Mark Miller added a comment -

          Just did a fresh 4x pull and org.apache.solr.cloud.SyncSliceTest ran clean for me several times.

          Oh, it passes for me all the time too - it can be hard to match the Apache FreeBSD jenkins runs.

          Another fail by the way (I've seen this once locally too, but very rare):

          https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/378/testReport/junit/org.apache.solr.cloud/BasicDistributedZk2Test/testDistribSearch/

          Joel Bernstein added a comment -

          Shikhar,

          I ran some data through with the patch you provided and it looks good. I'll do some more testing but so far +1 Shikhar's patch.

          Mark,

          When I started testing this morning I found that directUpdates are short-circuiting now if params are null. I found this confusing because I was setting the collection via the default collection setter on the client. The way this was written before might be less confusing; before, it checked the nonRoutableParams.

          In my local build I removed:

if (params == null) {
    return null;
}
          

          And then reverted back to checking the collection with this code:

              String collection = nonRoutableParams.get("collection", defaultCollection);
          

          This seemed to work well.

          I can patch this after you make the call on Shikhar's patch.

          Mark Miller added a comment -

          Thanks Shikhar!

          Mark Miller added a comment -

          Before it checked the nonroutableParams.

          Thanks! Yeah, I had tried to go down a different path at one point - I didn't want to have to know which params were routable or not - but of course I quickly hit a wall. Must have missed this one when reverting back. We should try and nail this one with a test.

          ASF subversion and git services added a comment -

          Commit 1522684 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1522684 ]

          SOLR-4816: Don't create "loads" of LBHttpSolrServer's, shutdown LBHttpSolrServer when appropriate, get collection from nonRoutableParams.

          ASF subversion and git services added a comment -

          Commit 1522685 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1522685 ]

          SOLR-4816: Don't create "loads" of LBHttpSolrServer's, shutdown LBHttpSolrServer when appropriate, get collection from nonRoutableParams.

          ASF subversion and git services added a comment -

          Commit 1522690 from Mark Miller in branch 'dev/branches/lucene_solr_4_5'
          [ https://svn.apache.org/r1522690 ]

          SOLR-4816: Don't create "loads" of LBHttpSolrServer's, shutdown LBHttpSolrServer when appropriate, get collection from nonRoutableParams.

          Shikhar Bhushan added a comment -

          Thanks, Mark! Also for adding the call to lbServer.shutdown() when appropriate.

          This is a really minor thing, but I later realized

          final Map<String, Future<NamedList<?>>> responseFutures = new HashMap<String, Future<NamedList<?>>>();

          is better declared with an initialCapacity, since that is known:

          final Map<String, Future<NamedList<?>>> responseFutures = new HashMap<String, Future<NamedList<?>>>(routes.size());

          ASF subversion and git services added a comment -

          Commit 1524170 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1524170 ]

          SOLR-4816: deal with leader=null case and init map with known size

          ASF subversion and git services added a comment -

          Commit 1524171 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1524171 ]

          SOLR-4816: deal with leader=null case and init map with known size

          ASF subversion and git services added a comment -

          Commit 1524174 from Mark Miller in branch 'dev/branches/lucene_solr_4_5'
          [ https://svn.apache.org/r1524174 ]

          SOLR-4816: deal with leader=null case and init map with known size

          Mark Miller added a comment -

          That should address some of the recent Jenkins failures this has caused, and it also addresses Shikhar's last comment.

          Mark Miller added a comment -

          Thanks Joel and Thanks Shikhar!

          ASF subversion and git services added a comment -

          Commit 1524176 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1524176 ]

          SOLR-4816: add missing CHANGES credit

          Markus Jelsma added a comment -

          Mark, is it still enabled the same way as Joel's original patches?

          ASF subversion and git services added a comment -

          Commit 1524177 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1524177 ]

          SOLR-4816: add missing CHANGES credit

          Mark Miller added a comment -

          Parallel threads are now on by default. Document routing works if the right id field is set - it defaults to "id".
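The two behaviors Mark mentions - hashing the id field to pick a shard, and submitting each shard's batch on its own thread - can be sketched with plain stdlib Java. The routeShard hash and the two-shard setup below are illustrative assumptions, not CloudSolrServer's actual routing implementation:

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelRoutingSketch {
    // Illustrative stand-in for the client-side router: hash the id field to pick a shard.
    static int routeShard(String id, int numShards) {
        return Math.floorMod(id.hashCode(), numShards);
    }

    public static void main(String[] args) throws Exception {
        int numShards = 2;
        List<String> docIds = Arrays.asList("0", "2", "5", "7");

        // Group documents by target shard, mirroring the per-leader batches.
        Map<Integer, List<String>> routes = new HashMap<>(numShards);
        for (String id : docIds) {
            routes.computeIfAbsent(routeShard(id, numShards), k -> new ArrayList<>()).add(id);
        }

        // Submit each shard's batch on its own thread, keeping one Future per route.
        ExecutorService pool = Executors.newFixedThreadPool(numShards);
        Map<Integer, Future<Integer>> responseFutures = new HashMap<>(routes.size());
        for (Map.Entry<Integer, List<String>> route : routes.entrySet()) {
            responseFutures.put(route.getKey(),
                pool.submit(() -> route.getValue().size())); // stand-in for sending the batch
        }

        // Block until every shard's batch has completed, as a routed update would.
        int indexed = 0;
        for (Future<Integer> f : responseFutures.values()) {
            indexed += f.get();
        }
        pool.shutdown();
        System.out.println("indexed=" + indexed);
    }
}
```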

          Shawn Heisey added a comment -

          I've been looking at recent updates to the ref guide via the commits alias. One of the more recent pages to receive updates is the one about atomic updates and optimistic concurrency. It occurred to me that ConcurrentUpdateSolrServer can't be used at all with optimistic concurrency because it won't ever fail, even if the update fails because of a version conflict.

          On topic for this issue: Could optimistic concurrency be used effectively if CloudSolrServer is in parallel thread mode? I've been assured that it won't swallow exceptions, but would the exception be associated with the correct request? Exactly how parallel threads works in this update is a mystery to me.

          I'm only asking so the documentation can be correct, not to achieve different behavior.

          Joel Bernstein added a comment -

          You would get an exception that you could trace to one of the servers. The exception itself would have to have the info needed to determine what document failed due to optimistic locking.

          ASF subversion and git services added a comment -

          Commit 1527131 from Steve Rowe in branch 'dev/branches/lucene_solr_4_5'
          [ https://svn.apache.org/r1527131 ]

          SOLR-4816: add missing CHANGES credit (merged branch_4x r1524177)

          Shalin Shekhar Mangar added a comment - - edited

          You would get an exception that you could trace to one of the servers. The exception itself would have to have the info needed to determine what document failed due to optimistic locking.

          Joel Bernstein - I'm a little confused about how to track failures with CloudSolrServer for update requests. Firstly, the RouteException class is private to CloudSolrServer and cannot be used at all. Secondly, since the responseFutures are per URL, won't two update requests on the same server overwrite the entries?

          Joel Bernstein added a comment -

          There is a test case that uses the RouteResponse, which has the same access level as RouteException, and it works. It may be that this is because of the friendly package access. I'll test both RouteResponse and RouteException outside the package and see if they are accessible.

          Secondly, since the responseFutures are per URL, won't two update requests on the same server overwrite the entries?

          When an exception is thrown from one of the routed servers the batch will terminate on that server. So each URL in RouteException should have only one exception.
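The one-exception-per-URL behavior Joel describes can be sketched in plain Java. RouteException is SolrJ's own class; the FailuresByUrl stand-in below is purely illustrative of the aggregation idea:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RouteFailureSketch {
    // Illustrative stand-in for RouteException: one failure recorded per server URL.
    static class FailuresByUrl extends RuntimeException {
        final Map<String, Exception> byUrl = new LinkedHashMap<>();

        void record(String url, Exception e) {
            // The batch terminates on a server at its first error,
            // so putIfAbsent keeps exactly one exception per URL.
            byUrl.putIfAbsent(url, e);
        }
    }

    public static void main(String[] args) {
        FailuresByUrl failures = new FailuresByUrl();
        failures.record("http://shard1", new IllegalArgumentException("bad field value"));
        failures.record("http://shard1", new IllegalStateException("never reached"));
        failures.record("http://shard2", new IllegalArgumentException("version conflict"));

        System.out.println("urls=" + failures.byUrl.size());
        System.out.println("shard1=" + failures.byUrl.get("http://shard1").getMessage());
    }
}
```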

          Mark Miller added a comment -

          Let's make a new issue to make these public in 4.6. The tests can access them because they are package private - they should be public and static.

          Joel Bernstein added a comment -

          I've been mulling over how big an issue this is. If you get a number of exceptions from different servers now, you'd only see one. You'd fix the data and refeed the batch. Then you'd see the next exception and you'd have to fix that. That's no different than how you'd have to work if you had multiple exceptions in a batch on a single server. So, I don't think it's such a large issue. I agree with Mark's plan.

          Joel Bernstein added a comment -

          I've been doing some more testing of the error handling. If CloudSolrServer encounters an error from any of its shards, it gathers them up into a RouteException and throws. This exception extends SolrException and can be treated as such.

          When getMessage is called, you see a typical update error returned:

          ERROR: [doc=1100] Error adding field 'test_i'='bad' msg=For input string: "bad"

          What we don't know is which server to check to get the full stack trace. But it doesn't seem like we knew that info in CloudSolrServer prior to this patch. RouteException, when made public, will tell us this.

          Shalin Shekhar Mangar added a comment -

          bq. When an exception is thrown from one of the routed servers the batch will terminate on that server. So each URL in RouteException should have only one exception.

          Great, thanks for clarifying that.

          bq. Let's make a new issue to make these public in 4.6.

          +1

          Erick Erickson added a comment -

          Hey Joel! Now that you have karma, you can assign this to yourself if you want and Mark doesn't object!

          Mark Miller added a comment -

          This should be resolved as fixed for 4.5. Any further work should be done in new issues. I'd do it now but I'm on mobile.

          Erick Erickson added a comment -

          We missed marking this as fixed; it went out in 4.5, as per Mark Miller.

          Mark Miller added a comment -

          I had left this open because some Jenkins failures started when it was committed. I addressed a couple of them already, though - if anything remains, it can get its own issue for a future release.

          Thanks a lot for sticking it out so well on this one Joel!

          Joel Bernstein added a comment -

          No problem, and thanks for all your help on this ticket.

          Jessica Cheng added a comment -

          I think the latest patch:

          - if (request instanceof IsUpdateRequest && updatesToLeaders) {
          + if (request instanceof IsUpdateRequest) {

          removed the effect of the "updatesToLeaders" variable. Looking at http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_5/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrServer.java?view=markup it's not used anywhere to make a decision anymore.

          Mark Miller added a comment -

          FYI SOLR-5899: CloudSolrServer's RouteResponse and RouteException should be publicly accessible.


            People

            • Assignee: Mark Miller
            • Reporter: Joel Bernstein
            • Votes: 5
            • Watchers: 23