Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.8, Trunk
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      Typical collection admin commands are long-running, and it is very common for the requests to time out. The problem is worse when the cluster is very large. Add an option to run these commands asynchronously:

      add an extra param async=true for all collection commands

      The task is written to ZK and the caller is returned a task id.
      A separate collection admin command will be added to poll the status of the task:

      command=status&id=7657668909

      If no id is passed, all running async tasks should be listed.

      A separate queue is created to store in-process tasks. After a task completes, its queue entry is removed. The OverseerCollectionProcessor will perform these tasks in multiple threads.
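The submit/poll/remove lifecycle described above can be sketched as follows. This is a minimal illustration, not Solr code: the `TaskTracker` class and its methods are made up, and the in-memory dict stands in for the ZK queue.

```python
import itertools

class TaskTracker:
    """Illustrative sketch: tracks in-process async tasks; entries are
    removed once the task completes, as in the proposed design."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.running = {}  # task id -> command params (stand-in for ZK)

    def submit(self, params):
        """Accept a command asynchronously and hand back a task id."""
        task_id = str(next(self._ids))
        self.running[task_id] = params
        return task_id

    def status(self, task_id=None):
        """status with no id lists all running async tasks."""
        if task_id is None:
            return sorted(self.running)
        return "running" if task_id in self.running else "notfound"

    def complete(self, task_id):
        """On completion the queue entry is removed."""
        self.running.pop(task_id, None)

tracker = TaskTracker()
tid = tracker.submit({"action": "SPLITSHARD", "collection": "collection1"})
print(tracker.status(tid))   # running
tracker.complete(tid)
print(tracker.status(tid))   # notfound
```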

      1. SOLR-5477-CoreAdminStatus.patch
        10 kB
        Anshum Gupta
      2. SOLR-5477.patch
        33 kB
        Anshum Gupta
      3. SOLR-5477.patch
        33 kB
        Shalin Shekhar Mangar
      4. SOLR-5477.patch
        44 kB
        Anshum Gupta
      5. SOLR-5477.patch
        57 kB
        Anshum Gupta
      6. SOLR-5477.patch
        71 kB
        Anshum Gupta
      7. SOLR-5477.patch
        63 kB
        Anshum Gupta
      8. SOLR-5477.patch
        66 kB
        Anshum Gupta
      9. SOLR-5477.patch
        60 kB
        Anshum Gupta
      10. SOLR-5477.patch
        63 kB
        Anshum Gupta
      11. SOLR-5477.patch
        63 kB
        Anshum Gupta
      12. SOLR-5477.patch
        63 kB
        Anshum Gupta
      13. SOLR-5477.patch
        63 kB
        Anshum Gupta
      14. SOLR-5477.patch
        77 kB
        Anshum Gupta
      15. SOLR-5477.patch
        78 kB
        Anshum Gupta
      16. SOLR-5477.patch
        80 kB
        Anshum Gupta
      17. SOLR-5477.patch
        79 kB
        Anshum Gupta
      18. SOLR-5477.patch
        79 kB
        Anshum Gupta
      19. SOLR-5477.patch
        79 kB
        Anshum Gupta
      20. SOLR-5477-updated.patch
        78 kB
        Anshum Gupta
      21. SOLR-5477.patch
        79 kB
        Shalin Shekhar Mangar
      22. SOLR-5477.patch
        77 kB
        Anshum Gupta
      23. SOLR-5477.patch
        78 kB
        Anshum Gupta
      24. SOLR-5477.patch
        97 kB
        Anshum Gupta
      25. SOLR-5477.urlschemefix.patch
        1 kB
        Anshum Gupta

        Issue Links

          Activity

          Yago Riveiro added a comment -

          Related to this feature, we could add a notification panel in the UI.

          Anshum Gupta added a comment -

          Here's what I'd recommend.

          Have 3 queues in the first phase of implementation. One each for submitted, running, completed. The completed queue only keeps the top-X tasks (by recency of completion). The completion queue is important for people to figure out details about a completed task e.g. completion time, running time etc.

          I've started working on it and would recommend that we have a ThreadPool for the running tasks. This can be capped at a config setting.

          I am still debating about when to accept tasks (or perhaps accept everything and fail them when they run). Here's a sample case on that. Firing a Shard split for collection1/shard1 would lead to an inactive shard1. If we continue to accept tasks until this completes, we may accept actions that involve shard1. We may need to take a call on that.

          For now, I am not looking at truly multi-threading my implementation (but will certainly do that before this particular JIRA is resolved). Once I get to it, I'd perhaps still run only one request per collection at a time, until we have more complex decision-making capability.

          Once a task is submitted, the OverseerCollectionProcessor peeks and processes tasks which are in the submitted queue and moves them to in-process. We'll have to synchronize this task on the queue/collection.

          Upon completion, again the task is moved from the in-progress queue to the completed queue.

          Cleaning up of the completed queue could also be tricky and we may need a failed tasks queue or have a way to perhaps retain failed tasks in the completed queue longer.
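The three-queue design proposed in this comment might be sketched roughly like this. The cap value, helper names, and in-memory containers are all illustrative stand-ins for the ZK queues:

```python
from collections import deque, OrderedDict

COMPLETED_CAP = 5  # "top-X by recency of completion"; X is made up here

submitted = deque()        # tasks accepted but not yet started
running = {}               # tasks the OverseerCollectionProcessor is executing
completed = OrderedDict()  # most recently finished tasks, newest last

def submit(task_id, task):
    submitted.append((task_id, task))

def start_next():
    # the processor peeks the submitted queue and moves the task to running
    task_id, task = submitted.popleft()
    running[task_id] = task
    return task_id

def finish(task_id, result):
    # on completion the task moves from running to the capped completed queue
    running.pop(task_id)
    completed[task_id] = result
    while len(completed) > COMPLETED_CAP:
        completed.popitem(last=False)  # evict the oldest completion

submit("t1", "splitshard collection1/shard1")
tid = start_next()
finish(tid, "done")
print(list(completed))  # ['t1']
```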

          Anshum Gupta added a comment -

          +1 for Yago Riveiro

          Jessica Cheng Mallet added a comment -

          Would you please comment more on how failures are handled? I'm especially interested in how the Overseer finds out whether an individual subcommand to an individual core admin API failed vs. timed out (the same sort of problem as with the overall collection task). Also, if it fails, whether and how state is cleaned up so that when the client re-issues the command it has a chance of succeeding. (For example, if a split shard command fails for some unknown reason, it might have left shardX_0 and shardX_1 created, and the next split command might fail because it tries to create those two target shards but they already exist. Note that this is just an example for explanation's sake; I don't know if it actually complains this way.) Thanks!

          Anshum Gupta added a comment -

          I am not really planning to add any new checks for the moment, but any exception in the response can be checked for. If the response contains an exception, the task is removed from the work queue and put into the failure queue.
          Retries would work for all commands that handle retries gracefully (split shard, for instance, does).

          Going forward, we can always enhance this to be smarter as far as handling failures is concerned.

          Noble Paul added a comment -

          Auto-retry is not a good feature, I would say. The fact that a command failed to execute means something was keeping it from running (a timeout would not happen in a background operation), so it might fail again. It is wiser to capture the error in the 'failed' queue and present the information to the admin user. The user can rectify the problem and re-issue the command.

          Jessica Cheng Mallet added a comment -

          I agree that auto-retry is not the right thing to do.

          However, a timeout can still happen on the Overseer-to-node admin requests (and if those requests have no timeouts, it might be dangerous, because a connection can sometimes be sunk and the client will never find out; we've actually seen this happen with Apache HttpClient through SolrJ). What I'm getting at is that for the same reason we're changing this client-to-Overseer request to be async, maybe the Overseer-to-node admin request should be async too.

          Anshum Gupta added a comment -

          Thanks for the input Jessica.

          I would like to believe that most of the internal timeouts (from the Overseer to Core Admin) are already taken care of. If that's not the case, I'll handle it internally and add timeouts. Having said that, I wouldn't propose self-retry even in that case.
          I would instead let it time out, report that back to the user, and let them reconfigure the timeout (I'd add a param that lets a user override the default timeouts for a particular request).
          This way, we enable the user to make the 'intelligent' decisions and overcome timeout (and other) issues.

          When you poll for the status of a request, the response would contain the failure information (in case of failure) which can be used by the admin user to take a call.

          Jessica Cheng Mallet added a comment -

          Again, I'm not advocating an auto-retry and agree that it is not the right thing to do.

          All I'm saying is that as an admin user, I would like to have the definitive answer of "success", "failure", and "in progress", and if a request times out anywhere down in the pipeline, the answer is "don't know" because it can either be "failure" or "in progress". Without a way to separately/asynchronously poll the status of any individual subtask, the overall collection request cannot offer this definitive answer, and I as an admin user will not be able to make the call of whether or not to re-issue a request. That is, if my failure status is "time out", I have no idea what actually went wrong, what to fix, or even if actually my timed out request is still in progress and will finish in another 10 seconds.

          Mark Miller added a comment -

          On a related note, I think our current timeouts are much too low at one minute. If nothing comes up to make us think the call is still not in progress, we should be willing to wait much longer I think.

          Anshum Gupta added a comment -

          I think most calls (or perhaps all) are idempotent/safe. If something times out, we can safely record it as a failed task, with a message specifying an internal timeout for now, and have the admin user reissue the call with an increased timeout param.

          As far as I remember, most of the calls are built to check for and clean up after an older call that failed.

          Also, I agree that our current timeout values are low enough for a lot of operations to fail (especially at larger scales).

          Anshum Gupta added a comment -

          Having thought about it, here's another solution that also makes CoreAdmin calls async.

          Adding a CoreAdmin API that works along similar lines to the Collection API for fetching back the request status. This will enable async Core Admin calls and avoid potential timeouts between the overseer and the cores.

          Here's what all of the solution will look like:

          • Async CollectionAPI calls for the end user which uses zk.
          • Request status API for Collection level.
          • CoreAdmin API to return request id as part of the response (if run in async mode?).
          • CoreAdmin API to get the status of a submitted task (should solve the problem that you highlighted in particular).

          I'm still debating between having even the CoreAdmin calls use zk (which means they'd only work in SolrCloud mode) or just keeping a local map of running tasks. I'm moving ahead anyway with making the calls 'to' the overseer async.
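A client-side polling loop against such a request status API could look roughly like this. The function names and the list of simulated states are purely illustrative; in reality `request_status` would be an HTTP call to the (hypothetical, at this point) status endpoint:

```python
import time

def request_status(task_id, fake_states):
    """Stand-in for a status call; pops the next simulated state instead of
    actually hitting the Collections API."""
    return fake_states.pop(0)

def wait_for(task_id, fake_states, interval=0.01):
    """Poll the status API until the task reaches a terminal state."""
    while True:
        state = request_status(task_id, fake_states)
        if state in ("completed", "failed"):
            return state
        time.sleep(interval)  # back off between polls

print(wait_for("split-42", ["submitted", "running", "completed"]))  # completed
```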

          Anshum Gupta added a comment -

          First raw patch, with a test that starts tracking completed and in-progress tasks at the CoreAdmin level.
          It's nowhere close to complete on this front and has a few TODOs in there.

          I have another patch (will upload in a while) for the CollectionAPI that uses zk to track information at collection level.

          Once I have both of them, will have them work together.

          Hoss Man added a comment -

          A few small suggestions from someone who hasn't thought through much of this but has done similar async setups in other systems in another lifetime...

          1) on where the (core task) queues should live...

          I'm still debating between having even the CoreAdmin to use zk (which means it'd only work in SolrCloud mode) or just have a local map of running tasks.

          I think it would be wise to keep them in ZK – if for no other reason than because the primary use case you expect is for the async core calls to be made by the async overseer calls; and by keeping the async core queues in zk, the overseer can watch those queues directly for "completed" instead of needing to wake up, poll every replica, and go back to sleep.

          However, a secondary concern (I think) is what should happen if/when a node gets rebooted – if the core admin task queues are in RAM, then you could easily get into a situation where the overseer asks 10 replicas to do something, replicaA succeeds or fails quickly and then reboots, and the overseer checks back once all replicas are done and finds that replicaA can't say one way or another whether it succeeded or failed – its queues are totally empty.

          2) on generating the task/request IDs.

          in my experience, when implementing an async callback API like this, it can be handy to require the client to specify the magical id that you use to keep track of things – you just ensure it's unique among the existing async jobs you know about (either in the queue, or in the recently completed/failed queues). Sometimes single-threaded (or centrally managed) client apps can generate a unique id more easily than your distributed system, and/or they may already have a one-to-one mapping between some id they've already got and the task they are asking you to do, and reusing that id makes the client's life easier for debugging/audit logs.

          in the case of async collection commands -> async core commands, it would also mean the overseer could reuse whatever id the client passed in for the collection commands when talking to each of the replicas.
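The client-specified-id scheme suggested here, with uniqueness checked against the known queues, might be sketched as follows (the `register` helper and the plain sets standing in for the queues are illustrative):

```python
def register(client_id, *queues):
    """Accept a client-chosen async id only if no known queue (submitted,
    running, completed, failed) already holds it."""
    for q in queues:
        if client_id in q:
            raise ValueError("duplicate async id: %s" % client_id)
    queues[0].add(client_id)  # place the new task in the submitted queue

submitted, running, completed, failed = set(), set(), set(), set()
register("reindex-2014-01", submitted, running, completed, failed)
print(sorted(submitted))  # ['reindex-2014-01']
```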

          Anshum Gupta added a comment -

          Thanks for the inputs Hoss.

          Having them in zk certainly makes it easier to track completed tasks but I was just trying to
          1. Keep it simpler for now
          2. Have no dependency on zk for CoreAdmin tasks.

          Pt. 2 that I mentioned, however, takes a back seat if we don't intend to use it for purely core admin calls from other clients.

          About generating request IDs client-side, I'll ping you on IRC to get more clarity on that.

          Anshum Gupta added a comment -

          in my experience, when implementing an async callback API like this, it can be handy to require the client to specify the magical...

          Considering that we have a 1-n relationship between calls made by the client to the OCP and from the OCP to cores, we can't really reuse the client-generated id as-is. We would anyway need multiple ids to be generated at the OCP-core call level.

          Anshum Gupta added a comment - edited

          Patch with more meat. Here's what this includes:

          • Request Status API based on ZK for Collection level calls.
          • Request Status API for CoreAdmin level (still memory based in this patch).
          • Async mode. Specifying 'async=requestid' lets you run the splitshard and createcollection commands (for now) in async mode. This request id can then be used to track the progress of the request. If run in async mode, the call returns almost immediately, only pushing the task onto the zk queue.
          • Basic tests for the above.

          Here's what's lying semi-baked on my machine; I'll continue pushing more of this stuff over the next few days:

          • Check and dedup the request id as specified by the user.
          • CoreAdmin calls to be async from OCP in case the original call was initiated with 'async' option.
          • Return more than just the status of the request as part of the request status API. Will return (at least) the entire response instead.
          • Make CoreAdmin calls optionally async.
          Anshum Gupta added a comment -

          More todo:

          • Cleanup the task queues. Another API call? Timed cleanup? Size limiting?
          Anshum Gupta added a comment -

          One change that I'm almost done with is not managing the request id generation: the client is expected to generate/manage the id.
          The overseer will use the id sent to it as a prefix for the ids it generates for the CoreAdmin calls.
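The prefixing scheme described here could be as simple as the following sketch (the function name, separator, and 0-based numbering are made up for illustration):

```python
def subtask_ids(parent_id, n_subcalls):
    """The overseer reuses the client-supplied id as a prefix for the ids of
    the CoreAdmin subcalls it fans out."""
    return ["%s-%d" % (parent_id, i) for i in range(n_subcalls)]

print(subtask_ids("split7", 3))  # ['split7-0', 'split7-1', 'split7-2']
```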

          Anshum Gupta added a comment - edited

          I have a few questions regarding my approach for making the CoreAdmin calls async:

          Approach #1:

          • CoreAdmin requests get submitted to zk.
          • Core watches its zk node for submitted tasks. The request object is the data in the node (when submitted).
          • On completion, the core deletes the submitted task and puts a new node with the response and other metadata into zk.
          • Collection API watches the node when it submits a task and waits for it to complete.
          • On completion of the Collection API call, delete all related core admin request nodes in zk that were generated.
          • Cleaning up of request nodes in zk happens through an explicit API call.
          • Having something along the following lines in zk would be helpful:

            /tasks
              ./collections/collection1/task1
              ./cores/core1/collection1/task1/coretask1

          This would help us delete the entire group of tasks associated to a core/collection/core task/collection task.
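Grouped deletion by path prefix could be sketched as follows; a plain set of path strings stands in for the zk tree here, and the helper name is illustrative:

```python
def delete_group(nodes, prefix):
    """Delete every task node at or under a path prefix, i.e. the whole
    group of tasks for one collection, core, or parent task."""
    doomed = [p for p in nodes if p == prefix or p.startswith(prefix + "/")]
    for p in doomed:
        nodes.discard(p)
    return doomed

nodes = {
    "/tasks/collections/collection1/task1",
    "/tasks/cores/core1/collection1/task1/coretask1",
    "/tasks/collections/collection2/taskA",
}
delete_group(nodes, "/tasks/collections/collection1")
print(sorted(nodes))  # collection1's task group is gone, the rest remain
```

Matching on `prefix + "/"` rather than the bare prefix avoids accidentally deleting a sibling like `collection10` when asked to delete `collection1`.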

          Questions:

          • This move would mean having a lot more clients talk to and write to zk. Does this approach make sense as far as the intended direction of SolrCloud is concerned?
          • Any suggestions/concerns about scalability of zk as far as having multiple updates coming into zk is concerned.

          Approach #2:
          Continue accepting the request as we do right now, but:

          1. Get the call to return immediately
          2. Use zk only to track/store the status (persistence). The request status call still comes to the core, and the status is fetched from zk by the core, instead of the client being intelligent and talking directly to zk.

          This approach is certainly less intrusive, but it also doesn't come with the benefit of having the client just watch a particular zk node for task state changes etc.

          Approach #3 (not the best option; more the fallback if zk has scalability issues with everyone writing/watching):

          • Don't make CoreAdmin calls async, but instead introduce a tracking mode. Once a task is submitted (with async="taskid"), track the request using an in-memory data structure. Even if the request times out, the client can come back and query the task status.
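Approach #3's in-memory tracking might be sketched as follows; the dict, function names, and state strings are illustrative, and the point is only that the recorded state outlives the original (possibly timed-out) HTTP request:

```python
tracked = {}  # async id -> last known state; survives the submitting request

def handle_core_admin(async_id, run):
    """Execute a core admin task while recording its state in memory."""
    tracked[async_id] = "running"
    try:
        run()
        tracked[async_id] = "completed"
    except Exception as exc:
        tracked[async_id] = "failed: %s" % exc

def request_status(async_id):
    # even if the submitting request timed out, the client can ask again later
    return tracked.get(async_id, "notfound")

handle_core_admin("task1", lambda: None)
print(request_status("task1"))  # completed
```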
          Anshum Gupta added a comment -

          Also, SOLR-5519 suggests that "Creating ZK nodes should be done at overseer (as much as possible).".
          Noble Paul , any suggestions on that?

          Shalin Shekhar Mangar added a comment - - edited

          Questions:
          This move would mean having a lot more clients talk to and write to zk. Does this approach make sense as far as the intended direction of SolrCloud is concerned?
          Any suggestions/concerns about scalability of zk as far as having multiple updates coming into zk is concerned.

          1. I don't think we need to use a ZooKeeper queue for communication. The only case where it is required is when we need failover. That is desired for the overseer actions but not required for core admin actions. For example, I don't think you should be able to submit a core admin action when the target node is not up.
          2. This approach only works with a cloud-aware client such as SolrJ, i.e. you can't submit an async core admin action with HTTP. This in itself is a strong reason to warrant a -1 from me.
          3. This is a very intrusive change compared to other approaches. I think the benefits you have outlined can be achieved in simpler ways (see my comments on approach #2 below)

          Approach #2:
          Continue accepting the request like right now, but just:
          Get the call to return immediately
          Use zk to only track/store the status (persistence). The request status calls still come to the core and the status is fetched from zk by the core, instead of the client being intelligent and talking directly to zk.

          This is much more acceptable to me. Clients should not have to worry about ZK to submit a request.

          This approach is certainly less intrusive but then also doesn't come with the benefit of having the client just watch over a particular zk node for task state change etc.

          In my mind, the proposal to have the client watch nodes for task state change is orthogonal to the way the task is invoked. There is no reason why the request can't be made via HTTP and the response from core container can't contain the request id (is this the complete zookeeper node name?) which can be used by a client to setup a watch directly if it so desires. In any case a HTTP based status/polling API must exist.

          Noble Paul added a comment -

          Other than the Overseer, no other node watches queues. A core watching a queue will be a huge scalability issue

          Mark Miller added a comment -

          Given the numbers published for ZooKeeper, even with 10,000 cores all watching, I highly doubt it would be a huge scalability issue to have them all watching a queue.

          Approach #2 still looks like the right approach though. Nice and simple.

          Anshum Gupta added a comment -

          Thanks all of you. I'll go with approach #2 in that case.

          I think that more than zk being a scalability bottleneck, not wanting to delegate all of the communication etc. to zk could be a stronger reason to not take approach #1.

          Shalin Shekhar Mangar added a comment -

          Some comments on the last patch:

          1. Why have two different sets of parameter names; 'requeststatus' with 'id' for collection handler and 'reqstatus' with 'requestid' for OverseerCollectionProcessor? Let's make it the same. Maybe just call it 'status' with a request_id?
          2. In CollectionHandler.handleSplitShardAction, the following can throw NPE if ASYNC is not present. Use Boolean.parseBoolean instead:
            if (req.getParams().get(ASYNC).equals("true")) {
                  props.put(ASYNC, "true");
                }
            
          3. Shouldn't the CollectionHandler add support for async in all commands and not just create collection and split?
          4. CollectionHandler.handleResponse should actually check for a non-null value of "async". Just checking if the key exists is not enough.
          5. I guess the client creating/managing the id part is not implemented in the last patch.
          6. In DistributedQueue.createData – there is debug logging that should be marked with nocommit and removed
          7. DistributedQueue.getNumericId is an unused method
          8. DistributedQueue.containsValue creates a new String from byte[] with the default charset. Similar issues with String.getBytes. Run ant check-forbidden-apis from inside the solr folder for a full list.
          9. The completed queue and the failure queue are actually for OverseerCollectionProcessor so they should be named appropriately i.e. /overseer/collection-queue-completed and /overseer/collection-queue-failed
          10. Why are items being put in the completedQueue in Overseer? The async and status support is only for collection commands not overseer commands.
          11. The OverseerCollectionProcessor.processMessage catches Throwables with this patch. It should be changed back to catching Exception only.
          12. Processing OverseerCollectionProcessor tasks in multiple threads is a key requirement because these tasks can be long running. Just making them asynchronous is not enough. This hasn't been addressed in this patch yet.
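
          Point 2 can be illustrated in isolation (the param map and class name below are hypothetical stand-ins for Solr's actual SolrParams, kept minimal for the sake of the example):

          ```java
          import java.util.HashMap;
          import java.util.Map;

          public class AsyncParamCheck {
              // NPE-prone: get("async") returns null when the param is absent,
              // so calling .equals("true") on the result throws NullPointerException.
              static boolean unsafeIsAsync(Map<String, String> params) {
                  return params.get("async").equals("true");
              }

              // Safe: Boolean.parseBoolean(null) simply returns false.
              static boolean isAsync(Map<String, String> params) {
                  return Boolean.parseBoolean(params.get("async"));
              }

              public static void main(String[] args) {
                  Map<String, String> params = new HashMap<>();
                  System.out.println(isAsync(params));   // false, no exception
                  params.put("async", "true");
                  System.out.println(isAsync(params));   // true
              }
          }
          ```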
          Shalin Shekhar Mangar added a comment -

          This is Anshum's patch brought in sync with latest trunk.

          Anshum Gupta added a comment -

          Still a lot of stuff to be added/fixed, but here's another patch so that anyone interested can get an idea of the direction I'm moving in.

          Here's what's been fixed/changed in this patch:

          • Standardized the request parameters for CoreAdmin and Collection level request status. Parameters now are: action=REQUESTSTATUS&requestid=XX
          • Added async option for all other calls (not just splitshard and createcollection).
          • Fixed the encoding related stuff and the potential NPEs wherever I could spot them.
          • Removed unused code and cleaned up some debugging stuff.
          • Used a ThreadPool in case of coreadmin in a rather raw manner i.e. static (and without the mode check etc). Have added todo's to add those checks and to also call shutdown for the threadpool.
          • Changed names of the queues to the ones suggested by Shalin.
          • Items are no longer being put in the completed queue in Overseer.
          • Ran ant check-forbidden-apis to fix reported issues.
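
          The static thread pool mentioned above can be sketched roughly as follows (class and method names here are illustrative, not the actual CoreAdminHandler code):

          ```java
          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.Future;

          public class AsyncTaskRunner {
              // Static pool, mirroring the "rather raw" static usage described above;
              // a real handler would size it per its run mode and shut it down cleanly.
              private static final ExecutorService pool = Executors.newFixedThreadPool(4);
              private static final Map<String, String> status = new ConcurrentHashMap<>();

              // Submit a long-running task under a caller-supplied request id and
              // return immediately; the caller polls getStatus(requestId) later.
              static Future<?> submit(String requestId, Runnable task) {
                  status.put(requestId, "running");
                  return pool.submit(() -> {
                      try {
                          task.run();
                          status.put(requestId, "completed");
                      } catch (Exception e) {
                          status.put(requestId, "failed");
                      }
                  });
              }

              static String getStatus(String requestId) {
                  return status.getOrDefault(requestId, "notfound");
              }

              // The TODO noted above: invoke this when the handler shuts down.
              static void shutdown() {
                  pool.shutdown();
              }
          }
          ```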

          Here's what still lies with me incomplete/not-working:

          • Make shard requests async (pass the param to coreadmin requests and poll for completion/failure) from the OverseerCollectionProcessor.
          • Have a threadpool for OverseerCollectionProcessor so that the long running tasks don't block everything else.
          • Improve current tests and add more tests.
          Anshum Gupta added a comment - - edited

          Another patch that has collection creation and split shard as async from OCP.
          Working on making other calls also async (which I think should be trivial and merely about calling the same methods).

          Working on storing/responding with better status for the status request.
          Right now, the status tracking is limited to: running/completed/failed/notfound.

          Also, trying to get another patch that saves information in zk instead of in-memory for CoreAdmin request state.
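
          Given the four states listed above, a client-side poll loop needs only a little logic. A minimal sketch (class and method names are hypothetical; the URL shape follows the action=REQUESTSTATUS&requestid=XX convention standardized earlier in this issue):

          ```java
          public class RequestStatusPoller {
              // "running" means keep polling; the other three states from the
              // comment above ("completed", "failed", "notfound") end the loop.
              // "notfound" is terminal because the id is unknown to the server.
              static boolean isTerminal(String state) {
                  return "completed".equals(state)
                      || "failed".equals(state)
                      || "notfound".equals(state);
              }

              // Hypothetical URL builder for the status request.
              static String statusUrl(String baseUrl, String requestId) {
                  return baseUrl + "/admin/collections?action=REQUESTSTATUS&requestid=" + requestId;
              }

              public static void main(String[] args) {
                  System.out.println(statusUrl("http://localhost:8983/solr", "1001"));
                  System.out.println(isTerminal("running"));   // false
              }
          }
          ```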

          Anshum Gupta added a comment -

          I had a discussion with Noble (offline) and we thought that holding this stuff for persistence in zk didn't make much sense. Here are the reasons:

          • Every request would need to de-duplicate against the submitted/completed/failed tasks and zk queues aren't fit for that. Every dedup would translate to fetching all children and running a compare or something. With all cores trying to use the same zk setup, not sure if it's even required. In memory hashes would work better at handling this use case.
          • We may not be interested in persistence of results over a longer duration i.e. if the node goes down, we should be fine with losing the request information as in general, the time taken to bring the same node back up would be more than someone else doing the job in the meanwhile (fair assumption?).
          Shalin Shekhar Mangar added a comment -

          I don't understand why we are using DistributedQueues for holding status in the first place. It is just not required. We should just have a zk node constructed with a prefix and user specified hash for holding status.

          Shalin Shekhar Mangar added a comment -

          Just to clarify, I'm not against storing status for core admin tasks in memory, but I just wanted to point out that the problem of deduping by getting all children is one we're creating ourselves by (ab)using DQ.

          Anshum Gupta added a comment -

          Changed to using another implementation of storing data in zk. Not using DistributedQueue for anything but checking on the workQueue.

          The new implementation is a plain requestid based node map in zk and never really needs to do a getChildren or anything of that sort.

          Thanks for pointing that out Shalin.

          Anshum Gupta added a comment -

          Seems like something got screwed up during 'svn diff'. Reposting the latest patch.

          Anshum Gupta added a comment - - edited

          Added a check for existing tasks with the same id.

          Seeing a lot of these coming up in the logs. Trying to debug that.

          414075 [Overseer-91131269805768705-10.0.0.3:7574_solr-n_0000000001] WARN org.apache.solr.cloud.OverseerCollectionProcessor – Overseer cannot talk to ZK
          414075 [Thread-16] ERROR org.apache.solr.cloud.Overseer – Exception in Overseer main queue loop
          org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /overseer/queue...
          at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
          at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
          at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
          at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:256)
          at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:253)
          at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)

          Anshum Gupta added a comment -

          Cleaned up patch.

          Working on:

          • Limiting the length of tracking data structures.
          • Returning more information for request status.
          • Tests.
          Anshum Gupta added a comment -
          • Limited the size of tracking data structures.
          • Shutdown the ThreadPoolExecutor.
          • Fixed tests.
          • Other changes - Naming, indentation, thread safety etc.

          It's almost there.

          TODO:

          • More tests
          • Handle CoreAdmin failures at OCP end a little better.
          Anshum Gupta added a comment -

          Fixed an issue with task tracking.
          The patch has failing tests as the CoreAdminHandler.shutdown() is called at the wrong place/time.
          Working on correcting it.

          Anshum Gupta added a comment -

          Patch without the ThreadPool.
          Fixed other issues - Exceptions in OCP and CoreAdminHandler.
          Response for CoreAdmin status request now contains:

          • responseHeader if successful.
          • Exception message if failure.
          Shalin Shekhar Mangar added a comment -

          Thanks Anshum.

          1. Why is it called a taskQueue in CoreAdminHandler? There is no queueing happening here.
          2. Why is the taskQueue defined as a Map<String, Map<String, TaskObject>>? It can simply be a Map<String, TaskObject>. The task object itself can contain a volatile status flag to indicate running/completed/failure.
          3. The CoreAdminHandler.addTask with limit=true just removes a random (first?) entry if the limit is reached. It should remove the oldest entry instead.
          4. OverseerCollectionProcessor.requestStatus returns response with “success” even if requestid is found in “running” or “failure” map
          5. The ‘migrate’ api doesn’t use async core admin requests
          6. In all places where synchronous calls have been replaced with waitForAsyncCallsToComplete calls, we need to ensure that the correct response messages are returned on failures. Right now, the waitForAsyncCallToComplete method returns silently on detecting failure.
          7. Although there is a provision to clear the overseer status maps by passing requestid=-1, it is never actually called. When do you intend to call this api?
          8. I don’t understand why we need three different maps for running/completed/failure for overseer collection processor. My comment #2 applies here too. We can store the status in the value bytes instead of keeping three different maps and moving the key around. What do you think?
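
          The single-map design proposed in points 2 and 8 could look roughly like this (all names are illustrative, not the actual patch code):

          ```java
          import java.util.Locale;
          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          public class TaskTracker {
              enum TaskState { RUNNING, COMPLETED, FAILED }

              static class TaskObject {
                  final String requestId;
                  // volatile: the worker flips the state in place, so readers always
                  // see the latest value without the entry ever moving between
                  // separate running/completed/failed maps.
                  volatile TaskState state = TaskState.RUNNING;

                  TaskObject(String requestId) {
                      this.requestId = requestId;
                  }
              }

              private final Map<String, TaskObject> tasks = new ConcurrentHashMap<>();

              TaskObject start(String requestId) {
                  TaskObject task = new TaskObject(requestId);
                  tasks.put(requestId, task);
                  return task;
              }

              // One lookup answers both "does the task exist" and "what state is it in".
              String status(String requestId) {
                  TaskObject task = tasks.get(requestId);
                  return task == null ? "notfound" : task.state.name().toLowerCase(Locale.ROOT);
              }
          }
          ```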
          Anshum Gupta added a comment -

          Why is it called a taskQueue in CoreAdminHandler? There is no queueing happening here.

          Changed it. Had that change on my machine before you mentioned it.

          Why is the taskQueue defined as a Map<String, Map<String, TaskObject>>? It can simply be a Map<String, TaskObject>.

          I don’t understand why we need three different maps for running/completed/failure for overseer collection processor. My comment #2 applies here too. We can store the status in the value bytes instead of keeping three different maps and moving the key around. What do you think?

          It takes away the ability (or at least makes it too complicated) to limit the number of tasks in a particular state, e.g. limiting storage to 50 completed tasks only.

          The CoreAdminHandler.addTask with limit=true just removes a random (first?) entry if the limit is reached.

          It removes the first element. It's a synchronized LinkedHashMap, so the iterator preserves insertion order and returns the first (oldest) element.
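
          A size-capped, insertion-ordered map of this kind can also be sketched with LinkedHashMap.removeEldestEntry, which evicts exactly the oldest entry once a limit is exceeded (BoundedTaskMap is an illustrative name, not the actual patch code):

          ```java
          import java.util.Collections;
          import java.util.LinkedHashMap;
          import java.util.Map;

          public class BoundedTaskMap {
              // A LinkedHashMap iterates in insertion order, so the "first" entry is
              // the oldest; removeEldestEntry evicts it automatically at the cap.
              static <K, V> Map<K, V> create(int maxEntries) {
                  return Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, false) {
                      @Override
                      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                          return size() > maxEntries;
                      }
                  });
              }

              public static void main(String[] args) {
                  Map<String, String> completed = create(2);
                  completed.put("task1", "done");
                  completed.put("task2", "done");
                  completed.put("task3", "done"); // evicts task1, the oldest entry
                  System.out.println(completed.keySet()); // [task2, task3]
              }
          }
          ```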

          OverseerCollectionProcessor.requestStatus returns response with “success” even if requestid is found in “running” or “failure” map

          Success was supposed to mean that the task was found in a status map. It might actually make sense to change it. Thanks for the suggestion.

          Although there is a provision to clear the overseer status maps by passing requestid=-1, it is never actually called.

          The intention is for the user to explicitly call the API. There's no concept of a map/queue in zk that maintains insertion order.
          You'd have to fetch the children, order them, and then delete the right one every time numChildren exceeds the limit. I thought it was best left to the user.

          Will upload a patch with the following:

          • Migrate API to also use the ASYNC CoreAdmin requests.
          • Store the failed tasks information from CoreAdmin async calls in case of Collection API requests.
          • Tests for
            • migratekey (and other calls) in ASYNC mode.
            • Failing Collection API calls.
          Anshum Gupta added a comment - - edited

          Fixed the following:

          • Changed the var name from Queue to Map.
          • Response structure from OCP async calls changed. Now it's:
            <status>
              <state>running|failed|completed|notfound</state>
              <msg>apt message</msg>
            </status>
            
          Anshum Gupta added a comment -

          Have added the async functionality to migrate key but it seems to be running into issues.

          Added test for async migrate key and fixed a few more things that I found while working on it.

          Anshum Gupta added a comment -
          • Migrate key partially works. There are a couple of spots where the CoreAdmin async calls still fail as it doesn't find the async request at the Core. Have commented that single call out.
          • Added back Threadpoolexecutor as an instance var into CAH. Switched to using that and got tests to pass.
          Show
          Anshum Gupta added a comment - Migrate key partially works. There are a couple of spots where the CoreAdmin async calls still fails as it doesn't find the async request at the Core. Have commented that single call out. Added back Threadpoolexecutor as an instance var into CAH. Switched to using that and got tests to pass.
          Anshum Gupta added a comment - - edited

          Added more tests:

          • test for duplicate requestid in CollectionAPI call.
          • test for failed request (duplicate collection creation attempt) using async.
          Shalin Shekhar Mangar added a comment -

          Migrate key partially works. There are a couple of spots where the CoreAdmin async calls still fails as it doesn't find the async request at the Core. Have commented that single call out.

          I found the problem. Creation of the temp collection's leader on the target node in async mode fails because it is missing a collectShardResponse statement. It works fine in sync mode because we call collectShardResponses after the WaitForState call.

          Shalin Shekhar Mangar added a comment -

          Also, the WaitForState call in migrate has not been made asynchronous yet. This is a long running call and a perfect candidate for the async feature.

          Anshum Gupta added a comment -

          Refactored OCP to use setupAsyncRequest and completeAsyncRequest everywhere possible.

          Anshum Gupta added a comment -

          Fixed migrateKey async mode.

          Thanks to Shalin for having a look at it.
          Will try and get WaitForState call to also be async next.

          The latest code has:

          • Async mode at OCP and CoreAdmin level (using ThreadPool).
          • Request status APIs for both collection and CoreAdmin levels.
          • Passing tests
          Anshum Gupta added a comment -

          Got WaitForState to also be Async during migrateKey call.

          Anshum Gupta added a comment -

          Considering that the OCP itself might have long running tasks and it's currently single threaded, we don't want it to be blocked.

          I've introduced multi threading in the OCP using a local TPE.

          Anshum Gupta added a comment -

          I would like to have a separate issue for multi threading the OCP as it's pretty much a big task in itself.

          Anshum Gupta added a comment -

          Making OverseerCollectionProcessor multi-threaded.

          Shalin Shekhar Mangar added a comment -

          I'd like to commit this feature to trunk next week if there are no objections. The multi-threading aspect will be handled by SOLR-5681.

          Anshum Gupta added a comment -

          Updated patch. There were quite a few conflicts and I think I fixed them.
          It'd be good if whoever commits this has a final look at it.

          Shalin Shekhar Mangar added a comment -

          I updated Anshum's latest patch.

          1. Fixed javadoc errors reported by ant precommit
          2. Also removed a TODO in CoreAdminHandler:811 because we now return full responses instead of just a status
          Anshum Gupta added a comment -

          Patch updated to trunk (again). I'll re-run some tests and if all goes well, will commit it tomorrow.

          Mark Miller added a comment -

          Hey Anshum, this looks pretty good!

          I did a quick code review and dumped my comments below:

          • Doesn't seem like this should be logged info level:

            log.info("REQUESTSTATUS action invoked: " + req.getParamString());

          • We should pull requestid into a constant - there already is one in CoreAdminParams:

            r.add("requestid", (String) m.get(ASYNC));

          • We should clean this up:

            // parallelHandlerThread.run();

          • We should clean this up?
                } catch (Exception e) {
                  throw e;
                }
            
          • Shouldn't we at least keep doing this in non cloud mode?

            rsp.setHttpCaching(false);

          • We should use constants here:
              /**
               * Handle "REQUESTSTATUS" action
               */
              protected void handleRequestActionStatus(SolrQueryRequest req, SolrQueryResponse rsp) {
                SolrParams params = req.getParams();
                String requestId = params.get(CoreAdminParams.REQUESTID);
                log.info("Checking request status for : " + requestId);
            
                if (mapContainsTask("running", requestId)) {
                  rsp.add("STATUS", "running");
                } else if(mapContainsTask("completed", requestId)) {
                  rsp.add("STATUS", "completed");
                  rsp.add("Response", getMap("completed").get(requestId).getRspObject());
                } else if(mapContainsTask("failed", requestId)) {
                  rsp.add("STATUS", "failed");
                  rsp.add("Response", getMap("failed").get(requestId).getRspObject());
                } else {
                  rsp.add("STATUS", "notfound");
                  rsp.add("msg", "No task found in running, completed or failed tasks");
                }
              }
            
          • The patch hits CoreAdminRequest but makes no real change.
          • Should consider using ExecUtil.shutdown:
              public void shutdown() {
                if (parallelExecutor != null && !parallelExecutor.isShutdown())
                  parallelExecutor.shutdown();
              }
            
          • The following should continue CoreContainer shutdown even if it throws an exception:

            coreAdminHandler.shutdown();

          • Does this work nicely with the collections api solrj classes?
          • In TestRequestStatusCollectionAPI, doesn't following mean the test can pass in bad cases and miss asserts?
                } catch (SolrServerException e) {
                  e.printStackTrace();
                } catch (IOException e) {
                  e.printStackTrace();
                }
            
          • Is there any stress testing? I wonder about some of the promises in terms of request ids and the status API, for example - can two clients not race creating the same id? Are there any tests that try and fire off a bunch of these async commands in parallel?
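
          The two shutdown-related points in the review above can be sketched as follows. `ShutdownSketch` is illustrative, not the actual Solr code: it shows a null-safe, idempotent executor shutdown plus a container-level teardown that keeps going even if the handler's shutdown throws:

          ```java
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.TimeUnit;

          public class ShutdownSketch {
            private final ExecutorService parallelExecutor = Executors.newFixedThreadPool(2);

            // Null-safe, idempotent shutdown of the handler's thread pool.
            public void shutdownHandler() {
              if (parallelExecutor != null && !parallelExecutor.isShutdown()) {
                parallelExecutor.shutdown();
                try {
                  parallelExecutor.awaitTermination(5, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
                }
              }
            }

            // Container-level shutdown: a failure in the handler must not
            // abort the rest of the teardown.
            public boolean shutdownContainer() {
              try {
                shutdownHandler();
              } catch (Exception e) {
                // log and continue; remaining cleanup still runs
              }
              return true; // stands in for "the rest of shutdown() ran"
            }
          }
          ```

          Calling `shutdownContainer()` twice is safe: the second call sees `isShutdown()` and skips the pool teardown.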
          Mark Miller added a comment -

          On the following:

          bq. Shouldn't we at least keep doing this in non cloud mode?
          bq. rsp.setHttpCaching(false);

          Was that a mistake - even in cloud mode, shouldn't we continue to do this?

          Anshum Gupta added a comment -

          Thanks for looking at it Mark.
          Here's a patch that addresses most of the stuff.

          rsp.setHttpCaching(false);

          I'd moved it but it makes sense to move it back (done).

          About 2 clients racing to create the same id, I don't think that would happen (if you're talking about the OCP creating conflicting id) as it's currently single threaded and at a given time only one task would get processed.
          I'll however look at it again to see if I'm missing something.

          Anshum Gupta added a comment -

          Mark Miller I haven't added anything for SolrJ so for now, it doesn't really support async calls. I am assuming that by collection API SolrJ calls you mean methods like "CollectionAdminRequest.createCollection()".

          Also, I'm working on adding some stress tests i.e. something that fires multiple async requests.

          Shalin Shekhar Mangar added a comment -

          I haven't added anything for SolrJ so for now, it doesn't really support async calls.

          Let's open a new issue for this. It'd be nice to add support in SolrJ sooner rather than later.

          Mark Miller added a comment -

          SolrJ calls you mean methods like "CollectionAdminRequest.createCollection()".

          Right - it can def come in a second issue, but it seems like just at least adding the async param is pretty low hanging fruit.

          Anshum Gupta added a comment -

          Mark Miller Sure, I'll add that and put up another patch. It's just that I wanted to get it into trunk sooner rather than later, considering that the patch touches a reasonable number of points in the code, which makes it tricky to forward-port every time.

          Anshum Gupta added a comment -

          Patch with SolrJ support and tests.

          Mark Miller added a comment -

          +1

          ASF subversion and git services added a comment -

          Commit 1577444 from Anshum Gupta in branch 'dev/trunk'
          [ https://svn.apache.org/r1577444 ]

          SOLR-5477: Async execution of OverseerCollectionProcessor tasks

          Steve Davids added a comment -

          This code manipulates the URL scheme in the OverseerCollectionProcessor; this is not necessary and may cause issues for clients that want to run in SSL mode. You may want to consider dropping it:

          Index: solr/core/src/java/org/apache/solr/cloud/OverseerCollectionProcessor.java
          ===================================================================
          --- solr/core/src/java/org/apache/solr/cloud/OverseerCollectionProcessor.java	(revision 1577773)
          +++ solr/core/src/java/org/apache/solr/cloud/OverseerCollectionProcessor.java	(working copy)
          @@ -1826,8 +1826,6 @@
                     params.set(CoreAdminParams.COLLECTION, collectionName);
                     params.set(CoreAdminParams.SHARD, sliceName);
                     params.set(ZkStateReader.NUM_SHARDS_PROP, numSlices);
          -          String replica = zkStateReader.getBaseUrlForNodeName(nodeName);
          -          if (replica.startsWith("http://")) replica = replica.substring(7);
           
                     setupAsyncRequest(async, requestMap, params, nodeName);
           
          @@ -2139,7 +2137,6 @@
                 params.set("qt", adminPath);
                 sreq.purpose = 1;
                 String replica = zkStateReader.getBaseUrlForNodeName(nodeName);
          -      if (replica.startsWith("http://")) replica = replica.substring(7);
                 sreq.shards = new String[] {replica};
                 sreq.actualShards = sreq.shards;
                 sreq.params = params;
          
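
          A minimal illustration of the concern above (`stripScheme` is a hypothetical helper, not Solr code): hard-stripping the "http://" prefix is a silent no-op for https base URLs, so replica addresses end up in two different shapes depending on the scheme:

          ```java
          public class SchemeStrip {
            // Replicates the pattern the patch used on replica base URLs.
            static String stripScheme(String replica) {
              if (replica.startsWith("http://")) replica = replica.substring(7);
              return replica;
            }

            public static void main(String[] args) {
              System.out.println(stripScheme("http://node1:8983/solr"));  // node1:8983/solr
              System.out.println(stripScheme("https://node1:8983/solr")); // https://node1:8983/solr
              // Downstream code expecting a scheme-less host:port now gets
              // inconsistent values for http vs. https clusters.
            }
          }
          ```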
          Anshum Gupta added a comment -

          Thanks for pointing that out Steve.
          This must have gotten in when I started working on this one, i.e. before SOLR-3854 went in, and just stayed as a result of a bad merge.

          I'll fix this up.

          Anshum Gupta added a comment -

          Fix for not modifying url scheme.

          ASF subversion and git services added a comment -

          Commit 1577801 from Anshum Gupta in branch 'dev/trunk'
          [ https://svn.apache.org/r1577801 ]

          SOLR-5477: Fix URL scheme modification from an earlier commit for SOLR-5477.

          Steve Davids added a comment -

          You should drop the unnecessary assignment:

          String replica = zkStateReader.getBaseUrlForNodeName(nodeName);

          on line 1829; it makes an unnecessary call out to ZK for a value that isn't being used.

          ASF subversion and git services added a comment -

          Commit 1577965 from Anshum Gupta in branch 'dev/trunk'
          [ https://svn.apache.org/r1577965 ]

          SOLR-5477: Removing an unwanted call to zk

          ASF subversion and git services added a comment -

          Commit 1578598 from Anshum Gupta in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1578598 ]

          SOLR-5477: Async execution of OverseerCollectionProcessor tasks (merged trunk r1577444, r1577801, r1577965)

          Noble Paul added a comment - - edited

          Line 222, OverseerCollectionProcessor:

          final String asyncId = (message.containsKey(ASYNC) && message.get(ASYNC) != null) ? (String) message.get(ASYNC) : null;

          We should use message.getStr() instead of typecasting to String.

          Line 237: this code stores data in Java serialization format. Can we do JSON serialization?

          head.setBytes(SolrResponse.serializable(response));

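
          The getStr() point above can be illustrated as follows. `MessageSketch` is a hypothetical stand-in for the message class (not Solr's ZkNodeProps): a null-safe getStr() makes the containsKey/ternary/cast expression redundant:

          ```java
          import java.util.HashMap;
          import java.util.Map;

          public class MessageSketch {
            private final Map<String, Object> props = new HashMap<>();

            public void put(String key, Object val) { props.put(key, val); }
            public boolean containsKey(String key) { return props.containsKey(key); }
            public Object get(String key) { return props.get(key); }

            // Null-safe string extraction: returns null when the key is
            // absent or its value is null, without any caller-side checks.
            public String getStr(String key) {
              Object o = props.get(key);
              return o == null ? null : o.toString();
            }

            public static void main(String[] args) {
              MessageSketch message = new MessageSketch();
              message.put("async", "1001");

              // Verbose form from the patch:
              String a = (message.containsKey("async") && message.get("async") != null)
                  ? (String) message.get("async") : null;
              // Equivalent one-liner:
              String b = message.getStr("async");
              System.out.println(a.equals(b)); // true
            }
          }
          ```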
          Anshum Gupta added a comment -

          Noble Paul This one doesn't really belong to this JIRA i.e. it wasn't added/changed in this patch but yes, we could look at changing this as a part of another issue.

          Noble Paul added a comment -

          OK, the diff somehow showed it as part of the check-in. If you are storing the data as JSON it should be good.

          Anshum Gupta added a comment -

          Noble Paul for now it's more of a string. SOLR-5886 would require more information to be stored. I'll use your suggestion on that one.

          Uwe Schindler added a comment -

          Close issue after release of 4.8.0


            People

            • Assignee: Anshum Gupta
            • Reporter: Noble Paul
            • Votes: 2
            • Watchers: 12

            Dates

            • Created:
            • Updated:
            • Resolved: