  Solr / SOLR-9241 Rebalance API for SolrCloud / SOLR-9320

A REPLACENODE command to decommission an existing node with another new node

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.2, 7.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      The command should accept a source node and a target node, recreate the replicas from the source node on the target node, and then do a DELETENODE of the source node.
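
      For illustration, a minimal SolrJ sketch of the call this spec describes. The host names are made up, the builder methods vary slightly across SolrJ versions, and the source/target parameter names follow the 6.x Collections API documentation; verify them against the reference guide for your Solr version.

          import org.apache.solr.client.solrj.SolrRequest;
          import org.apache.solr.client.solrj.impl.HttpSolrClient;
          import org.apache.solr.client.solrj.request.GenericSolrRequest;
          import org.apache.solr.common.params.ModifiableSolrParams;

          public class ReplaceNodeExample {
            public static void main(String[] args) throws Exception {
              try (HttpSolrClient client = new HttpSolrClient.Builder()
                  .withBaseSolrUrl("http://localhost:8983/solr").build()) {
                ModifiableSolrParams params = new ModifiableSolrParams();
                params.set("action", "REPLACENODE");
                params.set("source", "oldhost:8983_solr");  // node being decommissioned
                params.set("target", "newhost:8983_solr");  // node that receives the replicas
                client.request(new GenericSolrRequest(SolrRequest.METHOD.GET,
                    "/admin/collections", params));
              }
            }
          }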

      Attachments

      1. DELETENODE.jpeg
        108 kB
        Nitin Sharma
      2. REPLACENODE_After.jpeg
        29 kB
        Nitin Sharma
      3. REPLACENODE_Before.jpeg
        25 kB
        Nitin Sharma
      4. REPLACENODE_call_response.jpeg
        129 kB
        Nitin Sharma
      5. SOLR-9320_followup.patch
        11 kB
        Varun Thacker
      6. SOLR-9320_followup.patch
        14 kB
        Varun Thacker
      7. SOLR-9320.patch
        44 kB
        Noble Paul
      8. SOLR-9320.patch
        37 kB
        Noble Paul
      9. SOLR-9320.patch
        37 kB
        Noble Paul
      10. SOLR-9320.patch
        37 kB
        Noble Paul
      11. SOLR-9320.patch
        26 kB
        Noble Paul
      12. SOLR-9320.patch
        7 kB
        Noble Paul
      13. SOLR-9320.patch
        36 kB
        Nitin Sharma

        Activity

        nitin.sharma Nitin Sharma added a comment -

        Clarification on this: when you say recreate, does the naming matter? Let's say x_shard1_replica1 is on the source node. If we want to move it to the destination node, we can
        a) either create a new replica (x_shard1_replica2) and delete the one on the source. That will leave uneven naming conventions in the cluster (there will not be a replica1, only a replica2); or
        b) preserve the exact same name as the replica on the source node. We can achieve this by creating a temporary replica on the destination first, deleting the replica on the source, recreating the replica (with the same name) on the destination, and then cleaning up the temporary one.

        Option (b) can be thought of as migrating a core.

        Let me know which sounds more usable.

        noble.paul Noble Paul added a comment -

        That will leave uneven naming conventions in the cluster.

        Replica names do not matter. You can create a new replica with any name you want.

        erickerickson Erick Erickson added a comment -

        (b) would copy the index twice, correct? I've seen (and I wouldn't lie) TB-sized indexes on a single replica, so I think that would be prohibitively expensive.

        nitin.sharma Nitin Sharma added a comment -

        Thanks for clarifying. I am leaning towards option (a) as well (for simplicity and performance) but wanted to confirm the semantics.

        nitin.sharma Nitin Sharma added a comment -

        Patch for the REPLACENODE & DELETENODE APIs as per the spec.

        DELETENODE - deletes all cores on the given node.

        REPLACENODE - recreates all cores from a source node on a destination node and then calls DELETENODE on the source node.

        Attached are screenshots from a test cluster showing the API calls and the before/after status of the cluster.
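
        As a rough sketch of the DELETENODE half, reusing the SolrJ client and imports from the sketch under the issue description (the host name is made up; the node parameter name follows the 6.x Collections API docs):

            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("action", "DELETENODE");
            params.set("node", "oldhost:8983_solr");  // every replica on this node is removed
            client.request(new GenericSolrRequest(SolrRequest.METHOD.GET,
                "/admin/collections", params));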

        noble.paul Noble Paul added a comment -

        REPLACENODE is just like any other command that we already have. This patch seems too complex for such simple functionality. Please refer to the implementations of the other commands.

        nitin.sharma Nitin Sharma added a comment -

        Noble Paul Can you explain the complex part? The code does what was mentioned in the spec: it identifies all the cores on the node to be replaced and recreates them on the destination node. After that it calls "DELETENODE" on the source node. Which part of this is complex? The multithreaded part?

        noble.paul Noble Paul added a comment -

        I really didn't mean that the code was complex to understand. I meant that adding so many files for such simple functionality is not worth it. We generally follow a common pattern across commands.

        nitin.sharma Nitin Sharma added a comment -

        This has the patch for both REPLACENODE and DELETENODE inside it. I can reduce the number of files and re-upload if you like.

        noble.paul Noble Paul added a comment -

        If I'm not wrong you just need to add one method. This is trivial functionality.

        anshumg Anshum Gupta added a comment -

        I am having issues applying this patch against master.
        I tried to take a look at this without applying the patch and the patch does add complexity, as Noble pointed out.

        noble.paul Noble Paul added a comment -

        without tests

        nitin.sharma Nitin Sharma added a comment -

        I have a concern about scaling to hundreds of collections. If you have 100+ collections, this command will eventually time out (> 3 mins) because it does everything serially. With the patch I sent (which has multithreading capability), I ran with 500 collections and it finished in around 4+ minutes. Can you share numbers with collections at that scale?

        noble.paul Noble Paul added a comment -

        Multithreading is already part of OverseerCollectionMessageHandler. You just have to invoke the command with the async flag and it will do everything in a thread of its own.

        BTW, why would you run REPLACENODE with 500 collections? It runs one node at a time, right?
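
        As a reference point, a minimal sketch of an async invocation followed by status polling, reusing the client from the sketch under the issue description (the request id value is arbitrary; async and REQUESTSTATUS are the standard Collections API mechanisms):

            ModifiableSolrParams replace = new ModifiableSolrParams();
            replace.set("action", "REPLACENODE");
            replace.set("source", "oldhost:8983_solr");
            replace.set("target", "newhost:8983_solr");
            replace.set("async", "replace-node-1");   // returns immediately with this request id
            client.request(new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/collections", replace));

            // Poll until the async request reports that it has completed or failed.
            ModifiableSolrParams status = new ModifiableSolrParams();
            status.set("action", "REQUESTSTATUS");
            status.set("requestid", "replace-node-1");
            client.request(new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/collections", status));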

        shalinmangar Shalin Shekhar Mangar added a comment -

        Multithreading is already part of OverseerCollectionMessageHandler

        That is only for executing collection APIs concurrently. It does not help when you are trying to add a replica to the target and remove a replica from the source serially for each replica on a particular node. Nitin is right that it should be multi-threaded; otherwise it will take a long time for a node hosting a lot of replicas.

        nitin.sharma Nitin Sharma added a comment - - edited

        Thanks for clarifying on async. Isn't that only for executing multiple API calls in parallel? If you want parallelism within the API, you still need your own executor service, right?

        Regarding 500 collections: when you run replace node on a node x and the node happens to have 500 cores, all of those cores need to be moved to the destination (and then removed from the source). If you have too many cores, adding a new replica for every core on the destination host will take time. Wondering if you ran any tests at such high capacity.
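
        A minimal sketch of the kind of bounded executor service being discussed here; the moveReplica helper is a placeholder for the per-replica create-on-target/delete-from-source work, not an actual Solr API:

            import java.util.Arrays;
            import java.util.List;
            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Executors;
            import java.util.concurrent.TimeUnit;

            public class ParallelMoveSketch {
              // Placeholder for "create the replica on the target, then delete it from the source".
              static void moveReplica(String replicaCore, String targetNode) {
                // ... issue the per-replica calls here ...
              }

              public static void main(String[] args) throws InterruptedException {
                List<String> coresOnSource = Arrays.asList("x_shard1_replica1", "y_shard2_replica1");
                String targetNode = "newhost:8983_solr";

                // Cap concurrency so the target node's disk and network are not saturated.
                ExecutorService pool = Executors.newFixedThreadPool(4);
                for (String core : coresOnSource) {
                  pool.submit(() -> moveReplica(core, targetNode));
                }
                pool.shutdown();
                pool.awaitTermination(30, TimeUnit.MINUTES);
              }
            }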

        noble.paul Noble Paul added a comment -

        I don't think so. The create replica call can be async as well.

        shalinmangar Shalin Shekhar Mangar added a comment -

        I don't think so. The create replica call can be async as well.

        It can be, but in your patch it is not. The addReplica method call will wait for the async core create call to complete and also for the new replica to be visible in the cluster state. Similarly, the deleteReplica method waits for the async core unload to complete and for the coreNode name to be deleted from the cluster state.

        noble.paul Noble Paul added a comment -

        Yeah, it's not in the patch. What I meant to say was that the multithreading framework is there and the code just needs to take advantage of it.

        OTOH, I wonder how performant it needs to be. Replacing a node may take a while, and as long as it completes without timing out, is that not good enough? The health of the system is not degraded and no action is blocked while the REPLACENODE is happening. I think that, instead of creating 500 cores on the same node (vying for the same disk/CPU), it is better to do it serially.

        erickerickson Erick Erickson added a comment -

        If we're going to take advantage of multithreading we need to take some care to throttle replication. If we try to copy 500 cores' indexes to some other node all at once with 500 separate threads I'd worry about network bandwidth issues. I've seen saturated I/O cause nodes to go into recovery and the like. Not to mention beating up the disks on both machines.

        I can think of two tuning parameters: the max # of simultaneous threads and the max bandwidth consumption. The number of replace operations to carry out in parallel is probably the easiest.

        Thinking about it for a bit, though, the bandwidth parameter seems difficult to do well. There'd have to be some kind of cross-copy communication, I should think, and other ugliness. I suppose one could get this behavior on a case-by-case basis by
        1> specifying the max # of replace ops
        2> specifying <str name="maxWriteMBPerSec">${maxWriteMbPerSec:100000}</str> in the replication handler and overriding it with a sys var when doing a REPLACENODE....

        noble.paul Noble Paul added a comment -

        Erick Erickson I'm inclined to create cores one after another. Even if it takes some time, it is better to ensure that we don't overload that one node. This is an admin command which is issued asynchronously, so there is no harm even if it takes a while.

        erickerickson Erick Erickson added a comment -

        Noble Paul That certainly works. My goal was just to be sure bandwidth figured into the design if people wanted to multi-thread.

        And doing one after the other could still be throttled by maxWriteMBPerSec if a single node overwhelmed the network or disks (I've actually seen this in the wild, FWIW).

        noble.paul Noble Paul added a comment -

        And doing one after the other could still be throttled by maxWriteMBPerSec

        The problem is, we can't do that from the REPLACENODE command. The downloads happen in a different thread, on the target node.

        nitin.sharma Nitin Sharma added a comment -

        It would be good to establish context on when this command should/can be run. In our current version of rebalance, we use this to switch serving nodes in production, and hence the time taken to migrate a node is limited (time bound). If this is considered an offline process, then maybe serial works just fine.

        I agree with Erick on throttling, but we can do that at the system level (Unix traffic shaping with tc, etc.). That will leave bandwidth management outside the responsibility of Solr. The same node serving Solr can be used to run other services in some production setups; keeping that outside Solr will put the onus on the maintainer. Just my $0.02.

        noble.paul Noble Paul added a comment -

        The best we can do is pass an extra parameter to speed up the execution (say parallel=true). But the real risk of overwhelming the node/network exists.

        noble.paul Noble Paul added a comment -

        The command can be executed in 2 modes:

        with the parameter parallel=true, which ensures that all cores are created and deleted in parallel in multiple threads. If the param is not passed, they are created one after another.

        I have not done any performance benchmarking, but with the parallel=true flag it finishes much faster.
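
        For completeness, a hypothetical invocation with this flag, in the same SolrJ style as the earlier sketches (the parallel parameter name is as described in this comment; verify against the reference guide for your version):

            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("action", "REPLACENODE");
            params.set("source", "oldhost:8983_solr");
            params.set("target", "newhost:8983_solr");
            params.set("parallel", "true");   // create/delete the replicas concurrently
            client.request(new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/collections", params));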

        nitin.sharma Nitin Sharma added a comment -

        The patch I attached follows the original spec you proposed: replace node, after creating the new replicas, will call "DELETENODE" on the destination host, which will remove all cores on the node. Can you merge this with that as well? Thanks.

        I will apply this on top of 6.1 and give it a shot.

        noble.paul Noble Paul added a comment -

        Is the current patch missing anything?

        noble.paul Noble Paul added a comment -

        with tests

        nitin.sharma Nitin Sharma added a comment -

        The functionality seems fine to me. The spec you mentioned in the description was that REPLACE does a create and then invokes a DELETENODE. The recent patch seems to iterate and do a delete replica instead of having a DELETENODE action and calling that; hence the original question.

        I tried the first one and it works fine. I will try your new patch.

        noble.paul Noble Paul added a comment -

        no need to do deletes serially

        noble.paul Noble Paul added a comment -

        this contains SOLR-9318 as well

        nitin.sharma Nitin Sharma added a comment -

        Most of the patch looks good. In ReplaceNodeCmd, instead of calling DeleteNodeCmd.deletereplicas, can you just call the DELETENODE cmd on the source node itself? It looks much cleaner that way, and one command doesn't need to know the details of the other cmd.

        varunthacker Varun Thacker added a comment -

        Patch looks good! Here are a few comments on that patch

        Any reason why we needed to modify AsyncCollectionAdminRequest#setAsyncId?

        CollectionAdminRequest#DeleteNode and ReplaceNode - Can we add a little bit of Javadocs explaining what the parameters are?

        In CollectionParams, should REPLACENODE have lock level as none? What if we are replacing a node and someone fires a split shard request? To be correct we need to lock, right?

        In OverseerCollectionMessageHandler#addReplica can't we make node and coreName final to start with?

            final String fnode = node;
            final String fcoreName = coreName;
        

        Can the tests please extend SolrCloudTestCase? On my machine it takes one minute for ReplaceNodeTest to run. I'd guess it will be a lot faster when it's using SolrCloudTestCase.

        The logging in this patch needs to be improved. It's too verbose and redundant.

        For example, this log line is so verbose and adds little value:

        42134 INFO  (TEST-ReplaceNodeTest.test-seed#[F960E2C81499AC90]) [    ] o.a.s.c.ReplaceNodeTest excluded_node : 127.0.0.1:51079_x%2Ftc  coll_state : {
          "replicationFactor":"2",
          "shards":{
            "shard1":{
              "range":"80000000-b332ffff",
              "state":"active",
              "replicas":{
                "core_node1":{
                  "core":"replacenodetest_coll_shard1_replica2",
                  "base_url":"http://127.0.0.1:51075/x/tc",
                  "node_name":"127.0.0.1:51075_x%2Ftc",
                  "state":"active",
                  "leader":"true"},
                "core_node5":{
                  "core":"replacenodetest_coll_shard1_replica1",
                  "base_url":"http://127.0.0.1:51089/x/tc",
                  "node_name":"127.0.0.1:51089_x%2Ftc",
                  "state":"active"}}},
            "shard2":{
              "range":"b3330000-e665ffff",
              "state":"active",
              "replicas":{
                "core_node2":{
                  "core":"replacenodetest_coll_shard2_replica1",
                  "base_url":"http://127.0.0.1:51084/x/tc",
                  "node_name":"127.0.0.1:51084_x%2Ftc",
                  "state":"active"},
                "core_node9":{
                  "core":"replacenodetest_coll_shard2_replica2",
                  "base_url":"http://127.0.0.1:51094/x/tc",
                  "node_name":"127.0.0.1:51094_x%2Ftc",
                  "state":"active",
                  "leader":"true"}}},
            "shard3":{
              "range":"e6660000-1998ffff",
              "state":"active",
              "replicas":{
                "core_node6":{
                  "core":"replacenodetest_coll_shard3_replica2",
                  "base_url":"http://127.0.0.1:51071/x/tc",
                  "node_name":"127.0.0.1:51071_x%2Ftc",
                  "state":"active"},
                "core_node8":{
                  "core":"replacenodetest_coll_shard3_replica1",
                  "base_url":"http://127.0.0.1:51065/x/tc",
                  "node_name":"127.0.0.1:51065_x%2Ftc",
                  "state":"active",
                  "leader":"true"}}},
            "shard4":{
              "range":"19990000-4ccbffff",
              "state":"active",
              "replicas":{
                "core_node3":{
                  "core":"replacenodetest_coll_shard4_replica2",
                  "base_url":"http://127.0.0.1:51075/x/tc",
                  "node_name":"127.0.0.1:51075_x%2Ftc",
                  "state":"active",
                  "leader":"true"},
                "core_node7":{
                  "core":"replacenodetest_coll_shard4_replica1",
                  "base_url":"http://127.0.0.1:51089/x/tc",
                  "node_name":"127.0.0.1:51089_x%2Ftc",
                  "state":"active"}}},
            "shard5":{
              "range":"4ccc0000-7fffffff",
              "state":"active",
              "replicas":{
                "core_node4":{
                  "core":"replacenodetest_coll_shard5_replica1",
                  "base_url":"http://127.0.0.1:51084/x/tc",
                  "node_name":"127.0.0.1:51084_x%2Ftc",
                  "state":"active",
                  "leader":"true"},
                "core_node10":{
                  "core":"replacenodetest_coll_shard5_replica2",
                  "base_url":"http://127.0.0.1:51094/x/tc",
                  "node_name":"127.0.0.1:51094_x%2Ftc",
                  "state":"active"}}}},
          "router":{"name":"compositeId"},
          "maxShardsPerNode":"3",
          "autoAddReplicas":"false"}
        

        On the redundant part:

        In this patch I guess the purpose of adding logging to OverseerCollectionMessageHandler#deleteReplica and OverseerCollectionMessageHandler#addReplica was to be able to see when replacenode/deletenode calls addReplica and deleteReplica.

        That log line in ReplaceNodeCmd should be something like "calling addReplica for collection=[] shards=[] on node=[]". We shouldn't add a generic line to OverseerCollectionMessageHandler#addReplica; that would make anyone using AddReplica directly see the same entry logged twice (the overseer logs it automatically, and the second is the new line added in this patch). Also, in this patch it is logged as log.info("going to create replica {}", Utils.toJSONString(sourceReplica)); which is harder to read than the approach mentioned above.

        Same goes with the DeleteReplaceNodeCmd
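
        A small self-contained sketch of the parameterized log line being suggested (class, method, and variable names here are illustrative, not the actual patch code):

            import org.slf4j.Logger;
            import org.slf4j.LoggerFactory;

            class ReplaceNodeLoggingSketch {
              private static final Logger log = LoggerFactory.getLogger(ReplaceNodeLoggingSketch.class);

              void logMove(String collection, String shard, String targetNode) {
                // One parameterized line per replica instead of dumping the whole replica as JSON.
                log.info("Calling addReplica for collection={} shard={} on node={}",
                    collection, shard, targetNode);
              }
            }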

        The tests will fail on our slower Jenkins machines. We should replace the hardcoded 200-count for loop with an infinite loop; if the command doesn't complete, the test suite will fail, which is fine.

        noble.paul Noble Paul added a comment -

        Thanks Varun Thacker

        CollectionAdminRequest#DeleteNode and ReplaceNode - Can we add a little bit of Javadocs explaining what the parameters are?

        sure.

        Any reason why we needed to modify AsyncCollectionAdminRequest#setAsyncId?

        Yeah. Keeping that abstract was bad. That is now deprecated.

        In OverseerCollectionMessageHandler#addReplica can't we make node and coreName final to start with?

        No, they are modified in between

        The logging in this patch needs to be improved. It's too verbose and redundant.

        The logging is screwed up; I added those lines for my own testing. I shall clean them up.

        noble.paul Noble Paul added a comment -

        In CollectionParams, should REPLACENODE have lock level as none? What if we are replacing a node and someone fires a split shard request? To be correct we need to lock, right?

        REPLACENODE just adds/removes replicas, so it is no big deal. The operation happens across collections, so we have no idea what to lock here.

        varunthacker Varun Thacker added a comment - - edited

        REPLACENODE just adds/removes replicas, so it is no big deal. The operation happens across collections, so we have no idea what to lock here.

        Won't we need to block updates to all shards of all collections that have replicas on the source node? What if we are doing a split shard? Replace node might copy the wrong data.

        noble.paul Noble Paul added a comment -

        We would ideally need node-level locking, but we don't have such a thing now.

        jira-bot ASF subversion and git services added a comment -

        Commit ae60c74f8c6ea2f62e1870802accbcd73bbfdc48 in lucene-solr's branch refs/heads/master from Noble Paul
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ae60c74 ]

        SOLR-9320: A REPLACENODE command to decommission an existing node with another new node and SOLR-9318 the DELETENODE command that deletes all replicas in a node

        jira-bot ASF subversion and git services added a comment -

        Commit 519be6acf0866ce2f27e825b5b0035d39147af0b in lucene-solr's branch refs/heads/branch_6x from Noble Paul
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=519be6a ]

        SOLR-9320: A REPLACENODE command to decommission an existing node with another new node and SOLR-9318 the DELETENODE command that deletes all replicas in a node

        jira-bot ASF subversion and git services added a comment -

        Commit 4bd7d7fadbc72883f8ec9f2266daf0cd1f50b514 in lucene-solr's branch refs/heads/branch_6x from Noble Paul
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4bd7d7f ]

        SOLR-9320: SOLR-9318 The last commit screwed up

        varunthacker Varun Thacker added a comment -
        • The OverseerCollectionMessageHandler#checkRequired is not required. We already validate it earlier in CollectionsHandler.
        • Fixed logging
        • Changed OverseerCollectionMessageHandler#Cmd#call to return void as we don't need to return anything. All results are passed in the NamedList.

        Noble - What do you think about this patch?

        varunthacker Varun Thacker added a comment -

        New patch. It has the following changes.

        • Fixed logging
        • Changed OverseerCollectionMessageHandler#Cmd#call to return void as we don't need to return anything. All results are passed in the NamedList.

        I am not messing with removing checkRequired in this patch as it's not related. I'll commit this shortly.

        jira-bot ASF subversion and git services added a comment -

        Commit 70d27aec83f9257da459f157acd9fc70764f7195 in lucene-solr's branch refs/heads/master from Varun Thacker
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=70d27ae ]

        SOLR-9320: Improve logging

        jira-bot ASF subversion and git services added a comment -

        Commit 5fc35a65c39b7fb4ca49675d3ef9cd7f9d6c0fa8 in lucene-solr's branch refs/heads/branch_6x from Varun Thacker
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5fc35a6 ]

        SOLR-9320: Improve logging

        mikemccand Michael McCandless added a comment -

        Bulk close resolved issues after 6.2.0 release.


          People

          • Assignee: noble.paul Noble Paul
          • Reporter: noble.paul Noble Paul
          • Votes: 0
          • Watchers: 9
