SOLR-6220: Replica placement strategy for SolrCloud

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      Objective

      Most cloud-based systems allow users to specify rules on how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control the allocation of replicas, and later change it to suit the needs of the system.

      All configuration is on a per-collection basis. The rules are applied whenever a replica is created in any of the shards of a given collection during:

      • collection creation
      • shard splitting
      • add replica
      • createshard

      There are two aspects to how replicas are placed: snitch and placement.

      snitch

      How to identify the tags of nodes. Snitches are configured through the collection create command with the snitch param, e.g. snitch=EC2Snitch or snitch=class:EC2Snitch

      ImplicitSnitch

      This is shipped by default with Solr. The user does not need to specify ImplicitSnitch in the configuration. If the tags known to ImplicitSnitch are present in the rules, it is used automatically.
      Tags provided by ImplicitSnitch:

      1. cores : Number of cores in the node
      2. disk : Disk space available on the node
      3. host : Host name of the node
      4. node : Node name
      5. D.* : These are values available from system properties. D.key means a value that is passed to the node as -Dkey=keyValue during node startup. It is possible to use rules like D.key:expectedVal,shard:*
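As an illustration only (Solr's snitches are Java classes, so this sketch, its function name, and its node data are all invented), the tag map a snitch provides per node can be pictured like this:

```python
# Hypothetical sketch of the tag -> value map ImplicitSnitch provides for a
# node. The node data and the function name are invented for illustration.
def implicit_tags(node):
    tags = {
        "cores": node["cores"],   # number of cores on the node
        "disk": node["disk_gb"],  # available disk space
        "host": node["host"],     # host name
        "node": node["name"],     # node name
    }
    # D.* tags come from -Dkey=value system properties passed at startup
    for key, value in node["sysprops"].items():
        tags["D." + key] = value
    return tags

node = {"cores": 4, "disk_gb": 120, "host": "host1",
        "name": "host1:8983_solr", "sysprops": {"rack": "738"}}
print(implicit_tags(node)["D.rack"])
```

A rule such as D.rack:738,shard:* would then match this node.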

      Rules

      This tells how many replicas for a given shard need to be assigned to nodes with the given key-value pairs. These parameters are passed to the collection CREATE API as a multi-valued parameter "rule". The values are saved in the collection state as follows:

      {
        "mycollection": {
          "snitch": {
            "class": "ImplicitSnitch"
          },
          "rules": [{"cores": "4-"},
                    {"replica": "1", "shard": "*", "node": "*"},
                    {"disk": ">100"}]
        }
      }
      

      A rule is specified in a pseudo-JSON syntax, which is a map of keys and values.
      Each collection can have any number of rules. As long as the rules do not conflict with each other, they are accepted; otherwise an error is thrown.

      • In each rule, shard and replica can be omitted
        • the default value of replica is *, which means ANY; or you can specify a count with an operand such as < (less than) or > (greater than)
        • the value of shard can be a shard name, * (meaning EACH), or ** (meaning ANY); the default value is ** (ANY)
      • There should be exactly one condition in a rule other than shard and replica.
      • All keys other than shard and replica are called tags, and the tags are nothing but values provided by the snitch for each node
      • By default, certain tags such as node, host, and port are provided by the system implicitly
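To make the rule shape concrete, here is a small sketch (in Python, not Solr's actual parser) that splits a rule string into its conditions and fills in the documented defaults:

```python
# Sketch of parsing a rule string such as "rack:*,shard:*,replica:<2".
# This mirrors the description above; it is not Solr's parser.
def parse_rule(rule):
    conditions = {}
    for pair in rule.split(","):
        key, _, value = pair.partition(":")
        conditions[key.strip()] = value.strip()
    # defaults when shard and replica are omitted
    conditions.setdefault("shard", "**")   # ** means ANY shard
    conditions.setdefault("replica", "*")  # * means any count
    return conditions

print(parse_rule("rack:*,replica:<3"))
```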

      How are nodes picked up?

      Nodes are not picked at random. The rules are used to first sort the nodes according to affinity. For example, if there is a rule that says disk:>100, nodes with more disk space are given higher preference. And if the rule is disk:<100, nodes with less disk space are given priority. If everything else is equal, nodes with fewer cores are given higher priority.
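The preference order described above can be sketched like this (node data invented; a rule preferring more disk sorts nodes by free disk descending, with ties broken by fewer cores):

```python
# Sketch of the affinity sort: for a rule like disk:>100, nodes with more
# disk sort first; among equals, fewer cores wins. Data is invented.
nodes = [
    {"name": "n1", "disk": 80,  "cores": 2},
    {"name": "n2", "disk": 150, "cores": 6},
    {"name": "n3", "disk": 150, "cores": 3},
]
ranked = sorted(nodes, key=lambda n: (-n["disk"], n["cores"]))
print([n["name"] for n in ranked])  # n3 and n2 tie on disk; n3 has fewer cores
```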

      Fuzzy match

      Fuzzy match can be applied when strict matches fail. The values can be suffixed with ~ to specify fuzziness.

      example rule

       #Example requirement: "use only one replica of a shard per rack if possible; if no matches are found, relax that rule."
      rack:*,shard:*,replica:<2~
      
      #Another example: assign all replicas to nodes with 100GB of disk space or more, or relax the rule if that is not possible. This ensures that if no node has 100GB of disk, nodes are picked in order of size, so an 85GB node would be picked over an 80GB one
      disk:>100~
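The relaxation behaviour of ~ can be sketched as a strict pass followed by a fallback (illustrative Python with invented data, not Solr's implementation):

```python
# Sketch of ~ relaxation for disk:>100~ : try the strict predicate first,
# and if no node qualifies, fall back to the largest disks available.
def pick(nodes, min_disk):
    strict = [n for n in nodes if n["disk"] > min_disk]
    if strict:
        return strict  # strict match succeeded
    return sorted(nodes, key=lambda n: -n["disk"])  # relaxed ordering

nodes = [{"name": "a", "disk": 85}, {"name": "b", "disk": 80}]
print([n["name"] for n in pick(nodes, 100)])  # no 100GB node, 85GB wins
```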
      

      Examples:

      #in each rack there can be at most two replicas of a given shard
       rack:*,shard:*,replica:<3
      #in each rack there can be at most two replicas of ANY shard
       rack:*,shard:**,replica:<3
       rack:*,replica:<3
      
       #in each node there should be at most one replica of EACH shard
       node:*,shard:*,replica:<2
       #in each node there should be at most one replica of ANY shard
       node:*,shard:**,replica:<2
       node:*,replica:<2
       
      #In rack 738, there can be at most 0 replicas of shard1
       rack:738,shard:shard1,replica:<1
       
       #All replicas of shard1 should go to rack 730
       shard:shard1,replica:*,rack:730
       shard:shard1,rack:730
      
       #all replicas must be created on a node with more than 20GB of disk
       replica:*,shard:*,disk:>20
       replica:*,disk:>20
       disk:>20
      #All replicas should be created in nodes with less than 5 cores
      #Here ANY and EACH have the same meaning for shard
      replica:*,shard:**,cores:<5
      replica:*,cores:<5
      cores:<5
      #one replica of shard1 must go to node 192.168.1.2:8080_solr
      node:"192.168.1.2:8080_solr",shard:shard1,replica:1
      #No replica of shard1 should go to rack 738
      rack:!738,shard:shard1,replica:*
      rack:!738,shard:shard1
      #No replica of ANY shard should go to rack 738
      rack:!738,shard:**,replica:*
      rack:!738,shard:*
      rack:!738
      

      In the collection create API, all the placement rules are provided as a multi-valued parameter called rule.
      example:

      snitch=EC2Snitch&rule=shard:*,replica:1,dc:dc1&rule=shard:*,replica:<2,dc:dc3&rule=shard:shard1,replica:*,rack:!738
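Since rule is multi-valued, each rule becomes one repeated request parameter. A sketch of assembling such a query string (the collection name is invented; only the standard library is used):

```python
# Sketch of building a CREATE query string with repeated rule params.
# The collection name is invented; encoding is plain urlencode.
from urllib.parse import urlencode

params = [
    ("action", "CREATE"),
    ("name", "mycollection"),
    ("snitch", "EC2Snitch"),
    ("rule", "shard:*,replica:1,dc:dc1"),  # repeated key = multi-valued
    ("rule", "shard:*,replica:<2,dc:dc3"),
]
qs = urlencode(params)
print(qs.count("rule="))
```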
      
      1. SOLR-6220.patch
        89 kB
        Noble Paul
      2. SOLR-6220.patch
        81 kB
        Noble Paul
      3. SOLR-6220.patch
        76 kB
        Noble Paul
      4. SOLR-6220.patch
        77 kB
        Noble Paul
      5. SOLR-6220.patch
        72 kB
        Noble Paul
      6. SOLR-6220.patch
        69 kB
        Noble Paul
      7. SOLR-6220.patch
        33 kB
        Noble Paul

        Issue Links

          Activity

          Steven Bower added a comment -

          I like where you are going with this...

          Some thoughts:

          This snitch information I think should be exposed at a more general level than just in replica placement.. One very useful area would be in query routing for datacenter/rack/server affinity..

          There seem to be several types of information the "snitch" is providing; System (cpu, dc, rack, etc..), Node (# cores, etc..), Collection (# shards, routing, etc..), Shard (# docs, # replicas), Core (disk size, etc..).. Plus ideally you'd like to have managed properties really at all levels... I wonder if a more robust framework for the snitches would be better:

          I am thinking something more hierarchical, imagine a tree where the top level elements are provided by a collection of snitches that are dynamically provided when requested...

          {
            "system" : {
              "dc" : "east",
              "rack" : 124
            },
            "collection" : {
              "name" : "foo",
              "numShards" : 5
            },
            "..."
          }
          

          This way you could layer in lots of different information without having to re-implement functionality in different snitches or have some complex snitch class hierarchy... you'd simply plug snitches into different parts of the infrastructure (node, collection, etc..) Obviously this would need to be fleshed out a bit more..

          The idea of Snitches is somewhat in the spirit of Chef's Ohai (https://github.com/opscode/ohai) it may be beneficial to provide something similar... and/or we could lift/port some code for things it does already outside Solr

          In terms of the rule-sets.. I'm curious with the rules you describe how you'd model:

          • N shards evenly distributed between D datacenters (for this case assume N is evenly divisible by D)
          • All replicas should be in separate racks (unless there is no other choice)
          • All replicas should be on separate hosts (unless there is no other choice)
          Noble Paul added a comment -

          N shards evenly distributed between D datacenters (for this case assume N is evenly divisible by D)

          This does not. Is it really helpful to have just some shards in a DC? The design distributes every shard.

          All replicas should be in separate racks (unless there is no other choice)

          All replicas should be on separate hosts (unless there is no other choice)

          Not yet. It would be a nice syntax to add.

          Noble Paul added a comment -

          First cut with some basic tests

          Noble Paul added a comment -

          All planned features included. Tests will come next

          Noble Paul added a comment -

          More tests .

          Shalin Shekhar Mangar added a comment -

          //in each node there should be a max one replica of EACH shard


          Instead of "1-" can we use the regular <, <=, >, >= operators? For example, "replica:<=1" to signal that a maximum of 1 replica is allowed, and "disk:>=20" to signal that the chosen node must have at least 20GB of space.

          Noble Paul added a comment - - edited

          The rules are passed as request params. Example: rule=shard:*,disk:20+
          I considered using <=, >= and !=. But that means the user would need to escape = in the request, which badly affects readability.
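The escaping concern can be seen directly: a rule value containing = must be percent-encoded inside a request parameter (illustrative Python using the standard library):

```python
# Shows why <= in a rule value hurts readability of the raw request:
# ':', '<' and '=' all get percent-encoded.
from urllib.parse import quote_plus

print(quote_plus("replica:<=1"))
```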

          Noble Paul added a comment -
          • Added fuzzy match option ~
          • nodes are presorted based on rules instead of randomly picking nodes
          • ImplicitSnitch can support per node system properties
          Noble Paul added a comment -

          Operators + and - replaced with < and >
          This is now feature complete.
          I'll add some more tests and commit this

          Noble Paul added a comment -

          More tests

          Shalin Shekhar Mangar added a comment -

          The latest patch doesn't compile. It has errors in OverseerCollectionProcessor. Can you please post a patch which is in sync with trunk?

          Noble Paul added a comment -

          Updated patch to trunk

          ASF subversion and git services added a comment -

          Commit 1677607 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1677607 ]

          SOLR-6220: Rule Based Replica Assignment during collection creation

          ASF subversion and git services added a comment -

          Commit 1677614 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1677614 ]

          SOLR-6220: setting eol style

          Tomás Fernández Löbbe added a comment -

          Ideally, most warnings should be fixed, but at least the one in SnitchContext:

            public SimpleSolrResponse invoke(UpdateShardHandler shardHandler,  final String url, String path, SolrParams params)
                throws IOException, SolrServerException {
              GenericSolrRequest request = new GenericSolrRequest(SolrRequest.METHOD.GET, path, params);
              NamedList<Object> rsp = new HttpSolrClient(url, shardHandler.getHttpClient(), new BinaryResponseParser()).request(request);
              request.response.nl = rsp;
              return request.response;
            }
          

          Resource leak: '<unassigned Closeable value>' is never closed

          Anshum Gupta added a comment -

          This seems to have broken ant precommit.

          [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
          [forbidden-apis]   in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:63)
          [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
          [forbidden-apis]   in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:108)
          [forbidden-apis] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
          [forbidden-apis]   in org.apache.solr.cloud.rule.RuleEngineTest (RuleEngineTest.java:185)
          
          ASF subversion and git services added a comment -

          Commit 1677622 from Anshum Gupta in branch 'dev/trunk'
          [ https://svn.apache.org/r1677622 ]

          SOLR-6220: Fixes forbidden method invocation String#getBytes() in RuleEngineTest

          ASF subversion and git services added a comment -

          Commit 1677635 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1677635 ]

          SOLR-6220: use closeable in try block

          Jessica Cheng Mallet added a comment -

          It'll also be nice to have a new collection API to modify the rule for a collection so that we can add rules for an existing collection or modify a bad rule set.

          ASF subversion and git services added a comment -

          Commit 1677642 from Anshum Gupta in branch 'dev/trunk'
          [ https://svn.apache.org/r1677642 ]

          SOLR-6220: Fix javadocs for precommit to pass

          Anshum Gupta added a comment -

          That would be a good thing to have. Can you create a new JIRA for that if one doesn't already exist?

          Noble Paul added a comment -

          It's planned and I would like to piggy back on the modify collection API SOLR-5132

          ASF subversion and git services added a comment -

          Commit 1677648 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1677648 ]

          SOLR-6220: Rule Based Replica Assignment during collection creation

          ASF subversion and git services added a comment -

          Commit 1677741 from shalin@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1677741 ]

          SOLR-6220: Fix compile error on Java7

          ASF subversion and git services added a comment -

          Commit 1678222 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1678222 ]

          SOLR-6220: renamed tag names disk to freedisk, and the system property prefix to sysprop.

          Jessica Cheng Mallet added a comment -

          This doesn't seem to handle addReplica. I think it'd be nice to merge in the logic of the Assign class and get rid of it completely so there's just one place to handle any kind of replica assignment.

          Noble Paul added a comment -

          Jessica Cheng Mallet That's the plan

          It should be added for addReplica, createShard and splitShard.

          Shalin Shekhar Mangar added a comment -

          This was released in 5.2

          Anshum Gupta added a comment -

          Bulk close for 5.2.0.


            People

            • Assignee: Noble Paul
            • Reporter: Noble Paul
            • Votes: 5
            • Watchers: 22
