SOLR-6513: Add a collectionsAPI call BALANCESLICEUNIQUE

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels: None

      Description

      Another sub-task for SOLR-6491. The ability to assign a property on a node-by-node basis is nice, but tedious for a sysadmin to get right, especially if there are, say, hundreds of nodes hosting a system. This JIRA would essentially provide an automatic mechanism for assigning a property. This particular command simply changes the cluster state; it doesn't do anything like re-assign functions.

      My idea for this version is fairly limited. You'd have to specify a collection and there would be no attempt to, say, evenly distribute the preferred leader role/property for this collection by looking at other collections. Or by looking at underlying hardware capabilities. Or....

      It would be a pretty simple round-robin assignment. About the only intelligence built in would be to change as few roles/properties as possible. Let's say that the correct number of properties per node for this role turned out to be 3. Any node currently having 3 properties for this collection would NOT be changed. Any node having 2 properties would have one added, taken from some node with more than 3 such properties.
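The balancing idea described above can be sketched roughly as follows. This is a minimal greedy illustration, not the code in the patch: the function name, the dict-based inputs, and the tie-breaking by node name are all assumptions made for the example.

```python
from collections import defaultdict


def balance_unique_property(replicas_by_slice, current_owner):
    """Pick one property holder per slice, changing as few holders as possible.

    replicas_by_slice: dict of slice name -> list of node names hosting a replica.
    current_owner: dict of slice name -> node currently holding the property
                   (hypothetical input shape; may be missing slices).
    Returns a new dict of slice name -> node.
    """
    nodes = sorted({n for reps in replicas_by_slice.values() for n in reps})
    num_slices = len(replicas_by_slice)
    # Ceiling of slices/nodes: the most properties any one node should hold.
    max_per_node = -(-num_slices // len(nodes))

    owner = {}
    count = defaultdict(int)
    # Pass 1: keep existing assignments that are valid and under the cap.
    for s in sorted(replicas_by_slice):
        n = current_owner.get(s)
        if n in replicas_by_slice[s] and count[n] < max_per_node:
            owner[s] = n
            count[n] += 1
    # Pass 2: give each remaining slice to its least-loaded hosting node.
    for s in sorted(replicas_by_slice):
        if s not in owner:
            n = min(replicas_by_slice[s], key=lambda x: (count[x], x))
            owner[s] = n
            count[n] += 1
    return owner
```

With six slices replicated on three nodes and one node holding all six properties, this sketch leaves two properties on that node and moves the other four, two to each remaining node. Being greedy, it can still produce uneven results for awkward replica placements.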

      This probably needs an optional parameter, something like "includeInactiveNodes=true|false"

      Since this is an arbitrary property, one must specify sliceUnique=true. So for the "preferredLeader" functionality, one would specify something like:
      action=BALANCESLICEUNIQUE&property=preferredLeader&property.value=true.

      There are checks in this code that require the preferredLeader to have a t/f value and require that sliceUnique be true. That said, this can be called on an arbitrary property that has only one such property per slice.
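As a rough illustration of the request and the two checks mentioned above, here is a small sketch that builds such a URL. The host, the collection name, the `collection` parameter, the `property.value` spelling, and the helper name `build_balance_url` are assumptions for the example; the parameters in the committed API may differ.

```python
from urllib.parse import urlencode


def build_balance_url(base_url, collection, prop, value, slice_unique=True):
    """Build a BALANCESLICEUNIQUE request URL (hypothetical helper).

    Mirrors the two checks described in the issue: sliceUnique must be
    true, and preferredLeader must carry a true/false value.
    """
    if not slice_unique:
        raise ValueError("sliceUnique must be true for a balanced property")
    if prop == "preferredLeader" and value not in ("true", "false"):
        raise ValueError("preferredLeader requires a true/false value")
    params = {
        "action": "BALANCESLICEUNIQUE",
        "collection": collection,  # assumed parameter name
        "property": prop,
        "property.value": value,
        "sliceUnique": "true",
    }
    return base_url + "/admin/collections?" + urlencode(params)


url = build_balance_url(
    "http://localhost:8983/solr", "collection1", "preferredLeader", "true"
)
```

Doing the validation client-side only saves a round trip; the server-side checks remain the authority.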

      1. SOLR-6513.patch
        56 kB
        Erick Erickson
      2. SOLR-6513.patch
        54 kB
        Erick Erickson
      3. SOLR-6513.patch
        52 kB
        Erick Erickson
      4. SOLR-6513.patch
        26 kB
        Erick Erickson

          Activity

          Erick Erickson added a comment -

          It'd probably also be good to add something to the shard when getting the clusterstatus, something akin to:

          "preferredLeaders assigned/actual":"32/30"

          that would represent the congruence (or lack thereof) of the model and reality.
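One way such a figure might be computed is sketched below. The replica dictionaries with `preferredLeader` and `leader` flags are a hypothetical input shape, not the actual clusterstatus structure.

```python
def preferred_leader_stat(replicas):
    """Summarize how many preferred leaders are assigned vs. actually leading.

    replicas: list of dicts with optional boolean 'preferredLeader' and
    'leader' keys (assumed shape for illustration only).
    """
    assigned = sum(1 for r in replicas if r.get("preferredLeader"))
    actual = sum(
        1 for r in replicas if r.get("preferredLeader") and r.get("leader")
    )
    return {"preferredLeaders assigned/actual": f"{assigned}/{actual}"}
```

A gap between the two numbers would indicate replicas that hold the property but have not (yet) become leader.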

          Erick Erickson added a comment -

          I'm thinking that this should have at least one optional parameter, something like "onlyLiveNodes" which would default to "true". The idea here would be that you might want to assign preferred leader roles to all the nodes in your cluster, even if some of them were offline (temporarily one assumes).

          And what about a nodeset parameter? The idea here would be to evenly balance the preferred leaders amongst a specific set of nodes for all the shards in a collection. That would be an easy thing to implement; it just makes gathering the list of candidate nodes for the role a manual step. But frankly that doesn't seem like a great idea, one of those things where just because it's easy doesn't mean it's worthwhile. I'd need a good argument for supporting it at this point, thus I'm mentioning it.
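The effect of the two optional parameters floated here could look something like the sketch below. Both `only_live_nodes` and `nodeset` are the proposals from this comment, not confirmed API parameters, and the function name is made up for the example.

```python
def candidate_nodes(all_nodes, live_nodes, only_live_nodes=True, nodeset=None):
    """Return the nodes eligible to receive the property.

    only_live_nodes: when True (the proposed default), skip nodes that are
                     not currently live.
    nodeset: optional explicit restriction to a set of nodes.
    """
    candidates = [n for n in all_nodes if not only_live_nodes or n in live_nodes]
    if nodeset is not None:
        candidates = [n for n in candidates if n in nodeset]
    return candidates
```

Setting `only_live_nodes=False` covers the case of assigning the role to nodes that are temporarily offline.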

          Erick Erickson added a comment -

          Not ready to commit yet; I certainly have to beef up the tests. Putting it up so people can see what this is starting to look like and comment. I've done some very preliminary testing and the roles seem to be assigned as I expect. There is significantly more testing to do, as well as looking over the code in more detail with fresh eyes.

          There is certainly a question whether the new private class in Overseer.java belongs there or should be a separate file. I don't foresee it having any applicability outside Overseer, so it doesn't seem misplaced. But the Overseer.java file is now approaching 1,900 lines.

          Shalin Shekhar Mangar added a comment -

          What's wrong with an API for doing a specific thing like auto re-balance leaders? This sounds way too complicated.

          Erick Erickson added a comment -

          Shalin:

          The biggest bit here is that by having a property and inserting itself at the head of the list for leader election, the cluster will tend to self-correct. That is, every time a leader election occurs, the replica with the property should be at the head of the list (assuming it's active) and become the leader. Most of the time, this will probably be "good enough", but in the pathological case of things getting really out of whack, hitting the "re-elect leaders" API call (SOLR-6517) will re-distribute leadership.
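The self-correcting behavior can be pictured with a toy model of the election queue. This is only an illustration of the ordering idea; the list-based queue and the function name are assumptions, not how the election is actually implemented.

```python
def election_order(queue, preferred, active):
    """Return the election queue with the preferred replica at the head.

    queue: replica names in their current election order.
    preferred: the replica holding the preferredLeader property, or None.
    active: set of replicas that are currently active.
    The preferred replica only jumps the queue if it is active.
    """
    if preferred in queue and preferred in active:
        return [preferred] + [r for r in queue if r != preferred]
    return list(queue)
```

In this model, each election that happens while the preferred replica is active puts it first, which is why the cluster drifts back toward the assigned layout without an explicit rebalance.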

          So one scenario is to set up a cluster, call this API and not have to redistribute the leaders periodically via SOLR-6517.

          Additionally, as discussed in SOLR-6491, there's no real substitute for an operations person's knowledge of the system. Consider a cluster of heterogeneous machines hosting multiple collections, some of which have heavy traffic and some of which do not. Even creating a way to characterize that kind of setup in a general way such that having only a "rebalance your leaders now" command would do the right thing hurts my head.

          Erick Erickson added a comment -

          By assigning the property, the cluster will tend toward the desired configuration without any "reassign leader now" command simply because each node with the property will put itself at the head of the list for leader election. So having an auto-assignment that does this is an easy way to set up a cluster that stays balanced "well enough most of the time". Then the "reassign leaders now" command can be used when the system gets out of whack.

          It's really tedious to assign, say, 100+ shards manually.

          Additionally, (see the discussion at SOLR-6491) consider a heterogeneous network of machines of varying capacities hosting many collections. Building a tool to auto-rebalance them all is an R&D project. Absent that, having a preferred leader property that can be assigned allows the operators to incorporate their knowledge of the hardware, loads (i.e. one collection may be much hotter than others) etc.

          Given the need for manual assignment, using the same mechanism for optionally assigning the property automatically is actually somewhat simpler overall, at least to me.

          Erick Erickson added a comment -

          Preliminary patch. I'm chasing down a failing test before I put this on the review board, but if anyone wants an advance peek here it is.

          Erick Erickson added a comment -

          Here's what I hope is the final patch. I'll commit this in a day or two unless there are objections.

          I ran about 500 iterations last night. NOTE: I can certainly see situations where the network topology and/or the algorithm produce sub-optimal distributions. From my testing so far, however, I haven't seen this happen. Of course the numbers of shards/replicas are pretty small all things considered.

          Worst case is probably that in some situations, one manually assigns a sliceUnique property to certain nodes. The code preserves any manual assignments assuming that the total of unique properties assigned to a particular node is < (num slices)/(num nodes) + 1.
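To make the threshold concrete, here is a small arithmetic sketch. It assumes the division is integer division (as it would be for Java ints); that reading, and the function name, are assumptions rather than something stated in this comment.

```python
def preserves_manual_assignments(props_on_node, num_slices, num_nodes):
    """True if a node's current property count is under the stated limit.

    Limit taken from the comment: (num slices)/(num nodes) + 1,
    with integer division assumed.
    """
    return props_on_node < num_slices // num_nodes + 1


# 10 slices over 3 nodes: the limit is 10 // 3 + 1 = 4.
keeps_three = preserves_manual_assignments(3, 10, 3)  # 3 < 4
keeps_four = preserves_manual_assignments(4, 10, 3)   # 4 < 4 is false
```

Under this reading, a node with three manually assigned properties is left alone, while a node with four would be a candidate for having one moved.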

          Reviewboard here: https://reviews.apache.org/r/26374/

          Erick Erickson added a comment -

          Final patch with CHANGES.txt. Also ensures that custom properties are preserved across state changes.

          ASF subversion and git services added a comment -

          Commit 1630143 from Erick Erickson in branch 'dev/trunk'
          [ https://svn.apache.org/r1630143 ]

          SOLR-6513: Add a collectionsAPI call BALANCESLICEUNIQUE

          ASF subversion and git services added a comment -

          Commit 1630191 from Erick Erickson in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1630191 ]

          SOLR-6513: Add a collectionsAPI call BALANCESLICEUNIQUE

          Erick Erickson added a comment -

          Adding to CWiki shortly.

          Jan Høydahl added a comment -

          I thought we agreed to prefer the term "shard" over "slice", so I think we should do this for this API as well.

          The only place in our refguide we use the word "slice" is in How SolrCloud Works [1] and that description is disputed.

          The refguide explanation of what a shard is can be found in Shards and Indexing Data in SolrCloud [2], quoting:

          When your data is too large for one node, you can break it up and store it in sections by creating one or more shards. Each is a portion of the logical index, or core, and it's the set of all nodes containing that section of the index.

          So I'm proposing a rename of this API to BALANCESHARDUNIQUE and a rewrite of [1].

          [1] https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works
          [2] https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

          Mark Miller added a comment -

          The general way of things has been to use slice in code and shard+context in user-facing things. There has never been real agreement on any of these issues IMO though. Not even when just two of us worked on it.

          Erick Erickson added a comment -

          Still, I'm all for keeping things consistent. See SOLR-6670 and we'll go from there.

          Jan Høydahl added a comment -

          This API is not in a released version, so it should be safe to commit the rename as part of this JIRA, no?

          Anshum Gupta added a comment -

          Bulk close after 5.0 release.

            People

            • Assignee:
              Erick Erickson
              Reporter:
              Erick Erickson
            • Votes:
              0
              Watchers:
              7