While testing SOLR-15795, I found that if the cluster has an empty node when the REPLACENODE command runs, all replicas are often placed on that one node, even when this does not result in an even distribution across the cluster.
Looking at the code, this makes sense. ReplaceNodeCmd loops over every replica on the sourceNode and uses the AssignStrategy class to assign a node for each replica, based on the clusterstate. However, the clusterstate does not change between these assignments, so the most advantageous node for one replica is likely to be the most advantageous node for every other replica as well. As a result, all replicas were scheduled onto the same node in my testing.
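The effect can be reproduced with a toy model. This is only a sketch, not Solr code: `pickNode` is a hypothetical stand-in for the decision AssignStrategy makes, assumed here to simply prefer the node with the fewest replicas, and the node names and counts are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StaleStateDemo {
    // Hypothetical placement rule: choose the node holding the fewest replicas.
    // Ties go to the first node in the map's iteration order.
    static String pickNode(Map<String, Integer> replicaCounts) {
        String best = null;
        for (Map.Entry<String, Integer> e : replicaCounts.entrySet()) {
            if (best == null || e.getValue() < replicaCounts.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    // Mimics the loop described above: one decision per replica,
    // every decision made against the same unchanged snapshot.
    static List<String> placeAgainstSnapshot(Map<String, Integer> snapshot, int replicasToMove) {
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < replicasToMove; i++) {
            targets.add(pickNode(snapshot)); // the snapshot is never updated
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, Integer> snapshot = new TreeMap<>();
        snapshot.put("nodeA", 2);
        snapshot.put("nodeB", 2);
        snapshot.put("emptyNode", 0);
        // Six replicas need a new home; every one of them is sent to emptyNode.
        System.out.println(placeAgainstSnapshot(snapshot, 6));
    }
}
```

In this toy cluster the empty node ends up with six replicas while the other two nodes stay at two each.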
An easy (in theory) solution is to let AssignStrategy take a list of AssignRequests in assign(), where each request in the list accounts for the replicaPlacements decided for the requests before it. That way, ReplaceNodeCmd can create its list of AssignRequests and issue them all at once to AssignStrategy, which will come up with the optimal plan for all replicas together.
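A minimal sketch of the batched idea, under stated assumptions: `assignAll` and `pickNode` are hypothetical names, not the real AssignStrategy signatures, and the placement rule is assumed to be "fewest replicas wins". Each decision is folded into a working copy of the counts before the next request is considered. Note that this greedy pass only illustrates the bookkeeping; it does not claim to find a globally optimal plan.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BatchAssignDemo {
    // Hypothetical placement rule: choose the node holding the fewest replicas.
    // Ties go to the first node in the map's iteration order.
    static String pickNode(Map<String, Integer> replicaCounts) {
        String best = null;
        for (Map.Entry<String, Integer> e : replicaCounts.entrySet()) {
            if (best == null || e.getValue() < replicaCounts.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    // Hypothetical batch entry point: places all requested replicas in one call.
    // Returns a replica -> node mapping in request order.
    static Map<String, String> assignAll(Map<String, Integer> snapshot, List<String> replicas) {
        // Work on a copy so the caller's view of the cluster is left untouched.
        Map<String, Integer> working = new TreeMap<>(snapshot);
        Map<String, String> placements = new LinkedHashMap<>();
        for (String replica : replicas) {
            String node = pickNode(working);
            placements.put(replica, node);
            working.merge(node, 1, Integer::sum); // later requests see this placement
        }
        return placements;
    }

    public static void main(String[] args) {
        Map<String, Integer> snapshot = new TreeMap<>();
        snapshot.put("nodeA", 2);
        snapshot.put("nodeB", 2);
        snapshot.put("emptyNode", 0);
        List<String> replicas = Arrays.asList("r1", "r2", "r3", "r4", "r5", "r6");
        System.out.println(assignAll(snapshot, replicas));
    }
}
```

With the same toy cluster of two nodes holding two replicas each plus one empty node, the six replicas are spread so that the final counts are 4, 3 and 3 rather than 6, 2 and 2.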
Because this is an API on AssignStrategy, it will work with either the new autoscaling APIs or the legacy assign strategy.