Been poking around the SolrCloud/zk code ... fun times.
From what i can tell, we don't record anywhere in zookeeper the mapping of "nodeName" -> "baseURL" for the various solr nodes in a solr cloud cluster. We do evidently record the baseUrl associated with a nodeName in the info about each replica – but that information is per collection & shard, so as is it doesn't really help in the general case of the bad code in OverseerCollectionProcessor.
Three options occur to me...
1) We could consider adding these mappings to ZK as 1st order info, possibly by adding some data to the ephemeral "liveNodes" path for each node, so code like OverseerCollectionProcessor could just ask for the data of each liveNode to know its baseUrl ... but i'm not sure how far down that rabbit hole we want to go (i don't know the performance characteristics of ZK well enough to know if it's a good idea to have code doing lots of those kinds of lookups ad-hoc)
2) we could cheat: we could add something like this to ClusterState...
private final Map<String,String> baseUrls;
public String getBaseUrl(final String nodeName);
...and populate the baseUrls Map in the constructor based on the properties found when looping over every collections->slice->replica. The only question is what to do if/when two diff collections/slice/replica in the clusterstate disagree about the baseUrl? (assertion failed?)
3) We could improve the kludge to be a bit less kludgy: OverseerCollectionProcessor (and possibly other places) currently assume that a baseUrl can be computed from a nodeName by replacing all "_" with "/" – if we change that substitution to only apply to the first "_" in the nodeName, and combine it with some URL decoding on the "hostContext" portion of the nodeName (to match my suggested improvement in the previous patch) i think we would have a fairly safe way of bi-directionally converting nodeName<->URL regardless of what's in the hostContext - because hostnames and ports can't ever have "_" in them. (this wouldn't address the "http://" kludge, but that assumption seems to be more pervasive - we can fight that battle another day)
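To make #3 a bit more concrete, here's a rough sketch of the conversion in both directions. The class and method names (NodeNameUrls, baseUrlForNodeName, nodeNameForBaseUrl) are made up for illustration and are not existing Solr APIs; it assumes a nodeName looks like host:port_encodedHostContext, and it still hardcodes "http://" per the caveat above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of Solr: sketches option 3 only.
final class NodeNameUrls {
    // Assumption: scheme is still hardcoded, same as the existing kludge.
    private static final String SCHEME = "http://";

    private NodeNameUrls() {}

    /** nodeName -> baseUrl: split on the FIRST "_" only, then URL-decode the hostContext. */
    static String baseUrlForNodeName(String nodeName) {
        int idx = nodeName.indexOf('_');
        if (idx < 0) {
            throw new IllegalArgumentException("no '_' in nodeName: " + nodeName);
        }
        String hostAndPort = nodeName.substring(0, idx);
        String hostContext = decode(nodeName.substring(idx + 1));
        return SCHEME + hostAndPort + (hostContext.isEmpty() ? "" : "/" + hostContext);
    }

    /** baseUrl -> nodeName: URL-encode the hostContext so any "/" in it survives the round trip. */
    static String nodeNameForBaseUrl(String baseUrl) {
        if (!baseUrl.startsWith(SCHEME)) {
            throw new IllegalArgumentException("unsupported scheme: " + baseUrl);
        }
        String rest = baseUrl.substring(SCHEME.length());
        int idx = rest.indexOf('/');
        String hostAndPort = idx < 0 ? rest : rest.substring(0, idx);
        String hostContext = idx < 0 ? "" : rest.substring(idx + 1);
        return hostAndPort + "_" + encode(hostContext);
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

With this, a hostContext like "foo_bar/solr" would be encoded into the nodeName as "foo_bar%2Fsolr" and decoded back intact, whereas the current replace-all approach would turn the "_" inside the context into a "/".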
Option #3 seems the least invasive for now, so unless mark/yonik/sami/somebody chimes in with more encouragement to go down one of the other routes, i'll take a stab at #3 and see what other problems i encounter.
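For comparison, if we did end up going with option 2 instead, the constructor logic might look roughly like this. It's a standalone sketch, not the real ClusterState: the nested Maps stand in for collections->slice->replica, the class name BaseUrlIndex is hypothetical, the "node_name" and "base_url" property keys are assumptions about what each replica records, and failing hard on disagreement is just one possible answer to the open question above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the ClusterState change in option 2.
final class BaseUrlIndex {
    private final Map<String, String> baseUrls;

    /**
     * @param collections collection name -> slice name -> replica name -> replica properties
     */
    BaseUrlIndex(Map<String, Map<String, Map<String, Map<String, String>>>> collections) {
        Map<String, String> urls = new HashMap<String, String>();
        for (Map<String, Map<String, Map<String, String>>> slices : collections.values()) {
            for (Map<String, Map<String, String>> replicas : slices.values()) {
                for (Map<String, String> props : replicas.values()) {
                    String nodeName = props.get("node_name"); // assumed property key
                    String baseUrl = props.get("base_url");   // assumed property key
                    if (nodeName == null || baseUrl == null) {
                        continue; // replica doesn't carry the info; nothing to record
                    }
                    String previous = urls.put(nodeName, baseUrl);
                    if (previous != null && !previous.equals(baseUrl)) {
                        // Two replicas disagree about the same node: fail loudly.
                        throw new IllegalStateException("conflicting baseUrl for " + nodeName
                            + ": " + previous + " vs " + baseUrl);
                    }
                }
            }
        }
        this.baseUrls = Collections.unmodifiableMap(urls);
    }

    /** Returns the recorded baseUrl, or null if no replica mentions this node. */
    String getBaseUrl(final String nodeName) {
        return baseUrls.get(nodeName);
    }
}
```

Note the null return: a live node that hosts no replicas would never show up in this map, which is one reason option 2 is only a partial fix.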