Description
Currently, the code flow for a query in SolrCloud is as follows:
For distributed query:
SolrCore -> SearchHandler.handleRequestBody() -> HttpShardHandler.submit()
For non-distributed query:
SolrCore -> SearchHandler.handleRequestBody() -> QueryComponent.process()
For a distributed query, the request is always sent to all the shards even if the originating SolrCore (handling the original distributed query) is a replica of one of the shards.
If the originating SolrCore can check itself before sending HTTP requests to the shards, we can save a network hop and gain some performance.
We can change SearchHandler.handleRequestBody() or HttpShardHandler.submit() to fix this behavior (most likely the former and not the latter).
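As a rough illustration of the idea (not the actual patch; the class and method names below are hypothetical), the shard handler could reorder each shard's replica URLs so that any replica hosted on the current node is tried first:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: given the candidate replica URLs
// for one shard and the base URL of the node handling the top-level request,
// move the local replicas to the front of the list so they are tried before
// any network hop is made.
public class LocalShardPreference {

  public static List<String> preferLocal(List<String> shardUrls, String localBaseUrl) {
    List<String> local = new ArrayList<>(shardUrls.size());
    List<String> remote = new ArrayList<>(shardUrls.size());
    for (String url : shardUrls) {
      if (url.startsWith(localBaseUrl)) {
        local.add(url);    // replica lives on this node: no network hop needed
      } else {
        remote.add(url);   // remote replica: keep as a fallback
      }
    }
    local.addAll(remote);
    return local;
  }
}
```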
Attachments
- SOLR-6832.patch (14 kB) - Timothy Potter
- SOLR-6832.patch (17 kB) - Sachin Goyal
- SOLR-6832.patch (10 kB) - Sachin Goyal
- SOLR-6832.patch (13 kB) - Sachin Goyal
Issue Links
- is required by: SOLR-8298 small preferLocalShards implementation refactor (Closed)
- relates to: SOLR-9758 refactor preferLocalShards implementation (Resolved)
- relates to: SOLR-14035 remove deprecated preferLocalShards references (Closed)
Activity
I have concerns, but I don't want to derail the work. There are use-cases for which this would be very useful, but many other use-cases where it would cause a single machine to crumble under the load while other machines in the cloud are nearly idle.
Duplicating what I said on the dev@l.a.o thread:
Consider a SolrCloud that is handling 5000 requests per second with a replicationFactor of 20 or 30. This could be one shard or multiple shards. Currently, those requests will be load balanced to the entire cluster. If this option is implemented, suddenly EVERY request will have at least one part handled locally ... and unless the index is very tiny or 99 percent of the queries hit a Solr cache, one index core simply won't be able to handle 5000 queries per second. Getting a single machine capable of handling that load MIGHT be possible, but it would likely be VERY expensive.
This would be great as an OPTION that can be enabled when the index composition and query patterns dictate it will be beneficial ... but it definitely should not be default behavior.
Hi elyograg, I work with sachingoyal. The background of this patch is that we have a cluster of 14 machines serving upwards of 5000 qps, and when one machine goes into a multi-second GC pause, it can easily bring down the entire cluster. I know this is not the sole cause of the distributed deadlock, and we have definitely fixed other things (GC pauses, thread counts, etc.) to reduce the likelihood of this problem.
In the scenario you mention, the load balancer outside SolrCloud is at fault, and when that is the case we would rather it take down only one replica than propagate the problem to other replicas.
So to be clear, when this option is ON, the only thing you'll "lose" is the extra load balancing among the shard queries. And frankly, when I have all the shards on the same node, I prefer NOT to go over the network, as the network is among the most unreliable and heavily taxed resources in cloud environments. When we go over the network to another node, I have no idea what is carrying the request over there or how that other node is doing overall.
We will post our results on the benefit of having this option turned ON.
That sounds like a perfect use-case for this option. In your setup, you have an external load balancer and are not relying on SolrCloud itself or the zookeeper-aware Java client (CloudSolrServer) to do the load balancing for you. For an environment like that, letting SolrCloud forward the request adds a completely unnecessary network hop, along with new Java objects and subsequent garbage that must be collected.
This is why I said I didn't want to derail the work. If you have a solution, we should try to get it to a state where it can be committed. It is very clear that it will be an immense help for many users. I just don't want it to become the default.
Trying to come up with a useful and descriptive option name that's not horribly long ... that's a challenge. Something like handleRequestsLocally may be too generic, but it's a lot shorter than handleShardRequestsLocallyIfPossible!
Our clients actually do use CloudSolrServer (the load-balancing SolrJ client). Is there something we should be worrying about there? We are under the impression that the ZooKeeper-aware CloudSolrServer does round-robin load balancing when sending query requests.
We intend to apply 'preferLocalShards' on the Solr node side only.
BTW, how is the name 'preferLocalShards' ?
CloudSolrServer does load balance, so you do not need an external load balancer. Internally, it uses changes in the zookeeper clusterstate to add and remove URLs on an instance of LBHttpSolrServer, which in turn uses HttpSolrServer for each of those URLs.
The name preferLocalShards is perfect ... and I think a good case can be made for CloudSolrServer using this for queries (probably via a query URL parameter) by default.
We might even be able to shorten the parameter name to preferLocal, but that will require some further thought. I'd hate to have the shorter version be in use when another preferLocalXXX requirement comes up.
A slightly better choice might be preferLocalReplicas ... but Shards is pretty good too.
Attaching a patch for this ticket.
The patch creates an option preferLocalShards in solrconfig.xml and in the query request params (with the query-time value taking precedence).
If this option is set, HttpShardHandler.preferCurrentHostForDistributedReq() tries to find a local URL and puts that URL as the first one in the list of URLs sent to LBHttpSolrServer. This ensures that the current host's cores will be given preference for distributed queries.
The other important function is ResponseBuilder.findCurrentHostAddress(), which tries to find the current host's URL by searching for the current core's name in the list of shards. The URL found by this function is used by the HttpShardHandler function above.
The default value of the option is 'false' to preserve the existing behavior.
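For example (a minimal sketch, not part of the patch; the ZooKeeper address and collection name are placeholders), a SolrJ client could request this behavior per query:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PreferLocalShardsExample {
  public static void main(String[] args) throws Exception {
    // ZooKeeper-aware SolrJ client; host/port and collection are placeholders.
    CloudSolrServer client = new CloudSolrServer("localhost:2181");
    client.setDefaultCollection("collection1");

    SolrQuery query = new SolrQuery("*:*");
    // Ask the node receiving the request to favor its own replicas
    // for the per-shard sub-requests.
    query.set("preferLocalShards", true);

    QueryResponse rsp = client.query(query);
    System.out.println("numFound: " + rsp.getResults().getNumFound());

    client.shutdown();
  }
}
```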
sachingoyal Thanks for the patch! I'm working to get it to a committable state.
I don't think adding preferLocalShards as a collection-level setting (in SolrConfig) adds much value here. If an operator wants to enforce that query parameter for all requests, they can use the built-in support for defaults or invariants on the appropriate query request handler, e.g. to make this the default on the /select handler, you could do:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <bool name="preferLocalShards">true</bool>
    ...
Both approaches require some config changes in solrconfig.xml, but the latter (my suggestion) avoids adding new code / config settings. That said, please let me know if you think there's another reason to have this as an explicit setting in solrconfig.xml.
Also, all the code in findCurrentHostAddress can simply be replaced by ZkController.getBaseUrl() when needed.
thelabdude, I do not feel very strongly about the configuration option in solrconfig.xml
I kept it here only because specifying a global option looked simpler to use.
I will try the ZkController.getBaseUrl() and update the patch shortly with the above suggestions.
Thank you for reviewing.
Awesome - btw ... in case you haven't seen this before, it's a little cumbersome to get at the ZkController from the req object, something like:
req.getCore().getCoreDescriptor().getCoreContainer().getZkController().getBaseUrl()
thelabdude, here is a new patch with the above comments incorporated.
I have checked that
req.getCore().getCoreDescriptor().getCoreContainer().getZkController().getBaseUrl()
works well, so ResponseBuilder.findCurrentHostAddress() is no longer required.
The configuration change is also done.
Thanks for the updated patch. Only thing we need now is a good unit test. I can take a stab at that over the next few days.
Thanks thelabdude.
If you can point me to an existing test case that shows how to create a multi-node cluster and run updates/queries against it, I can help with the unit-test creation.
thelabdude, I have included a unit-test in the new patch.
It tests that responses are received from local cores only when preferLocalShards is set to true in the query.
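Not the test itself, but a sketch of the kind of check involved: if the query is sent with shards.info=true, each per-shard entry in the response reports a shardAddress that can be compared against the local node's base URL (the helper name and exact response structure assumed here are illustrative):

```java
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class LocalShardCheck {

  // Returns true if every shard sub-request reported in the shards.info
  // section was served from the given node's base URL.
  // Assumes the query was issued with shards.info=true.
  public static boolean allShardsServedLocally(QueryResponse rsp, String localBaseUrl) {
    NamedList<?> shardsInfo = (NamedList<?>) rsp.getResponse().get("shards.info");
    if (shardsInfo == null) {
      return false; // shards.info not requested or not a distributed response
    }
    for (int i = 0; i < shardsInfo.size(); i++) {
      NamedList<?> perShard = (NamedList<?>) shardsInfo.getVal(i);
      String shardAddress = (String) perShard.get("shardAddress");
      if (shardAddress == null || !shardAddress.startsWith(localBaseUrl)) {
        return false;
      }
    }
    return true;
  }
}
```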
sachingoyal It seems like your latest patch was created / tested against branch4x vs. trunk? It's better to work against trunk for new features and then we'll back-port the changes as needed. I went ahead and migrated your patch to work with trunk and cleaned up a few places in the code. Overall looking good!
Also, I don't think we need to include this parameter in all of the configs, as we're trying to get away from bloated configs. So I changed the patch to just include in the sample techproducts configs. We'll also need to document this parameter in the Solr reference guide.
Thank you thelabdude.
Please let me know how we can get this committed into the trunk and I can edit the Solr reference guide.
I would also like to back-port this into the 5x branch.
Commit 1659748 from thelabdude in branch 'dev/trunk'
[ https://svn.apache.org/r1659748 ]
SOLR-6832: Queries be served locally rather than being forwarded to another replica
Commit 1659750 from thelabdude in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1659750 ]
SOLR-6832: Queries be served locally rather than being forwarded to another replica
Ever managed to measure how much this feature helps in various scenarios?
For a distributed query, the request is always sent to all the shards even if the originating SolrCore (handling the original distributed query) is a replica of one of the shards. If the originating SolrCore can check itself before sending HTTP requests to the shards, we can save a network hop and gain some performance.
This sounds like it saves only N local calls out of M, where M > N, N is the number of replicas that can be queried locally, and M is the total number of primary shards in the cluster that are to be queried. Is this correct?
So say there are 20 shards spread evenly over 20 nodes (i.e., 1 shard per node) and a query request comes in: the node that got the request will send 19 requests to the remaining 19 nodes and thus save just one network trip by querying its local shard? I must be missing something...
@Otis, you are correct. This helps only where there is over-sharding. In our particular scenario, we sharded to get better CPU core utilization and write speeds, based on Tim's experiments with over-sharding. Since all queries were sent to other nodes, we were getting hit with distributed deadlocks more often when one or more nodes were slow or overloaded.
So this patch is a slight optimization and reduces the likelihood of getting bogged down by other slow nodes when the parent query node has the core.
Hmmmm.... I didn't examine the patch or try this functionality, but based on your description... here are some comments.
This helps only where there is over-sharding.
That in itself should be avoided whenever possible, in my experience. There is overhead around memory and communication during querying. It could be related to your deadlocks. Or maybe you do a ton more writes, so distributing writes across all nodes is worth the query-time overhead of over-sharding?
Since all queries were sent to other nodes, we were getting hit with distributed deadlocks more often when one or more nodes were slow or overloaded.
Hmmmm... if that is truly happening, then isn't that a separate issue to be fixed?
So this patch is a slight optimization and reduces the likelihood of getting bogged down by other slow nodes when the parent query node has the core.
But VERY slight, right? (hence my Q about whether you've quantified the improvement from this patch)
Intuitively, querying the local data makes sense - why would one not do that if the data is right there? I just wonder how much you really benefit if you are saving just one (or a very small number of) network calls in a request that ends up dispatching NN requests to NN other nodes in the cluster.
Actually, in our experience, the network has been the flakiest piece, so any network hop saved is a big deal.
And again, you are right that the root cause (the first domino) of the distributed deadlock is yet to be identified. What we see is that when one machine in the cluster goes into a GC pause or hits a traffic spike, it quickly brings down all the other machines. The slow machine currently does not tell ZooKeeper that it's struggling, and hence all other nodes keep sending it queries. This is being addressed in another JIRA.
This particular patch buys us some time.
https://issues.apache.org/jira/browse/SOLR-7121 is a patch to address the other part of this problem.
It helps nodes become aware of their slowness and tell ZooKeeper that they should be taken out of the cluster for a while.
When their health has recovered, the nodes automatically ask ZooKeeper to rejoin the cluster.
These two patches have made our cluster stable, though we have yet to quantify by how much (quantification is not really a priority right now, given that we would need to compare against an un-patched cluster and then put enough load on both to bring them down, etc.).
The performance gain increases if coresPerMachine is > 1 and a single JVM has cores from 'k' shards.