[SOLR-8146] Allowing SolrJ CloudSolrClient to have preferred replica for query/read - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 5.3
Fix Version/s: 7.4, 8.0
Component/s: clients - java
Labels: None

Description

Background

Currently, the CloudSolrClient randomly picks a replica to query.
This is done by shuffling the list of live URLs and then picking the first item from the list.
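The behavior described above can be sketched roughly as follows. This is only an illustration of "shuffle, then take the first item", not the actual CloudSolrClient code; the class name `RandomReplicaPickSketch` and the method `pickRandomUrl` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not the real SolrJ implementation.
class RandomReplicaPickSketch {

    // Shuffles a copy of the live URLs and returns the first one.
    static String pickRandomUrl(List<String> liveUrls, Random random) {
        if (liveUrls.isEmpty()) {
            throw new IllegalArgumentException("no live URLs to query");
        }
        // Copy first so the caller's list is left untouched.
        List<String> shuffled = new ArrayList<>(liveUrls);
        Collections.shuffle(shuffled, random);
        return shuffled.get(0);
    }
}
```

With this scheme every live URL is equally likely to be chosen, which is why a client has no say in which node serves a query.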

This ticket proposes giving users some control over which URLs are picked for queries.

Note that this is for queries only and would not affect update/delete/admin operations.

Implementation

The current patch moves the URLs that match a given regex to the top of the URL list. The regex is specified by the system property

solr.preferredQueryNodePattern
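A minimal sketch of that reordering step, under stated assumptions: the class and method names are hypothetical, and matching with `Matcher.find()` (a match anywhere in the URL) is an assumption, since the description does not say whether the patch requires a full match. Only the property name `solr.preferredQueryNodePattern` comes from this ticket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the idea, not the code from the attached patch.
class PreferredQueryNodeSketch {

    // Moves URLs matching the regex to the front, keeping the existing
    // relative order inside the matching and non-matching groups.
    static List<String> preferMatching(List<String> urls, String regex) {
        if (regex == null || regex.isEmpty()) {
            return new ArrayList<>(urls); // no preference configured
        }
        Pattern pattern = Pattern.compile(regex);
        List<String> preferred = new ArrayList<>();
        List<String> others = new ArrayList<>();
        for (String url : urls) {
            // Assumption: a match anywhere in the URL counts.
            if (pattern.matcher(url).find()) {
                preferred.add(url);
            } else {
                others.add(url);
            }
        }
        preferred.addAll(others);
        return preferred;
    }

    public static void main(String[] args) {
        // Read the pattern the way the ticket describes: from a system property.
        String regex = System.getProperty("solr.preferredQueryNodePattern");
        List<String> urls = Arrays.asList(
                "http://rack2-a:8983/solr", "http://rack1-a:8983/solr");
        System.out.println(preferMatching(urls, regex));
    }
}
```

Because non-matching URLs stay in the list, they remain available as a fallback when no preferred node can serve the request.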

Initially, I thought it may be good to have Solr nodes tagged with a string pattern (snitch?) and use that pattern for matching the URLs.

Any comment, recommendation or feedback would be appreciated.

Use Cases

There are many cases where the ability to choose the node where queries go can be very handy:

Special node for manual user queries and analytics

One may have a SolrCloud cluster where every node hosts the same set of collections, with:

multiple large SolrCloud nodes (L) used for production apps, and
1 small node (S) in the same cluster, with less RAM/CPU, used only for manual user queries, data export and other production issue investigation.

This ticket would allow applications using SolrJ to be configured to query only the (L) nodes.

This use case is similar to the one described in SOLR-5501, raised by Manuel Lenormand.

Minimizing network traffic

For simplicity, let's say that we have a SolrCloud cluster deployed on 2 (or N) separate racks: rack1 and rack2.

On each rack, we have a set of SolrCloud VMs as well as a couple of client VMs querying Solr using SolrJ.

All Solr nodes are identical and have the same number of collections.

What we would like to achieve is:

clients on rack1 will by preference query only SolrCloud nodes on rack1, and
clients on rack2 will by preference query only SolrCloud nodes on rack2.
Cross-rack reads will happen if and only if one of the racks has no available Solr node to serve a request.

In other words, we want read operations to be local to a rack whenever possible.
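The rack-local read policy above could be sketched as a selection function with a fallback. This is an illustration only: `RackLocalReadSketch`, `pickNode` and the `isLocal` predicate are hypothetical names, and how a client decides that a URL is "local" (hostname prefix, regex, node tag) is left open as an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical sketch of "local rack first, cross-rack only as a fallback".
class RackLocalReadSketch {

    // Picks a random live node on the local rack if there is one;
    // otherwise picks a random live node on any other rack.
    static Optional<String> pickNode(List<String> liveUrls,
                                     Predicate<String> isLocal,
                                     Random random) {
        List<String> local = new ArrayList<>();
        List<String> remote = new ArrayList<>();
        for (String url : liveUrls) {
            (isLocal.test(url) ? local : remote).add(url);
        }
        // Cross-rack read happens only when no local node is live.
        List<String> candidates = local.isEmpty() ? remote : local;
        if (candidates.isEmpty()) {
            return Optional.empty(); // no live node anywhere
        }
        return Optional.of(candidates.get(random.nextInt(candidates.size())));
    }
}
```

A client on rack1 might call `pickNode(liveUrls, url -> url.contains("rack1"), new Random())`; the `contains` check is just a stand-in for whatever locality test is actually used. Only read requests would go through such a function, so updates, deletes and admin operations keep their existing routing.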

Note that write/update/delete/admin operations should not be affected.

Note that in our use case, we have a cross-DC deployment, so replace rack1/rack2 with DC1/DC2.

Any comments would be much appreciated.

Thanks.

Attachments

SOLR-8146.patch, 08/Oct/15 23:47, 7 kB, Arcadius Ahouansou
SOLR-8146.patch, 10/Oct/15 22:36, 7 kB, Arcadius Ahouansou
SOLR-8146.patch, 10/Nov/15 07:54, 17 kB, Arcadius Ahouansou
SOLR-8146.patch, 29/Sep/16 14:22, 7 kB, Noble Paul

Issue Links

duplicates

SOLR-6205 Make SolrCloud Data-center, rack or zone aware

Resolved

SOLR-12217 Add support for shards.preference in SolrJ for single shard cases

Closed

is duplicated by

SOLR-11982 Add support for indicating preferred replica types for queries

Closed

is related to

SOLR-12217 Add support for shards.preference in SolrJ for single shard cases

Closed

SOLR-11982 Add support for indicating preferred replica types for queries

Closed

relates to

SOLR-8522 ImplicitSnitch to support IPv4 fragment tags

Resolved

links to

GitHub Pull Request #66

GitHub Pull Request #147

Activity

People

Assignee: Unassigned

Reporter: Arcadius Ahouansou

Votes: 7

Watchers: 16

Dates

Created: 08/Oct/15 23:39

Updated: 15/Jul/20 00:55

Resolved: 15/Jul/20 00:34

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

20m