[SOLR-13399] compositeId support for shard splitting - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 8.3
Component/s: None
Labels:
None

Description

Shard splitting currently has no way to automatically account for the actual distribution of documents (i.e., the number of documents) in each hash bucket created by compositeId hashing.

We should probably add a splitByPrefix parameter to the SPLITSHARD command. It would look at the number of docs sharing each compositeId prefix and use those counts to create roughly equal-sized buckets by document count, rather than assuming that documents are evenly distributed across the entire hash range.
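A minimal sketch of how splitByPrefix could pick a split point, assuming the per-prefix document counts are already known and listed in hash-range order. The class and method names are hypothetical and are not Solr's API. The sketch only considers boundaries between prefixes, so no prefix is divided between the two sub-shards.

```java
/**
 * Hypothetical sketch of choosing a split point for splitByPrefix.
 * docCounts[i] is the number of documents sharing the i-th compositeId
 * prefix; prefixes are assumed to be listed in hash-range order.
 */
final class PrefixSplitSketch {

    /**
     * Returns the index i such that prefixes [0, i) go to the first sub-shard
     * and [i, n) go to the second, minimizing the difference in document counts.
     * Only boundaries between prefixes are considered, so no prefix is divided.
     * Returns 0 when there are fewer than two prefixes (no boundary exists).
     */
    static int bestBoundary(long[] docCounts) {
        long total = 0;
        for (long count : docCounts) {
            total += count;
        }

        int best = 0;
        long bestDiff = Long.MAX_VALUE;
        long left = 0; // documents assigned to the first sub-shard so far
        for (int i = 1; i < docCounts.length; i++) {
            left += docCounts[i - 1];
            long diff = Math.abs(total - 2 * left); // |left - (total - left)|
            if (diff < bestDiff) {
                bestDiff = diff;
                best = i;
            }
        }
        return best;
    }
}
```

For counts {10, 20, 30, 40}, the best boundary is index 3, which yields 60 vs. 40 documents. Ties keep the earliest boundary.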

As with normal shard splitting, we should bias against splitting within a hash bucket unless necessary, since that leads to larger query fanout. Perhaps this warrants a parameter, such as allowedSizeDifference, that controls how much size mismatch is tolerable before resorting to splitting within a bucket.
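One possible reading of the proposed allowedSizeDifference parameter is sketched below. It is an assumption, not a defined Solr semantic: the value is treated as the maximum tolerated |left - right| / total when splitting only on prefix boundaries. When the best boundary split exceeds that, the caller would fall back to splitting inside a bucket. The class and method names are hypothetical.

```java
/**
 * Hypothetical check for the proposed allowedSizeDifference parameter.
 * docCounts[i] is the document count of the i-th compositeId prefix, in
 * hash-range order. allowedSizeDifference is assumed to be a fraction of
 * the total document count (e.g. 0.25 means a 25% mismatch is tolerated).
 */
final class AllowedSizeDifferenceSketch {

    /**
     * Returns true if splitting only on prefix boundaries is balanced enough,
     * or false if a split inside a hash bucket would be needed.
     */
    static boolean boundarySplitAcceptable(long[] docCounts, double allowedSizeDifference) {
        if (docCounts.length < 2) {
            return false; // a single prefix can only be split inside its bucket
        }
        long total = 0;
        for (long count : docCounts) {
            total += count;
        }
        if (total == 0) {
            return true; // nothing to balance
        }

        // Smallest |left - right| over all boundaries between prefixes.
        long left = 0;
        long minDiff = Long.MAX_VALUE;
        for (int i = 1; i < docCounts.length; i++) {
            left += docCounts[i - 1];
            minDiff = Math.min(minDiff, Math.abs(total - 2 * left));
        }
        return (double) minDiff / total <= allowedSizeDifference;
    }
}
```

With counts {10, 20, 30, 40}, the best boundary split leaves a difference of 20 out of 100 documents (0.2). A tolerance of 0.25 would accept it, while 0.1 would fall back to splitting inside a bucket.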

To calculate the number of docs in each bucket more quickly, we could index the prefix in a separate field. Iterating over the terms of that field would quickly give us the number of docs in each bucket (i.e., Lucene already keeps track of the doc count for each term). Perhaps this could be implemented as a flag on the id field (something like indexPrefixes and poly-fields) that causes the prefix to be indexed automatically, so that an additional field would not need to be passed in during indexing or in the call to SPLITSHARD. This whole part is an optimization, though, and could be split off into its own issue if desired.
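In Lucene, iterating a field's terms with a TermsEnum exposes docFreq() for each term, which is the per-prefix count described above. Note that docFreq also counts deleted documents that have not yet been merged away. To stay self-contained, the sketch below simulates those counts with a sorted map instead of a real index. The separate prefix field and the helper names are hypothetical. Only the single-level "prefix!docid" compositeId form is handled, and ids without a '!' are skipped.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of per-prefix doc counts from a separate prefix field. */
final class PrefixCountSketch {

    /**
     * The value that would be indexed into the hypothetical prefix field for
     * an id like "tenantA!doc1" (here "tenantA!"). Only the single-level
     * "prefix!docid" form is handled; returns null when there is no prefix.
     */
    static String prefixOf(String id) {
        int bang = id.indexOf('!');
        return bang < 0 ? null : id.substring(0, bang + 1);
    }

    /**
     * Simulates the per-term doc counts that iterating the prefix field's
     * terms would give. A TreeMap keeps the prefixes in sorted order,
     * similar to walking a term dictionary.
     */
    static Map<String, Long> countByPrefix(List<String> ids) {
        Map<String, Long> counts = new TreeMap<>();
        for (String id : ids) {
            String prefix = prefixOf(id);
            if (prefix != null) {
                counts.merge(prefix, 1L, Long::sum);
            }
        }
        return counts;
    }
}
```

These counts are keyed by prefix string rather than by hash range, so they would still need to be ordered by each prefix's hash range before choosing a split point.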

Attachments

- ShardSplitTest.master.seed_AE04B5C9BA6E9A4.log.txt (3.76 MB), Chris M. Hostetter, 08/Aug/19 21:33
- SOLR-13399_useId.patch (12 kB), Yonik Seeley, 03/Aug/19 21:53
- SOLR-13399_testfix.patch (5 kB), Yonik Seeley, 29/Jul/19 18:53
- SOLR-13399.patch (41 kB), Yonik Seeley, 18/Jul/19 14:39
- SOLR-13399.patch (25 kB), Yonik Seeley, 10/Jul/19 18:07

Issue Links

links to

GitHub Pull Request #826

GitHub Pull Request #903

People

Assignee: Yonik Seeley

Reporter: Yonik Seeley

Votes: 0

Watchers: 6

Dates

Created: 12/Apr/19 15:06

Updated: 30/Sep/19 16:53

Resolved: 27/Sep/19 17:47
