[SOLR-13132] Improve JSON "terms" facet performance when sorted by relatedness - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 7.4, 9.0
Fix Version/s: 8.7, 9.0
Component/s: Facet Module
Labels: None

Description

When sorting buckets by relatedness, JSON "terms" facet must calculate relatedness for every term.

The current implementation uses a standard uninverted approach (either docValues or UnInvertedField) to get facet counts over the domain base docSet. It then uses that initial pass as a pre-filter for a second-pass, inverted approach: fetching the docSet for each relevant term (i.e., count > minCount?) and calculating the intersection size of each of those sets with the domain base docSet.
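A toy sketch of the two-pass shape described above may help make the cost visible. It is not Solr code: `DOC_TERMS`, `invert`, and `two_pass_counts` are hypothetical names, plain Python sets stand in for docSets, and intersecting each term's docSet with foreground and background sets is an assumption about what the second pass needs for relatedness (the description only mentions the domain base docSet).

```python
from collections import Counter

# Hypothetical toy index: doc id -> terms of the faceted field (the "uninverted" view).
DOC_TERMS = {
    0: ["a", "b"],
    1: ["a"],
    2: ["b", "c"],
    3: ["a", "c"],
    4: ["c"],
}


def invert(doc_terms):
    """Build the inverted view: term -> set of doc ids containing that term."""
    postings = {}
    for doc_id, terms in doc_terms.items():
        for term in terms:
            postings.setdefault(term, set()).add(doc_id)
    return postings


def two_pass_counts(doc_terms, postings, base, fg, bg, min_count=1):
    """Return term -> (base count, foreground count, background count)."""
    # Pass 1 (uninverted): count terms over the base docSet only.
    base_counts = Counter(t for doc_id in base for t in doc_terms[doc_id])
    result = {}
    # Pass 2 (inverted): one docSet lookup plus set intersections per surviving term.
    for term, count in base_counts.items():
        if count < min_count:
            continue  # pre-filtered out by the first pass
        term_docs = postings[term]
        result[term] = (count, len(term_docs & fg), len(term_docs & bg))
    return result
```

The second loop runs once per surviving term, so on a field with hundreds of thousands of unique terms the per-term docSet work dominates, which is the overhead this issue is about.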

Over high-cardinality fields, the overhead of per-term docSet creation and set intersection operations increases request latency to the point where relatedness sort may not be usable in practice. For my use case, even after applying the patch for SOLR-13108, on a field with ~220k unique terms per core, QTimes for high-cardinality domain docSets were, e.g., 9000ms at cardinality 1816684 and 18000ms at cardinality 5032902.

The attached patch brings the above example QTimes down to a manageable ~300ms and ~250ms, respectively. The approach calculates uninverted facet counts over the domain base, foreground, and background docSets in parallel in a single pass. This allows us to take advantage of the efficiencies built into the standard uninverted processors (FacetFieldProcessorByArray[DV|UIF]) and avoids the per-term docSet creation and set intersection overhead.
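The single-pass idea can be sketched in the same toy terms. This is only an illustration under stated assumptions, not the patch itself: `DOC_TERMS` and `sweep_counts` are hypothetical names, Python sets stand in for docSets, and "in parallel" is read here as "accumulated together during one walk over the documents" rather than as multithreading.

```python
# Hypothetical toy index: doc id -> terms of the faceted field (the "uninverted" view).
DOC_TERMS = {
    0: ["a", "b"],
    1: ["a"],
    2: ["b", "c"],
    3: ["a", "c"],
    4: ["c"],
}


def sweep_counts(doc_terms, base, fg, bg):
    """Return term -> (base count, foreground count, background count).

    Each document in the union of the three sets is visited once, and every
    term it carries bumps the counters of whichever sets contain that document.
    No per-term docSet is built and no set intersection is computed.
    """
    counts = {}
    for doc_id in sorted(base | fg | bg):
        in_base = int(doc_id in base)
        in_fg = int(doc_id in fg)
        in_bg = int(doc_id in bg)
        for term in doc_terms[doc_id]:
            slot = counts.setdefault(term, [0, 0, 0])
            slot[0] += in_base
            slot[1] += in_fg
            slot[2] += in_bg
    return {term: tuple(slot) for term, slot in counts.items()}
```

In this sketch the work scales with the number of documents visited and the terms they carry, not with one docSet fetch per unique term, which is consistent with the latency drop reported above for high-cardinality fields.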

Attachments

SOLR-13132_testSweep.patch (08/Apr/20 17:27, 28 kB, Chris M. Hostetter)
SOLR-13132.patch (10/Jan/19 21:03, 43 kB, Michael Gibney)
SOLR-13132-benchmarks.tgz (08/Jul/20 17:23, 4 kB, Michael Gibney)
SOLR-13132-with-cache.patch (28/Jan/19 16:46, 129 kB, Michael Gibney)
SOLR-13132-with-cache-01.patch (29/Jan/19 16:39, 129 kB, Michael Gibney)

Issue Links

relates to

SOLR-14477 relatedness() values can be wrong when using 'prefix' (Closed)
SOLR-13807 Caching for term facet counts (Open)

supercedes

SOLR-13108 RelatednessAgg ignores cacheDf, consults filterCache for every bucket/term (Open)

links to

GitHub Pull Request #751

GitHub Pull Request #1231

People

Assignee: Chris M. Hostetter

Reporter: Michael Gibney

Votes: 2

Watchers: 7

Dates

Created: 10/Jan/19 19:38

Updated: 04/Nov/20 15:22

Resolved: 10/Jul/20 04:06
