[SOLR-10634] Move calculation of some aggregations to collection phase - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 6.7, 7.0
Component/s: Facet Module
Labels: None

Description

From http://markmail.org/message/pwgnt7iqxkzcnckh

The current code is optimized for finding the top K buckets out of a
total of N.
When a request asks for the top 10 buckets and there are potentially
millions of buckets, it makes sense to defer calculating other metrics
until we know which buckets are in the top 10. After we identify
them, we calculate the domain for each of those buckets and use it to
calculate the remaining metrics.

This deferred method is much slower when a request asks for all
buckets, because the domain then has to be calculated for every
bucket. In that case we might as well calculate all metrics in the
first pass rather than trying to defer them.
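
To make the trade-off concrete, here is a minimal sketch of the two strategies in Python. It is not the Solr implementation: the in-memory `DOCS` list, the function names, and the use of `sum` as the deferred metric are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical documents: (bucket value, numeric field) pairs.
DOCS = [("a", 10.0), ("b", 4.0), ("a", 6.0), ("c", 1.0), ("b", 2.0), ("a", 2.0)]


def deferred_top_k(docs, k):
    """Two passes: count every bucket, then compute the metric for the top k only."""
    counts = defaultdict(int)
    for bucket, _ in docs:
        counts[bucket] += 1

    # Pick the k largest buckets by count (ties broken by bucket name).
    top = sorted(counts, key=lambda b: (-counts[b], b))[:k]

    result = {}
    for bucket in top:
        # Second pass: rebuild this bucket's domain (its matching docs),
        # then aggregate over that domain.
        domain = [value for b, value in docs if b == bucket]
        result[bucket] = {"count": counts[bucket], "sum": sum(domain)}
    return result


def single_pass_all(docs):
    """One pass: accumulate every metric for every bucket while collecting."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    for bucket, value in docs:
        counts[bucket] += 1
        sums[bucket] += value
    return {b: {"count": counts[b], "sum": sums[b]} for b in counts}
```

In this sketch `deferred_top_k` rescans the documents once per returned bucket, which is cheap for a small K but grows with the number of buckets returned, while `single_pass_all` keeps one accumulator per bucket for the whole collection phase.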

So we should move aggregations from the second pass to the first pass under the following conditions:

- no limit (or a high limit compared to the number of potential buckets?)
- no sub-facets (or anything else) that will need the domain calculated anyway
- the aggregation is not memory-intensive per slot (moving some calculations from per-bucket in the second phase to all-buckets-in-parallel in the first phase could be really bad for peak memory usage)
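
The three conditions above could be combined into a single check along the following lines. This is only a sketch under stated assumptions: `FacetRequest`, `collect_in_first_pass`, the `-1` convention for "no limit", and both thresholds (`high_limit_ratio`, `max_total_bytes`) are hypothetical and are not taken from the patch.

```python
from dataclasses import dataclass


@dataclass
class FacetRequest:
    """Hypothetical description of one facet request."""
    limit: int            # number of buckets to return; -1 means no limit (assumed convention)
    has_sub_facets: bool  # True if something needs the bucket domain anyway
    per_slot_bytes: int   # estimated memory the aggregation needs per bucket slot


def collect_in_first_pass(req, potential_buckets,
                          high_limit_ratio=0.5,
                          max_total_bytes=64 * 1024 * 1024):
    """Return True if the aggregation should move to the collection phase.

    Both thresholds are illustrative assumptions, not values from Solr.
    """
    # Condition 1: no limit, or a limit that is high relative to the bucket count.
    unlimited = req.limit < 0 or req.limit >= high_limit_ratio * potential_buckets
    # Condition 2: nothing else forces the domain to be calculated per bucket.
    domain_needed = req.has_sub_facets
    # Condition 3: holding one accumulator per bucket at once stays within budget.
    memory_ok = req.per_slot_bytes * potential_buckets <= max_total_bytes
    return unlimited and not domain_needed and memory_ok
```

The memory check multiplies the per-slot cost by the number of potential buckets because, in the first pass, all slots are live at the same time rather than one bucket at a time.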

Attachments

SOLR-10634.patch, 20/May/17 22:10, 9 kB, Yonik Seeley
SOLR-10634.patch, 11/May/17 15:05, 3 kB, Yonik Seeley

People

Assignee: Unassigned

Reporter: Yonik Seeley

Votes: 3

Watchers: 7

Dates

Created: 08/May/17 15:13

Updated: 08/Jun/19 15:20

Resolved: 24/May/17 02:16