[SOLR-5444] Slow response on facet search, lots of facets, asking for few facets in response - ASF JIRA

Details

Type: Improvement
Status: Reopened
Priority: Major
Resolution: Unresolved
Affects Version/s: 4.4
Fix Version/s: 4.9, 6.0
Component/s: SolrCloud
Labels:

Description

Setup

We have a 6-node Solr setup (release 4.4.0) with 12 billion "small" documents loaded across 3 collections. The documents have the following fields:

a_dlng_doc_sto (docvalue long)
b_dlng_doc_sto (docvalue long)
c_dstr_doc_sto (docvalue string)
timestamp_lng_ind_sto (indexed long)
d_lng_ind_sto (indexed long)

From schema.xml:

    <dynamicField name="*_dstr_doc_sto" type="dstring" indexed="false" stored="true" required="true" docValues="true"/>
    <dynamicField name="*_lng_ind_sto" type="long" indexed="true" stored="true"/>
    <dynamicField name="*_dlng_doc_sto" type="dlng" indexed="false" stored="true" required="true" docValues="true"/>
...
    <fieldType name="dstring" class="solr.StrField" sortMissingLast="true" docValuesFormat="Disk"/>
    <fieldType name="dlng" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" docValuesFormat="Disk"/>

The value of timestamp_lng_ind_sto decides which collection a document goes into.

We execute queries in the following format:

q=timestamp_lng_ind_sto:[x TO y] AND d_lng_ind_sto:(a OR b OR ... OR n)
facet=true&facet.field=a_dlng_doc_sto&facet.zeros=false&facet.mincount=1&facet.limit=<asked-for-facets>&rows=0&start=0
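
For readers who want to reproduce the request shape, the following is a minimal sketch of how the two parameter strings above could be assembled. The class name, method names and sample values are hypothetical, and the output is not URL-encoded; only the field names and parameters are taken from the query format above.

```java
import java.util.StringJoiner;

/** Hypothetical helper; names and sample values are illustrative only. */
final class FacetQueryExample {

    /** Builds the q parameter: a timestamp range AND'ed with an OR-list of d_lng_ind_sto values. */
    static String buildQ(long from, long to, long[] dValues) {
        StringJoiner ors = new StringJoiner(" OR ", "(", ")");
        for (long d : dValues) {
            ors.add(Long.toString(d));
        }
        return "timestamp_lng_ind_sto:[" + from + " TO " + to + "] AND d_lng_ind_sto:" + ors;
    }

    /** Builds the facet parameters; facetLimit plays the role of <asked-for-facets>. */
    static String buildFacetParams(int facetLimit) {
        return "facet=true&facet.field=a_dlng_doc_sto&facet.zeros=false&facet.mincount=1"
                + "&facet.limit=" + facetLimit + "&rows=0&start=0";
    }
}
```

Varying only the facetLimit argument while keeping the q part fixed corresponds to the measurement series described in the Problem section.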

Problem

We see very slow response times when a query hits a large number of documents spanning many facets, but asks for only "a few" of those facets.

Concrete example of query to get some concrete numbers to look at

With x and y plus a, b ... n set to values such that:

The timestamp_lng_ind_sto:[x TO y] part of the search criteria alone hits about 1.7 billion documents (all of them in one of the three collections, which contains 4.5 billion documents, though that is not important)
The d_lng_ind_sto:(a OR b OR ... OR n) part of the search criteria alone hits about 500000 documents
The combined search criteria (timestamp_lng_ind_sto AND'ed with d_lng_ind_sto) hit about 200000 documents

The graph in the attachment Responsetime_func_of_facets_asked_for.png shows response time as a function of <asked-for-facets> (in the query).

Note that response time is high even for "low" values of <asked-for-facets>, and that it increases fast (but linearly) in <asked-for-facets> until <asked-for-facets> is somewhere between 5000 (where response time is close to 1000 secs) and 10000 (where response time is about 5 secs). For values of <asked-for-facets> above 10000, response time stays "low" at 1-10 secs.

Looking at the code and profiling, it is clear that the change to better response time occurs when SimpleFacets.getFacetFieldCounts changes from using getListedTermCounts to using getTermCounts.

The image in the attachment Profiiling_SimpleFacets_getListedTermCounts_path.png shows profiling information during a request with <asked-for-facets> at about 2000.

Note that

SimpleFacets.getListedTermCounts is used (green box)
91% of the time spent performing the query is spent in the DocSetCollector constructor (red box). During this concrete query 125000 DocSetCollector objects are created, taking 710 secs in total. Additional investigation shows that the time is spent allocating huge int arrays for the "scratch" array. Several thousand of those DocSetCollector constructors create int arrays of size above 1 million. That takes time, and also leaves a sizeable job for the garbage collector afterwards.
The actual search part of the query takes only 0.5% (4 secs) of the total query execution time (blue box)
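
To put the profiling figures above into perspective, here is a small back-of-envelope sketch. It only divides and multiplies the numbers quoted above; the class and method names are hypothetical, and the byte count ignores the JVM array header.

```java
/** Back-of-envelope numbers derived from the profiling figures quoted above. */
final class ScratchAllocationEstimate {

    /** Average time per DocSetCollector construction, in milliseconds. */
    static double avgConstructorMillis(double totalSeconds, long collectors) {
        return totalSeconds * 1000.0 / collectors;
    }

    /** Payload bytes of an int[] of the given length (4 bytes per int, array header ignored). */
    static long scratchBytes(long length) {
        return length * Integer.BYTES;
    }
}
```

With the reported figures, 710 secs over 125000 constructions averages roughly 5.7 ms per construction, and a scratch array of 1 million ints occupies about 4 MB before a single document has been collected.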

The image in the attachment Profiling_SimpleFacets_getTermCounts_path.png shows profiling information during a request with <asked-for-facets> at about 10000.

Note that

SimpleFacets.getTermCounts is used (green box)
The actual search part of the query now takes 70% (11 secs) of the total query execution time (blue box)

What to do about this?

I am not sure why there are two paths that SimpleFacets.getFacetFieldCounts can take (getListedTermCounts or getTermCounts), but I am pretty sure there is a good reason. It seems that getListedTermCounts is used when <asked-for-facets> is noticeably lower than the total number of facets hit (I believe it is when <asked-for-facets> * 1.5 + 10 is below the actual number of facets hit).
One solution could be to drop the getListedTermCounts path and always use getTermCounts, but that is probably not a good idea, because getListedTermCounts is probably there for a performance reason (in other scenarios).
The comment above DocSetCollector.scratch says
```
  // in case there aren't that many hits, we may not want a very sparse
  // bit array.  Optimistically collect the first few docs in an array
  // in case there are only a few.
  final int[] scratch;
```
The comment seems reasonable. But when we look at the values used as "smallSetSize" for the DocSetCollector constructor, it is always "maxDoc >> 6" (basically dividing by 64). This value depends on maxDoc and will be high if maxDoc is high. In my case maxDoc is often 50+ million, resulting in "smallSetSize" values of 1+ million (that is not "a few"). I very much doubt why you would want "smallSetSize" to increase as maxDoc increases. Why not always use a low (fixed or similar) value for "smallSetSize"? Is it ever a good idea to have huge int arrays for the "scratch" array?
Another solution would be to never create "scratch" arrays with a size above e.g. 50.
There are probably several other potential solutions.
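
As an illustration of one such alternative, the scratch buffer could start small and grow only as hits arrive, so that a collector seeing few hits never pays for a "maxDoc >> 6" sized array. The sketch below is a hypothetical stand-alone class, not Solr's DocSetCollector and not the content of the attached patches; the class name, the initial capacity of 16 and the doubling policy are assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical growable scratch buffer; not part of Solr. */
final class GrowableScratch {
    private final int maxSize;   // upper bound, e.g. maxDoc >> 6
    private int[] docs;          // starts small, grown on demand
    private int size;

    GrowableScratch(int maxSize) {
        this.maxSize = maxSize;
        this.docs = new int[Math.min(16, maxSize)]; // assumed small initial capacity
    }

    /** Returns false once maxSize is reached; the caller would then switch to a bit set. */
    boolean add(int doc) {
        if (size == maxSize) {
            return false;
        }
        if (size == docs.length) {
            // Double the capacity, but never beyond maxSize.
            int newCap = (int) Math.min((long) docs.length * 2, maxSize);
            docs = Arrays.copyOf(docs, newCap);
        }
        docs[size++] = doc;
        return true;
    }

    int size() { return size; }

    int capacity() { return docs.length; }

    /** Copy of the collected doc ids, trimmed to the number actually added. */
    int[] toArray() { return Arrays.copyOf(docs, size); }
}
```

For a maxDoc of exactly 50 million, maxDoc >> 6 is 781250, and it passes 1 million once maxDoc reaches 64 million. With a buffer like this, a collector that sees 100 hits would hold an array of 128 ints instead of one sized by maxDoc; the trade-off is the copying done on each growth step when many hits do arrive.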

I would really like your opinion on which solution to pursue, so that I do not unintentionally break good performance optimizations just because I missed the reasons why the code is as it is today.

Note: I have filed this as a 4.4 issue, because that is the platform I use for my tests etc. But I am sure the problem also exists on 4.5.1 (or whatever the latest 4.x release is).

Attachments

Profiiling_SimpleFacets_getListedTermCounts_path.png (15/Nov/13 09:32, 142 kB, Per Steffensen)
Profiling_SimpleFacets_getTermCounts_path.png (15/Nov/13 09:32, 128 kB, Per Steffensen)
Responsetime_func_of_facets_asked_for.png (15/Nov/13 09:32, 34 kB, Per Steffensen)
Responsetime_func_of_facets_asked_for-Simple_DocSetCollector_fix.png (15/Nov/13 15:24, 40 kB, Per Steffensen)
SOLR-5444_simple_DocSetCollector_4_4_0.patch (15/Nov/13 15:26, 4 kB, Per Steffensen)
SOLR-5444_ExpandingIntArray_DocSetCollector_4_4_0.patch (20/Nov/13 13:42, 6 kB, Per Steffensen)
solr-slow-facet.png (30/Jun/15 00:47, 77 kB, Arcadius Ahouansou)

Issue Links

is related to

SOLR-8922 DocSetCollector can allocate massive garbage on large indexes (Resolved)
