[SOLR-6810] Faster searching limited but high rows across many shards all with many hits - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: search
Labels:
- distributed_search
- performance

Description

Searching "limited but high rows across many shards all with many hits" is slow.
For example:

Query from an outside client: q=something&rows=1000
This results in sub-requests to each shard along these lines:
- 1) q=something&rows=1000&fl=id,score
- 2) Request the full documents whose ids are in the global top-1000, selected from the top-1000 of each shard
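
The two-step flow above can be sketched roughly as follows. This is a minimal in-memory simulation, not Solr code: the `Shard` class, its methods, and the `distributed_search` helper are hypothetical names introduced for illustration, scores are supplied up front rather than computed from a query, and doc ids are assumed to be unique across shards.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical stand-in for one shard: maps doc id -> (score, full document)."""
    docs: dict = field(default_factory=dict)

    def top_ids_and_scores(self, rows):
        # Step 1: the equivalent of q=something&rows=<rows>&fl=id,score
        scored = ((score, doc_id) for doc_id, (score, _) in self.docs.items())
        return heapq.nlargest(rows, scored)

    def fetch_documents(self, doc_ids):
        # Step 2: return full documents for the requested ids only
        return {doc_id: self.docs[doc_id][1] for doc_id in doc_ids}


def distributed_search(shards, rows):
    # Step 1: collect up to `rows` (score, id) pairs from every shard.
    candidates = []
    for shard_no, shard in enumerate(shards):
        for score, doc_id in shard.top_ids_and_scores(rows):
            candidates.append((score, doc_id, shard_no))

    # Merge: keep only the global top `rows` across all shards.
    global_top = heapq.nlargest(rows, candidates)

    # Step 2: ask each shard only for the documents that made the global top.
    wanted = {}
    for _, doc_id, shard_no in global_top:
        wanted.setdefault(shard_no, []).append(doc_id)
    fetched = {}
    for shard_no, doc_ids in wanted.items():
        fetched.update(shards[shard_no].fetch_documents(doc_ids))

    # Return documents in descending score order.
    return [fetched[doc_id] for _, doc_id, _ in global_top]
```

Note that in step 1 every shard has to produce ids for its own top `rows` hits, even though most of those candidates are discarded by the merge.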

What does the subject mean?

"limited but high rows" means 1000 in the example above
"many shards" means 200-1000 in our case
"all with many hits" means that each of the shards has a significant number of hits on the query
The problem grows with each of the three factors above

Doing such a query on our system takes between 5 minutes and 1 hour, depending on a lot of things. It ought to be much faster, so let's make it so.

Profiling shows that the problem is the time it takes to access the store to get the ids for (up to) 1000 docs (the value of the rows parameter) per shard. With 1000 shards, that is up to 1 million ids that have to be fetched. There is really no good reason to ever read information from the store for more than the overall top-1000 documents that have to be returned to the client.
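
To make the numbers concrete, the sketch below counts id lookups in the store under the current flow and under a flow that ranks candidates before touching the store. It is only an illustration under stated assumptions: both function names are hypothetical, the figure of 50,000 hits per shard is made up for the example, and the second function assumes shards can rank hits by score using a shard-local handle (here called a doc number) without a store access. That is one possible way to reach the goal and not necessarily what the attached patches do.

```python
def id_reads_current(num_shards, rows, hits_per_shard):
    """Store reads for ids when every shard resolves ids for its own top `rows`."""
    return num_shards * min(rows, hits_per_shard)


def id_reads_score_first(num_shards, rows, hits_per_shard):
    """Store reads for ids when only the global top `rows` are ever resolved.

    Assumption: shards first return (score, shard-local doc number) pairs
    without reading the store; ids are looked up only for the merged winners.
    """
    return min(rows, num_shards * hits_per_shard)


print(id_reads_current(1000, 1000, 50_000))      # 1000000
print(id_reads_score_first(1000, 1000, 50_000))  # 1000
```

Under these assumptions the number of store reads no longer depends on the shard count once every shard has enough hits, which matches the observation that only the overall top-1000 ever need to be read.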

For further detail, see the mail thread "Slow searching limited but high rows across many shards all with high hits" started 13/11-2014 on dev@lucene.apache.org.

Attachments

- SOLR-6810-trunk.patch (09/Feb/15 18:26, 94 kB, Shalin Shekhar Mangar)
- SOLR-6810-trunk.patch (08/Apr/15 08:49, 102 kB, Shalin Shekhar Mangar)
- SOLR-6810-trunk.patch (29/Apr/15 18:56, 104 kB, Shalin Shekhar Mangar)
- SOLR-6810-hack-eoe.patch (19/Jul/16 23:21, 3 kB, Erick Erickson)
- branch_5x_rev1645549.patch (16/Dec/14 15:33, 91 kB, Per Steffensen)
- branch_5x_rev1642874.patch (03/Dec/14 08:35, 89 kB, Per Steffensen)
- branch_5x_rev1642874.patch (04/Dec/14 10:52, 91 kB, Per Steffensen)

Issue Links

is related to

- SOLR-5611 When documents are uniformly distributed over shards, enable returning approximated results in distributed query (Reopened)

relates to

- SOLR-6813 distrib.singlePass does not work for expand-request - start/rows included (Open)
- SOLR-6795 distrib.singlePass returns score even though not asked for (Closed)
- SOLR-6796 distrib.singlePass does not return correct set of fields for multi-fl-parameter requests (Closed)
- SOLR-6812 distrib.singlePass does not work for expand-request (Closed)
- SOLR-5478 Optimization: Fetch all "fl" values from docValues instead of stored values if possible/equivalent (Closed)

People

Assignee: Shalin Shekhar Mangar
Reporter: Per Steffensen
Votes: 1
Watchers: 9

Dates

Created: 01/Dec/14 15:08
Updated: 24/May/23 06:00