[LUCENE-5299] Refactor Collector API for parallelism - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.1, 6.0
Component/s: None
Labels:
None

Lucene Fields:

New, Patch Available

Description

Motivation

We should be able to scale up better with Solr/Lucene by utilizing multiple CPU cores, and not have to resort to scaling out by sharding (with all the associated distributed-system pitfalls) when the index size does not warrant it.

Presently, IndexSearcher has an optional constructor argument for an ExecutorService, which is used to search in parallel on those call paths where one of the TopDocsCollector implementations is created internally. The per-atomic-reader searches run in parallel, and the resulting TopDocs/TopFieldDocs are then merged, with a lock guarding the merge step.
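The existing strategy described above can be sketched in plain Java. Each segment is searched concurrently, and each partial top-k result is merged into a shared queue under a lock. This is a simplified illustration, not Lucene's implementation. Segments are modeled as hypothetical `float[]` score arrays, and `LockedMergeSketch`, `Hit`, and `topK` are invented names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of "search segments in parallel, merge the partial results under a lock".
// Segments are plain float[] score arrays here (an assumption, not Lucene types).
final class LockedMergeSketch {
    static final class Hit {
        final int doc; final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return doc + ":" + score; }
    }

    static List<Hit> topK(List<float[]> segments, int k, ExecutorService pool) {
        // Shared min-heap: the weakest of the current top-k hits sits at the head.
        PriorityQueue<Hit> merged = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        ReentrantLock lock = new ReentrantLock();
        List<Future<?>> futures = new ArrayList<>();
        int docBase = 0;
        for (float[] seg : segments) {
            final int base = docBase;
            docBase += seg.length;
            futures.add(pool.submit(() -> {
                // Per-segment top-k, computed without any locking.
                PriorityQueue<Hit> local = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
                for (int i = 0; i < seg.length; i++) {
                    local.offer(new Hit(base + i, seg[i]));
                    if (local.size() > k) local.poll();
                }
                // The merge into the shared queue is serialized by the lock.
                lock.lock();
                try {
                    for (Hit hit : local) {
                        merged.offer(hit);
                        if (merged.size() > k) merged.poll();
                    }
                } finally {
                    lock.unlock();
                }
            }));
        }
        try {
            for (Future<?> f : futures) f.get(); // wait for every segment task
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        List<Hit> result = new ArrayList<>(merged);
        result.sort((a, b) -> Float.compare(b.score, a.score)); // best first
        return result;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<float[]> segments = Arrays.asList(
            new float[] {0.1f, 0.9f, 0.3f}, new float[] {0.8f, 0.2f}, new float[] {0.5f, 0.7f});
        System.out.println(topK(segments, 3, pool)); // [1:0.9, 3:0.8, 6:0.7]
        pool.shutdown();
    }
}
```

The parallelism here is tied to one specific result type, the top-k hit queue. Any other combination of collectors would need its own hand-written variant of this code, which is the special-casing problem this issue raises.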

However, there are some problems with this approach:

- If arbitrary Collector arguments come into play, we can't parallelize. Note that even if the results ultimately go to a TopDocsCollector, it may be wrapped inside, e.g., an EarlyTerminatingCollector, a TimeLimitingCollector, or both.
- The special-casing, with parallelism baked on top, does not scale. There are many Collectors that could potentially lend themselves to parallelism, and special-casing means the parallelization has to be re-implemented whenever a different permutation of collectors is used.

Proposal

A refactoring of collectors that allows for parallelization at the level of the collection protocol.

Some requirements that should guide the implementation:

- an easy migration path for collectors that need to remain serial
- the parallelization should be composable (when collectors wrap other collectors)
- allow collectors to pick the optimal solution (e.g. there might be memory tradeoffs to be made) by advising them whether a search will be parallelized, so that the serial use case is not penalized
- encourage the use of non-blocking constructs and lock-free parallelism; blocking is not advisable in the hot spot of a search, and it also wastes pooled threads
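One way to meet these requirements is sketched here, assuming a hypothetical split/reduce protocol. The collector hands out an independent per-segment "leaf" for each segment and is told up front whether the search will run in parallel. A single `reduce` step then combines the leaves after all of them finish. `LeafSink`, `ParallelCollector`, `TopK`, `Counting`, and `Searcher` are invented for illustration; they are not the types from the attached patch.

The sketch shows two properties. First, composition: a counting wrapper sits around a top-k collector. Second, lock-free collection: each leaf is touched by only one thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// HYPOTHETICAL protocol, for illustration only: not the patch's or Lucene's actual types.
interface LeafSink {
    void collect(int doc, float score);
}

interface ParallelCollector<L extends LeafSink, R> {
    // 'parallel' advises whether leaves will run concurrently (a collector may ignore it).
    L newLeaf(int docBase, boolean parallel);
    // Called once, on a single thread, after every leaf has finished collecting.
    R reduce(List<L> leaves);
}

final class Hit {
    final int doc; final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
}

// Top-k by score: each leaf fills a private heap and reduce() merges them, so no locks are needed.
final class TopK implements ParallelCollector<TopK.Leaf, List<Hit>> {
    private final int k;
    TopK(int k) { this.k = k; }
    static final class Leaf implements LeafSink {
        private final int docBase, k;
        final PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        Leaf(int docBase, int k) { this.docBase = docBase; this.k = k; }
        @Override public void collect(int doc, float score) {
            heap.offer(new Hit(docBase + doc, score));
            if (heap.size() > k) heap.poll(); // evict the weakest hit
        }
    }
    @Override public Leaf newLeaf(int docBase, boolean parallel) { return new Leaf(docBase, k); }
    @Override public List<Hit> reduce(List<Leaf> leaves) {
        List<Hit> all = new ArrayList<>();
        for (Leaf leaf : leaves) all.addAll(leaf.heap);
        all.sort((a, b) -> Float.compare(b.score, a.score)); // best first
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}

// Composable wrapper: counts hits per leaf and forwards each hit to the wrapped collector's leaf.
final class Counting<L extends LeafSink, R> implements ParallelCollector<Counting.Leaf<L>, Counting.Result<R>> {
    private final ParallelCollector<L, R> inner;
    Counting(ParallelCollector<L, R> inner) { this.inner = inner; }
    static final class Leaf<L extends LeafSink> implements LeafSink {
        final L inner;
        int count;
        Leaf(L inner) { this.inner = inner; }
        @Override public void collect(int doc, float score) { count++; inner.collect(doc, score); }
    }
    static final class Result<R> {
        final int totalHits; final R inner;
        Result(int totalHits, R inner) { this.totalHits = totalHits; this.inner = inner; }
    }
    @Override public Leaf<L> newLeaf(int docBase, boolean parallel) {
        return new Leaf<>(inner.newLeaf(docBase, parallel));
    }
    @Override public Result<R> reduce(List<Leaf<L>> leaves) {
        int total = 0;
        List<L> innerLeaves = new ArrayList<>();
        for (Leaf<L> leaf : leaves) { total += leaf.count; innerLeaves.add(leaf.inner); }
        return new Result<>(total, inner.reduce(innerLeaves));
    }
}

final class Searcher {
    // One leaf per segment, created on the caller's thread; leaves run on the pool, or inline if pool == null.
    static <L extends LeafSink, R> R search(List<float[]> segments, ParallelCollector<L, R> c, ExecutorService pool) {
        boolean parallel = pool != null;
        List<L> leaves = new ArrayList<>();
        List<Future<?>> pending = new ArrayList<>();
        int docBase = 0;
        for (float[] seg : segments) {
            L leaf = c.newLeaf(docBase, parallel);
            leaves.add(leaf);
            docBase += seg.length;
            Runnable task = () -> { for (int d = 0; d < seg.length; d++) leaf.collect(d, seg[d]); };
            if (parallel) pending.add(pool.submit(task)); else task.run();
        }
        try {
            for (Future<?> f : pending) f.get(); // Future.get() also makes each leaf's state visible here
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return c.reduce(leaves);
    }
}
```

For example, `Searcher.search(segments, new Counting<>(new TopK(3)), pool)` returns the total hit count together with the merged top 3. Passing `null` as the pool runs the same collectors serially. Neither `TopK` nor `Counting` contains any threading code; the wrapper composes simply by wrapping the inner collector's leaves and delegating its reduce step. This shape loosely resembles the `CollectorManager` abstraction (`newCollector()` and `reduce()`) found in later Lucene versions.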

Attachments

- LUCENE-5299.patch, 21/Oct/13 16:28, 217 kB, Shikhar Bhushan
- benchmarks.txt, 21/Oct/13 16:28, 9 kB, Shikhar Bhushan
- LUCENE-5299.patch, 21/Oct/13 22:14, 257 kB, Shikhar Bhushan
- LUCENE-5299.patch, 22/Oct/13 02:41, 258 kB, Shikhar Bhushan
- LUCENE-5299.patch, 24/Oct/13 02:46, 267 kB, Shikhar Bhushan
- LUCENE-5299.patch, 27/Oct/13 17:24, 287 kB, Shikhar Bhushan

Issue Links

is superseded by

LUCENE-6294 Generalize how IndexSearcher parallelizes collection execution (Closed)

relates to

SOLR-5372 SolrIndexSearcher should support propagating an ExecutorService upto IndexSearcher constructor (Open)

People

Assignee: Unassigned

Reporter: Shikhar Bhushan

Votes: 1

Watchers: 10

Dates

Created: 21/Oct/13 15:59

Updated: 28/Aug/22 13:55

Resolved: 27/Feb/15 17:53