[LUCENE-2362] Add support for slow filters with batch processing - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.0.1
Fix Version/s: None
Component/s: core/search
Labels:
- batch
- filter
- perfomance
- search

Lucene Fields:

New

Description

The internal implementation of IndexSearcher assumes that the Filter and the scorer have roughly equal performance. In our environment, however, the Filter implementation is very expensive compared to the scorer.

If, say, the scorer selects 2k termdocs (each ~250 docs) and the filter selects another 2k, then 250k docs are checked (and filtered out) quickly by the scorer, and 250k docs are checked slowly by our filter.

With a straightforward implementation, a search exceeds the 60-seconds-per-query boundary, because each next() or advance() call requires N database queries PER CHECKED DOC. A read-ahead technique lets us optimize this to 35 seconds per query, which is still too slow.

The solution is to first select all documents with the scorer and then filter them in a single batch with our filter. An example implementation (using BitSet) is attached. It currently takes only ~300 milliseconds per query.
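The collect-then-filter idea can be sketched in plain Java, independent of the Lucene API. This is only an illustration under stated assumptions, not the attached code: the `BatchFilter` interface shown here, the `EvenDocBackend` stand-in for the database, and the `collectThenFilter` method are all hypothetical names invented for this sketch. Scorer hits are gathered into a BitSet first, and the expensive filter is then asked about all of them in one call.

```java
import java.util.BitSet;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchFilterSketch {

    /** Hypothetical batch contract: sees all candidates at once, returns the accepted subset. */
    public interface BatchFilter {
        BitSet accept(BitSet candidates);
    }

    /** Stand-in for the expensive backend (assumption): accepts even doc ids, counts round trips. */
    public static class EvenDocBackend {
        public final AtomicInteger roundTrips = new AtomicInteger();

        public BitSet acceptMany(BitSet candidates) {
            roundTrips.incrementAndGet(); // one simulated query for the whole batch
            BitSet accepted = new BitSet(candidates.length());
            for (int doc = candidates.nextSetBit(0); doc >= 0; doc = candidates.nextSetBit(doc + 1)) {
                if (doc % 2 == 0) {
                    accepted.set(doc);
                }
            }
            return accepted;
        }
    }

    /** Step 1: collect every doc the scorer matched. Step 2: filter them in a single batch. */
    public static BitSet collectThenFilter(int[] scorerDocs, int maxDoc, BatchFilter filter) {
        BitSet candidates = new BitSet(maxDoc);
        for (int doc : scorerDocs) {
            candidates.set(doc);
        }
        BitSet result = (BitSet) candidates.clone();
        // Intersect so the result can never contain a doc the scorer did not match.
        result.and(filter.accept(candidates));
        return result;
    }

    public static void main(String[] args) {
        int[] scorerDocs = {1, 2, 3, 4, 8, 9}; // doc ids the scorer is assumed to have matched
        EvenDocBackend backend = new EvenDocBackend();
        BitSet hits = collectThenFilter(scorerDocs, 16, backend::acceptMany);
        System.out.println(hits + " in " + backend.roundTrips.get() + " round trip(s)");
    }
}
```

The trade-off in this sketch is memory for latency: the candidate BitSet costs roughly one bit per document in the index, while the number of backend round trips no longer grows with the number of checked docs.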

Attachments

- IndexSearcherImpl.java (3 kB), 03/Apr/10 12:03, Sergey Vladimirov
- BatchFilter.java (1 kB), 02/Apr/10 19:36, Sergey Vladimirov

People

Assignee: Unassigned

Reporter: Sergey Vladimirov

Votes: 0

Watchers: 1

Dates

Created: 02/Apr/10 19:33

Updated: 28/Aug/22 12:23