[LUCENE-1594] Use source code specialization to maximize search performance - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: core/search
Labels: None
Lucene Fields: New

Description

Towards eking out the absolute best search performance, and after seeing
the Java ghosts in LUCENE-1575, I decided to build a simple prototype
source code specializer for Lucene's searches.

The idea is to write dynamic Java code, specialized to run a very
specific query context (eg TermQuery, collecting top N by field, no
filter, no deletions), compile that Java code, and run it.
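As a rough illustration of the idea (not the actual gen.py from the patch), a generator can emit only the branches a given query context needs. The `QueryContext` class and the Java-like names inside the emitted text (`deletedDocs`, `filterBits`, `computeScore`, `collect`) are hypothetical, chosen for this sketch only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryContext:
    """Hypothetical description of one search shape (not a Lucene class)."""
    has_filter: bool
    has_deletes: bool
    track_scores: bool


def gen_search_loop(ctx: QueryContext) -> str:
    """Emit Java-like source text containing only the checks this context needs."""
    lines = ["while (hasNextDoc()) {", "  int doc = nextDoc();"]
    if ctx.has_filter:
        # The description says a filter is assumed to already exclude
        # deleted docs, so no separate deletions check is emitted here.
        lines.append("  if (!filterBits.get(doc)) continue;")
    elif ctx.has_deletes:
        lines.append("  if (deletedDocs.get(doc)) continue;")
    if ctx.track_scores:
        lines.append("  float score = computeScore(doc);")
        lines.append("  collect(doc, score);")
    else:
        lines.append("  collect(doc);")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # No filter, no deletions: the emitted loop has no per-doc checks at all.
    print(gen_search_loop(QueryContext(has_filter=False, has_deletes=False, track_scores=True)))
```

The point of the sketch is that decisions normally made per document at run time (is there a filter? are there deletions?) are made once at generation time, so the emitted loop carries no dead branches.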

Here are the performance gains when compared to trunk:

Query	Sort	Filter	Deletes	Scoring	Hits	QPS (base)	QPS (new)	Gain (%)
1	Date (long)	no	no	Track,Max	2561886	6.8	10.6	55.9%
1	Date (long)	no	5%	Track,Max	2433472	6.3	10.5	66.7%
1	Date (long)	25%	no	Track,Max	640022	5.2	9.9	90.4%
1	Date (long)	25%	5%	Track,Max	607949	5.3	10.3	94.3%
1	Date (long)	10%	no	Track,Max	256300	6.7	12.3	83.6%
1	Date (long)	10%	5%	Track,Max	243317	6.6	12.6	90.9%
1	Relevance	no	no	Track,Max	2561886	11.2	17.3	54.5%
1	Relevance	no	5%	Track,Max	2433472	10.1	15.7	55.4%
1	Relevance	25%	no	Track,Max	640022	6.1	14.1	131.1%
1	Relevance	25%	5%	Track,Max	607949	6.2	14.4	132.3%
1	Relevance	10%	no	Track,Max	256300	7.7	15.6	102.6%
1	Relevance	10%	5%	Track,Max	243317	7.6	15.9	109.2%
1	Title (string)	no	no	Track,Max	2561886	7.8	12.5	60.3%
1	Title (string)	no	5%	Track,Max	2433472	7.5	11.1	48.0%
1	Title (string)	25%	no	Track,Max	640022	5.7	11.2	96.5%
1	Title (string)	25%	5%	Track,Max	607949	5.5	11.3	105.5%
1	Title (string)	10%	no	Track,Max	256300	7.0	12.7	81.4%
1	Title (string)	10%	5%	Track,Max	243317	6.7	13.2	97.0%

Those tests were run on a 19M doc Wikipedia index (splitting each
Wikipedia doc @ ~1024 chars), on Linux, Java 1.6.0_10.
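The percentage column in the table is consistent with the relative QPS gain over the baseline. A minimal sketch of that arithmetic, where the function name `percent_gain` is made up for illustration:

```python
def percent_gain(qps_base: float, qps_new: float) -> float:
    """Relative throughput gain of the specialized run over trunk, in percent."""
    return (qps_new - qps_base) / qps_base * 100.0


if __name__ == "__main__":
    # First table row: 6.8 QPS on trunk versus 10.6 QPS specialized.
    print(round(percent_gain(6.8, 10.6), 1))  # prints 55.9
```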

But: it only works with TermQuery for now; it's just a start.

It should be easy for others to run this test:

Apply the patch.

cd contrib/benchmark

Run python -u bench.py -delindex </path/to/index/with/deletes> -nodelindex </path/to/index/without/deletes>

(You can leave off either -delindex or -nodelindex and it'll skip
those tests.)

For each test, bench.py generates a single Java source file that runs
that one query; you can open
contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/FastSearchTask.java
to see it. I'll attach an example. It writes "results.txt", in Jira
table format, which you should be able to copy/paste back here.

The specializer uses pretty much every search speedup I can think of:
the ones from LUCENE-1575 (to score or not, to maxScore or not),
the ones suggested in the spinoff LUCENE-1593 (pre-fill w/ sentinels,
don't use docID for tie breaking), and LUCENE-1536 (random access
filters). It bypasses TermDocs and interacts directly with the
IndexInput, and with BitVector for deletions. It directly folds in
the collector, if possible. A filter, if used, must be random access,
and is assumed to pre-multiply-in the deleted docs.
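To make a few of these speedups concrete, here is a small sketch of a folded-in top-N collection loop. It is only an illustration in Python, not the generated Java: the function name, the `(doc_id, score)` postings shape, and the `accept` list standing in for a random-access filter are all assumptions of this sketch.

```python
import heapq
from typing import Iterable, List, Optional, Sequence, Tuple

# Hypothetical placeholder entry; it is never returned to the caller.
SENTINEL = (float("-inf"), 0)


def top_n_by_score(
    postings: Iterable[Tuple[int, float]],
    n: int,
    accept: Optional[Sequence[bool]] = None,
) -> List[Tuple[int, float]]:
    """Return the top n (doc_id, score) hits.

    postings: (doc_id, score) pairs in increasing doc_id order.
    accept:   optional random-access filter, assumed to already have
              deleted docs cleared.
    """
    if n <= 0:
        return []
    # Pre-fill with sentinels so the hot loop never asks "is the queue full?".
    heap = [SENTINEL] * n
    for doc, score in postings:
        if accept is not None and not accept[doc]:
            continue
        # Strict comparison only: a score that ties the current bottom is
        # skipped without consulting doc IDs.
        if score > heap[0][0]:
            heapq.heapreplace(heap, (score, -doc))
    hits = [(-neg_doc, score) for score, neg_doc in heap if score != float("-inf")]
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


if __name__ == "__main__":
    docs = [(0, 1.0), (1, 3.0), (2, 2.0), (3, 3.0), (4, 0.5)]
    print(top_n_by_score(docs, 2))
    print(top_n_by_score(docs, 2, accept=[True, False, True, True, True]))
```

Storing `-doc` in the heap entry is a choice made here so that, among equal scores, the entry with the larger doc ID is evicted first; the patch itself may order ties differently.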

Current status:

I only handle TermQuery. I'd like to add others over time...

It can collect by score, or by a single field (with the 3 scoring
options in LUCENE-1575). It can't yet do reverse field sort or
multi-field sort.

The auto-gen code (gen.py) is rather hideous. It could use some
serious refactoring, etc.; I think we could get it to the point
where each Query can gen its own specialized code, maybe. It also
needs to be eventually ported to Java.

The script runs old, then new, then checks that the topN results
are identical, and aborts if not. So I'm pretty sure the
specialized code is working correctly, for the cases I'm testing.

The patch includes a few small changes to core, mostly to open up
package protected APIs so I can access them.
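The old-versus-new check mentioned in the status list can be sketched as a simple rank-by-rank comparison. This is an assumed shape, not the code in bench.py: the `Hit` tuple, the `TopNMismatch` exception, and the function name are hypothetical.

```python
from typing import Sequence, Tuple

Hit = Tuple[int, float]  # (doc_id, score); hypothetical result shape


class TopNMismatch(Exception):
    """Raised when the two runs disagree, so the benchmark can abort."""


def check_same_top_n(baseline: Sequence[Hit], specialized: Sequence[Hit]) -> None:
    """Abort (by raising) unless both runs returned identical top-N results."""
    if len(baseline) != len(specialized):
        raise TopNMismatch(
            f"hit counts differ: {len(baseline)} vs {len(specialized)}"
        )
    for rank, (old, new) in enumerate(zip(baseline, specialized)):
        if old != new:
            raise TopNMismatch(f"rank {rank}: baseline {old} != specialized {new}")
```

Comparing exact tuples mirrors the "identical" wording above; a tolerance on scores would be a different, looser check.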

I think this is an interesting effort for several reasons:

It gives us a best-case upper bound performance we can expect from
Lucene's normal search classes (minus algorithmic improvements eg
PFOR) because it makes life as easy as possible on the
compiler/JRE to convert to assembly.

We can spin out optimization ideas from this back into the core
(e.g. LUCENE-1593 already has one example), and prioritize. E.g., I
think given these results, optimizing for filters that support a
random-access API is important. As we fold speedups back into
core, the gains from specialization will naturally decrease.

Eventually (maybe, e.g. as a future "experimental" module) this can
be used in production as a simple "search wrapper". I.e., for a
given query, the specializer is checked. If the query "matches"
what the specializer can handle, then the specialized code is run;
else we fall back to Lucene core. Likely one would pre-compile the
space of all specializations, or we could compile Java on the fly
(e.g. what a JSP source does when it's changed), but I'm not sure how
costly/portable that is.
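The "search wrapper" idea above amounts to a lookup with a fallback. A minimal sketch, assuming a pre-built table of specializations; the `SearchWrapper` class, the `ContextKey` tuple, and the function signatures are invented for illustration and are not part of the patch:

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical key: (query type, sort, has_filter, has_deletes).
ContextKey = Tuple[str, str, bool, bool]
SearchFn = Callable[[str, int], List[int]]


class SearchWrapper:
    """Run specialized code when a query context matches, else fall back."""

    def __init__(self, fallback: SearchFn) -> None:
        self._fallback = fallback
        self._specialized: Dict[ContextKey, SearchFn] = {}

    def register(self, key: ContextKey, fn: SearchFn) -> None:
        """Add one pre-compiled specialization for an exact context."""
        self._specialized[key] = fn

    def search(self, key: ContextKey, term: str, top_n: int) -> List[int]:
        # Unknown contexts go to the general-purpose search path.
        fn = self._specialized.get(key, self._fallback)
        return fn(term, top_n)
```

A table keyed on the exact context corresponds to the "pre-compile the space of all specializations" option; compiling on the fly would replace the `register` step with a generate-and-compile step on a cache miss.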

Attachments

LUCENE-1594.patch, 10/Apr/09 17:02, 51 kB, Michael McCandless
FastSearchTask.java, 10/Apr/09 17:03, 5 kB, Michael McCandless
LUCENE-1594.patch, 26/Apr/09 20:04, 1.65 MB, Michael McCandless
LUCENE-1594.patch, 07/May/09 18:59, 71 kB, Michael McCandless
LUCENE-1594.patch, 19/May/09 19:38, 131 kB, Michael McCandless

People

Assignee: Michael McCandless
Reporter: Michael McCandless
Votes: 1
Watchers: 2

Dates

Created: 10/Apr/09 16:57
Updated: 28/Aug/22 11:59
Resolved: 16/Mar/13 19:18
