Key: LUCENE-454
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Yonik Seeley
Reporter: Yonik Seeley
Votes: 6
Watchers: 1
Lucene - Java

lazily create SegmentMergeInfo.docMap

Created: 13/Oct/05 12:44 PM   Updated: 28/Oct/05 12:52 PM
Component/s: None
Affects Version/s: CVS Nightly - Specify date in submission
Fix Version/s: 1.9

Time Tracking:
Not Specified

File Attachments:
docMap.txt (text file, 2 kB, licensed for inclusion in ASF works), 2005-10-14 12:25 PM, Yonik Seeley
docMap.txt (text file, 2 kB, licensed for inclusion in ASF works), 2005-10-13 12:49 PM, Yonik Seeley

Resolution Date: 28/Oct/05 12:52 PM


Description:
Since creating the docMap is expensive, and it's only used during segment merging, not searching, defer creation until it is requested.
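
The deferral described above can be sketched as follows. This is a minimal illustration, not the attached patch: the class `LazyDocMapSketch`, its `BitSet` of deletions, and the `buildCount` counter are hypothetical stand-ins, and the convention of marking deleted documents with -1 is an assumption made for the example.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a per-segment holder; not Lucene's actual class. */
class LazyDocMapSketch {
    private final BitSet deleted;  // set bits mark deleted documents (assumed representation)
    private final int maxDoc;      // number of document slots in the segment
    private int[] docMap;          // stays null until first requested
    private boolean docMapBuilt;   // tracks whether the build step already ran
    int buildCount = 0;            // exposed only so the sketch can be checked

    LazyDocMapSketch(BitSet deleted, int maxDoc) {
        this.deleted = deleted;
        this.maxDoc = maxDoc;
        // Note: no docMap work happens here, so callers that only
        // enumerate terms never pay the O(maxDoc) cost.
    }

    /** Builds the map on first call; returns null when there are no deletions. */
    int[] getDocMap() {
        if (!docMapBuilt) {
            docMapBuilt = true;
            buildCount++;
            if (!deleted.isEmpty()) {
                docMap = new int[maxDoc];
                int next = 0;
                for (int i = 0; i < maxDoc; i++) {
                    // Deleted documents map to -1; live ones are renumbered densely.
                    docMap[i] = deleted.get(i) ? -1 : next++;
                }
            }
        }
        return docMap;
    }
}
```

In this sketch a merge would call `getDocMap()` and trigger the one-time build, while a search-side caller that never asks for the map skips the loop entirely.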

SegmentMergeInfo is also used in MultiTermEnum, the term enumerator for a MultiReader. TermEnum is used by queries such as PrefixQuery, RangeQuery, and WildcardQuery, as well as by RangeFilter, DateFilter, and the first sort on a field (which fills the FieldCache).

Performance Results:
A simple single-field index with 555,555 documents and 1,000 random deletions was queried 1,000 times with a PrefixQuery matching a single document.

Performance Before Patch:
indexing time = 121,656 ms
querying time = 58,812 ms

Performance After Patch:
indexing time = 121,000 ms
querying time = 598 ms

A roughly 100-fold increase in query performance!

All Lucene unit tests pass.



Comments:
Yonik Seeley added a comment - 13/Oct/05 12:49 PM
Attaching patch.

Yonik Seeley added a comment - 14/Oct/05 12:25 PM
Also deferred creation of SegmentMergeInfo.postings (TermPositions) for another 15% gain.

The same index and query were used for testing, but this time with 100,000 query iterations.

defer docMap only:
indexing time = 121,734 ms
querying time = 18,266 ms

defer docMap and postings:
indexing time = 120,860 ms
querying time = 15,625 ms
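
For readers who want to check the figures reported in this issue, the arithmetic can be sketched as below. The class and method names are hypothetical helpers introduced for illustration; the inputs are the query times quoted above (58,812 ms vs. 598 ms for 1,000 queries, and 18,266 ms vs. 15,625 ms for 100,000 queries).

```java
/** Hypothetical helper for checking the reported timings; not part of the patch. */
class QueryTimingSketch {
    /** How many times faster the "after" run is than the "before" run. */
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    /** Percentage of the "before" time that was saved in the "after" run. */
    static double reductionPercent(double beforeMs, double afterMs) {
        return 100.0 * (beforeMs - afterMs) / beforeMs;
    }

    /** Average time per query, given a total time and an iteration count. */
    static double perQueryMs(double totalMs, int iterations) {
        return totalMs / iterations;
    }
}
```

Calling `speedup(58812, 598)` gives a factor slightly under 100, and `reductionPercent(18266, 15625)` gives a little under 15%, which matches the rounded figures in the description and in the second comment.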