[LUCENE-502] TermScorer caches values unnecessarily - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.9
Fix Version/s: 4.0-ALPHA
Component/s: core/search
Labels: None

Description

TermScorer aggressively caches the doc and freq values of 32 documents at a time for each term scored. When a query contains many terms, this creates a lot of unnecessary garbage. The SegmentTermDocs from which it retrieves this information has no optimizations for bulk loading, so the buffering gains nothing.
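To illustrate the difference, here is a minimal sketch contrasting a scorer-style loop that copies postings into per-term 32-entry buffers with one that reads each posting directly. The `PostingsSketch` interface, `ArrayPostings`, and both methods are hypothetical stand-ins written for this example; they are not Lucene's TermDocs API and not the code in the attached patch.

```java
// Hypothetical stand-in for a postings iterator; NOT Lucene's actual TermDocs API.
interface PostingsSketch {
    boolean next(); // advance to the next posting; false when exhausted
    int doc();      // current document id
    int freq();     // term frequency in the current document
}

// Toy in-memory postings list used only for this illustration.
class ArrayPostings implements PostingsSketch {
    private final int[] docs;
    private final int[] freqs;
    private int pos = -1;

    ArrayPostings(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    public boolean next() { return ++pos < docs.length; }
    public int doc() { return docs[pos]; }
    public int freq() { return freqs[pos]; }
}

class BufferedVsStreaming {
    static final int BUFFER_SIZE = 32; // batch size mentioned in the description

    // Buffered style: two int[32] arrays are allocated per term and filled in batches.
    static long sumFreqsBuffered(PostingsSketch p) {
        int[] docBuf = new int[BUFFER_SIZE];
        int[] freqBuf = new int[BUFFER_SIZE];
        long sum = 0;
        while (true) {
            int n = 0;
            while (n < BUFFER_SIZE && p.next()) {
                docBuf[n] = p.doc();
                freqBuf[n] = p.freq();
                n++;
            }
            for (int i = 0; i < n; i++) {
                sum += freqBuf[i];
            }
            if (n < BUFFER_SIZE) {
                break; // a short batch means the postings are exhausted
            }
        }
        return sum;
    }

    // Streaming style: no per-term arrays; each posting is consumed as it is read.
    static long sumFreqsStreaming(PostingsSketch p) {
        long sum = 0;
        while (p.next()) {
            sum += p.freq();
        }
        return sum;
    }
}
```

Both methods visit the same postings and produce the same result; the streaming version simply skips the two per-term array allocations, which is the garbage the description refers to when many terms are scored.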

In addition, it has a SCORE_CACHE that is of limited benefit. It caches the result of a sqrt that belongs in DefaultSimilarity, and if you are only scoring a few documents that contain those terms, there is no need to precalculate the sqrt, especially on modern VMs.
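As a rough sketch of the trade-off, the following compares a precomputed score table with computing the square root on demand. The class name, the cache size of 32, and the `weight` parameter are assumptions made for this example; the real TermScorer and DefaultSimilarity code may differ.

```java
class ScoreCacheSketch {
    // Assumed cache size for illustration; the issue text does not state it.
    static final int SCORE_CACHE_SIZE = 32;

    // Term-frequency factor as described in the issue: a square root of the frequency.
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Cached style: SCORE_CACHE_SIZE square roots are computed up front for
    // every term, whether or not those frequencies are ever seen.
    static float[] buildCache(float weight) {
        float[] cache = new float[SCORE_CACHE_SIZE];
        for (int i = 0; i < SCORE_CACHE_SIZE; i++) {
            cache[i] = tf(i) * weight;
        }
        return cache;
    }

    // Look up small frequencies in the table; fall back to computing for large ones.
    static float cachedScore(float[] cache, int freq, float weight) {
        return freq < SCORE_CACHE_SIZE ? cache[freq] : tf(freq) * weight;
    }

    // Direct style: one square root per document actually scored.
    static float directScore(int freq, float weight) {
        return tf(freq) * weight;
    }
}
```

Under these assumptions, a term that matches only a handful of documents pays for the whole table in the cached style but only for a handful of square roots in the direct style, which is the situation the description has in mind.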

Enclosed is a patch that replaces TermScorer with a version that does not cache the docs or freqs. For queries with many terms, that saves 196 bytes/term, the unnecessary disk I/O, and the extra sqrt calls, which adds up.

Attachments

- LUCENE-502.patch (6 kB), 13/Nov/08 03:44, Mark Miller
- TermScorer.patch (6 kB), 01/Mar/06 14:33, Steven Tamm

People

Assignee: Unassigned

Reporter: Steven Tamm

Votes: 1

Watchers: 2

Dates

Created: 01/Mar/06 11:32

Updated: 28/Aug/22 11:25

Resolved: 15/Jul/11 03:00