[LUCENE-2205] Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure. - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.5, 4.0-ALPHA
Component/s: core/index
Labels: None
Environment: Java5
Lucene Fields: New, Patch Available

Description

The change packs those three arrays into a single byte array, with an int array holding the offset of each entry.
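The packing idea can be sketched as follows. This is a minimal illustration, not the code from the attached patch: the class name `PackedTermIndex`, the entry layout (length-prefixed UTF-8 term, then a doc frequency, then the index pointer), and the reduction of a TermInfo to a single `docFreq` field are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: three parallel arrays packed into one byte[] plus an int[] of offsets.
final class PackedTermIndex {
    private final ByteBuffer data;  // all entries stored back to back (big-endian)
    private final int[] offsets;    // offsets[i] = start of entry i inside data

    PackedTermIndex(String[] terms, int[] docFreqs, long[] indexPointers) {
        // First pass: encode the terms and compute the total size needed.
        byte[][] utf8 = new byte[terms.length][];
        int total = 0;
        for (int i = 0; i < terms.length; i++) {
            utf8[i] = terms[i].getBytes(StandardCharsets.UTF_8);
            total += 4 + utf8[i].length + 4 + 8; // length + term bytes + docFreq + pointer
        }
        // Second pass: write each entry and remember where it starts.
        data = ByteBuffer.allocate(total);
        offsets = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            offsets[i] = data.position();
            data.putInt(utf8[i].length);
            data.put(utf8[i]);
            data.putInt(docFreqs[i]);
            data.putLong(indexPointers[i]);
        }
    }

    int size() {
        return offsets.length;
    }

    String term(int i) {
        int start = offsets[i];
        int len = data.getInt(start); // absolute read, does not move the position
        return new String(data.array(), start + 4, len, StandardCharsets.UTF_8);
    }

    int docFreq(int i) {
        int start = offsets[i];
        int len = data.getInt(start);
        return data.getInt(start + 4 + len);
    }

    long indexPointer(int i) {
        int start = offsets[i];
        int len = data.getInt(start);
        return data.getLong(start + 4 + len + 4);
    }
}
```

The saving in a layout like this comes from replacing one object per term (each with its own header and references) with two primitive arrays, which also gives the garbage collector far fewer objects to trace.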

The performance benefits are staggering. On my test index (6.2 GB, with ~1,000,000 documents and ~175,000,000 terms), the memory needed to load the terminfos was reduced to 17% of its original size, from 291.5 MB to 49.7 MB. Random access speed improved by 1-2%, segment load time is ~40% faster, and full GCs on my JVM became 7 times faster.

I have already done the work and am offering this code as a patch. Currently all tests in the trunk pass with the new code enabled. I also wrote a system property switch that allows the original implementation to be used instead.

-Dorg.apache.lucene.index.TermInfosReader=default or small
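A switch of this kind could be read roughly as shown below. This is a sketch under stated assumptions, not the patch's actual code: the class `TermIndexSwitch`, the `Mode` enum, and the choice to fall back to the original implementation when the property is unset are all hypothetical. Only the property name and the two values `default` and `small` come from the description above.

```java
// Hypothetical sketch of selecting an implementation from a system property.
final class TermIndexSwitch {
    // Property name taken from the issue description.
    static final String PROPERTY = "org.apache.lucene.index.TermInfosReader";

    enum Mode { DEFAULT, SMALL }

    // Maps the property value to a mode. Treating an unset property as
    // "default" is an assumption; the issue does not say what happens then.
    static Mode parse(String value) {
        if (value == null || value.equals("default")) {
            return Mode.DEFAULT;
        }
        if (value.equals("small")) {
            return Mode.SMALL;
        }
        throw new IllegalArgumentException(
            "Unknown value for " + PROPERTY + ": " + value);
    }

    // Reads the switch from the running JVM's system properties.
    static Mode current() {
        return parse(System.getProperty(PROPERTY));
    }
}
```

With this sketch, starting the JVM with `-Dorg.apache.lucene.index.TermInfosReader=small` would make `TermIndexSwitch.current()` return `Mode.SMALL`.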

I have also written a blog post about this patch; here is the link:

http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html

Attachments


TermInfosReaderIndexSmall.java, 20/Jan/10 02:56, 9 kB, Aaron McCurry
TermInfosReaderIndexDefault.java, 20/Jan/10 02:55, 2 kB, Aaron McCurry
TermInfosReaderIndex.java, 20/Jan/10 02:55, 0.5 kB, Aaron McCurry
TermInfosReader.java, 20/Jan/10 02:55, 9 kB, Aaron McCurry
rawoutput.txt, 13/Jan/10 08:41, 8 kB, Aaron McCurry
RandomAccessTest.java, 13/Jan/10 08:41, 4 kB, Aaron McCurry
patch-final.txt, 13/Jan/10 08:29, 18 kB, Aaron McCurry
LUCENE-2205.patch, 24/Oct/11 22:09, 89 kB, Michael McCandless
LUCENE-2205.patch, 27/Oct/11 06:03, 90 kB, Robert Muir
lowmemory_w_utf8_encoding.v4.patch, 01/Oct/11 11:36, 92 kB, Aaron McCurry
lowmemory_w_utf8_encoding.patch, 21/Sep/11 11:23, 20 kB, Aaron McCurry

People

Assignee: Michael McCandless

Reporter: Aaron McCurry

Votes: 5

Watchers: 6

Dates

Created: 13/Jan/10 08:28

Updated: 24/Oct/22 13:02

Resolved: 27/Oct/11 20:47