  1. Lucene - Core
  2. LUCENE-2205

Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and index pointer long[] arrays and replace them with a more memory-efficient data structure.

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: core/index
    • Labels: None
    • Environment: Java5
    • Lucene Fields: New, Patch Available

    Description

      The change packs those three arrays into a single byte array, with an int array holding the offset of each entry in that byte array.
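To make the idea concrete, here is a minimal sketch of the packing. It is not the code from the patch: the class name PackedTermIndexSketch, the record layout (length-prefixed UTF-8 term, then a doc frequency, then an index pointer), and all method names are assumptions chosen for illustration, and the real TermInfos carry more fields than this.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: three parallel arrays packed into one byte[]
// plus an int[] of record offsets. Not the layout used by the actual patch.
class PackedTermIndexSketch {
    private final byte[] data;    // all records, back to back
    private final int[] offsets;  // offsets[i] = start of record i in data

    PackedTermIndexSketch(String[] terms, int[] docFreqs, long[] indexPointers) {
        // First pass: encode the terms and compute the total size needed.
        byte[][] utf8 = new byte[terms.length][];
        int total = 0;
        for (int i = 0; i < terms.length; i++) {
            utf8[i] = terms[i].getBytes(StandardCharsets.UTF_8);
            total += 4 + utf8[i].length + 4 + 8; // length + term + docFreq + pointer
        }
        // Second pass: write each record and remember where it starts.
        ByteBuffer buf = ByteBuffer.allocate(total);
        offsets = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            offsets[i] = buf.position();
            buf.putInt(utf8[i].length);
            buf.put(utf8[i]);
            buf.putInt(docFreqs[i]);
            buf.putLong(indexPointers[i]);
        }
        data = buf.array();
    }

    int size() {
        return offsets.length;
    }

    String term(int i) {
        int len = ByteBuffer.wrap(data).getInt(offsets[i]);
        return new String(data, offsets[i] + 4, len, StandardCharsets.UTF_8);
    }

    int docFreq(int i) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int len = buf.getInt(offsets[i]);
        return buf.getInt(offsets[i] + 4 + len);
    }

    long indexPointer(int i) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int len = buf.getInt(offsets[i]);
        return buf.getLong(offsets[i] + 4 + len + 4);
    }

    public static void main(String[] args) {
        PackedTermIndexSketch index = new PackedTermIndexSketch(
                new String[] {"apple", "banana"},
                new int[] {3, 7},
                new long[] {100L, 2048L});
        for (int i = 0; i < index.size(); i++) {
            System.out.println(index.term(i) + " docFreq=" + index.docFreq(i)
                    + " indexPointer=" + index.indexPointer(i));
        }
    }
}
```

The saving in a layout like this comes from replacing one heap object per term and per TermInfo with two large arrays, which also gives the garbage collector far fewer objects to trace.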

      The performance benefits are staggering. On my test index (6.2 GB, with ~1,000,000 documents and ~175,000,000 terms), the memory needed to load the terminfos was reduced to 17% of its original size, from 291.5 MB to 49.7 MB. Random access speed improved by 1-2%, segment load time is ~40% faster, and full GCs on my JVM are 7 times faster.

      I have already done the work and am offering this code as a patch. Currently all tests in the trunk pass with this new code enabled. I also wrote a system property switch so that the original implementation can still be used.

      -Dorg.apache.lucene.index.TermInfosReader=default or small
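A switch like this could be read roughly as follows. This is only a sketch: the property name and the two values come from the line above, but the class name TermIndexSwitchSketch, the Mode enum, and the choice to fall back to the original implementation when the property is unset are assumptions, not something the description states.

```java
import java.util.Locale;

// Hypothetical illustration of selecting an implementation from a system property.
class TermIndexSwitchSketch {
    // Property name taken from the issue description.
    static final String PROPERTY = "org.apache.lucene.index.TermInfosReader";

    enum Mode { DEFAULT, SMALL }

    static Mode parse(String value) {
        if (value == null) {
            return Mode.DEFAULT; // assumption: unset means the original implementation
        }
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.equals("small")) {
            return Mode.SMALL;
        }
        if (v.equals("default")) {
            return Mode.DEFAULT;
        }
        throw new IllegalArgumentException(
                "Unknown value for " + PROPERTY + ": " + value);
    }

    static Mode fromSystemProperty() {
        return parse(System.getProperty(PROPERTY));
    }

    public static void main(String[] args) {
        System.out.println("Selected term index implementation: " + fromSystemProperty());
    }
}
```

Running it with -Dorg.apache.lucene.index.TermInfosReader=small on the java command line would select the packed variant in this sketch.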

      I have also written a blog post about this patch; here is the link:

      http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html

      Attachments

        1. TermInfosReaderIndexSmall.java
          9 kB
          Aaron McCurry
        2. TermInfosReaderIndexDefault.java
          2 kB
          Aaron McCurry
        3. TermInfosReaderIndex.java
          0.5 kB
          Aaron McCurry
        4. TermInfosReader.java
          9 kB
          Aaron McCurry
        5. rawoutput.txt
          8 kB
          Aaron McCurry
        6. RandomAccessTest.java
          4 kB
          Aaron McCurry
        7. patch-final.txt
          18 kB
          Aaron McCurry
        8. LUCENE-2205.patch
          89 kB
          Michael McCandless
        9. LUCENE-2205.patch
          90 kB
          Robert Muir
        10. lowmemory_w_utf8_encoding.v4.patch
          92 kB
          Aaron McCurry
        11. lowmemory_w_utf8_encoding.patch
          20 kB
          Aaron McCurry

          People

            Assignee: Michael McCandless
            Reporter: Aaron McCurry
            Votes: 5
            Watchers: 6
