Lucene - Core / LUCENE-2205

Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory-efficient data structure.



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: core/index
    • Labels: None
    • Environment: Java5
    • Lucene Fields: New, Patch Available


      Basically, this packs those three arrays into a single byte array, with an int array holding the offset at which each entry starts.
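A minimal Java sketch of the packing idea, assuming a simplified index entry made of the term text, its docFreq, freq and prox pointers, and the index pointer into the .tis file (the real TermInfo also carries a skip offset, omitted here). The class PackedTermIndex, its methods, and the 7-bits-per-byte variable-length encoding are hypothetical illustrations, not the code from the attached patches.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: the parallel Term[], TermInfo[] and long[] arrays are
 * replaced by one byte[] holding every entry back to back, plus an int[]
 * recording where each entry starts. Not the patch's actual implementation.
 */
final class PackedTermIndex {
    private final byte[] data;   // all encoded entries, concatenated
    private final int[] offsets; // offsets[i] = start of entry i in data

    PackedTermIndex(String[] terms, int[] docFreqs, long[] freqPointers,
                    long[] proxPointers, long[] indexPointers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        offsets = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            offsets[i] = out.size();
            byte[] utf8 = terms[i].getBytes(StandardCharsets.UTF_8);
            writeVLong(out, utf8.length);    // term length prefix
            out.write(utf8, 0, utf8.length); // term text as UTF-8
            writeVLong(out, docFreqs[i]);
            writeVLong(out, freqPointers[i]);
            writeVLong(out, proxPointers[i]);
            writeVLong(out, indexPointers[i]);
        }
        data = out.toByteArray();
    }

    /** Writes v using 7 bits per byte; the high bit means "more bytes follow". */
    private static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Decodes a variable-length long at pos[0] and advances pos[0] past it. */
    private long readVLong(int[] pos) {
        long result = 0;
        int shift = 0;
        byte b;
        do {
            b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    int size() { return offsets.length; }

    /** Decodes the term text of entry i. */
    String term(int i) {
        int[] pos = {offsets[i]};
        int len = (int) readVLong(pos);
        return new String(data, pos[0], len, StandardCharsets.UTF_8);
    }

    /** Returns {docFreq, freqPointer, proxPointer, indexPointer} of entry i. */
    long[] termInfo(int i) {
        int[] pos = {offsets[i]};
        int len = (int) readVLong(pos);
        pos[0] += len; // skip over the term bytes
        long[] info = new long[4];
        for (int k = 0; k < info.length; k++) {
            info[k] = readVLong(pos);
        }
        return info;
    }

    /** Binary search: index of the last entry whose term is <= target, or -1. */
    int floor(String target) {
        int lo = 0, hi = offsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = term(mid).compareTo(target); // assumes String order
            if (cmp == 0) return mid;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return hi;
    }
}
```

In this sketch, millions of small Term, TermInfo and String objects collapse into two primitive arrays, which gives the garbage collector far fewer objects to trace; that is one plausible reason for the faster full GCs reported below. The trade-off is a small decoding cost on each binary-search probe.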

      The performance benefits are staggering on my test index (6.2 GB, with ~1,000,000 documents and ~175,000,000 terms): the memory needed to load the term infos was reduced to 17% of its original size, from 291.5 MB to 49.7 MB. Random access speed improved by 1-2%, segment load time is ~40% faster, and full GCs on my JVM became 7 times faster.
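As a quick arithmetic check of the memory figures quoted above (using only the two numbers from the measurement):

```latex
\frac{49.7\ \text{MB}}{291.5\ \text{MB}} \approx 0.170, \qquad
291.5 - 49.7 = 241.8\ \text{MB saved} \;(\approx 83\%), \qquad
\frac{291.5}{49.7} \approx 5.9\times \text{ smaller}
```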

      I have already done the work and am offering this code as a patch. All tests on trunk currently pass with the new code enabled. I also added a system property switch so that the original implementation can still be selected:

      -Dorg.apache.lucene.index.TermInfosReader=default or small
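A hedged sketch of how such a switch could be wired up. Only the property name and the values default and small come from the issue; the interface and class names below are hypothetical stand-ins (loosely echoing the attached TermInfosReaderIndex, TermInfosReaderIndexDefault and TermInfosReaderIndexSmall files), and falling back to the small implementation when the property is unset is an assumption.

```java
/** Hypothetical common interface for the two term index implementations. */
interface TermIndex {
    String name();
}

/** Hypothetical stand-in for the original array-based implementation. */
final class DefaultTermIndex implements TermIndex {
    public String name() { return "default"; }
}

/** Hypothetical stand-in for the packed, low-memory implementation. */
final class SmallTermIndex implements TermIndex {
    public String name() { return "small"; }
}

final class TermIndexSelector {
    /** Property name quoted in the issue description. */
    static final String PROPERTY = "org.apache.lucene.index.TermInfosReader";

    /** Reads the switch from the JVM's system properties. */
    static TermIndex fromSystemProperty() {
        return select(System.getProperty(PROPERTY));
    }

    /**
     * Maps a property value to an implementation. Treating a missing value
     * as "small" is an assumption, not something the issue states.
     */
    static TermIndex select(String value) {
        if (value == null || value.equals("small")) {
            return new SmallTermIndex();
        }
        if (value.equals("default")) {
            return new DefaultTermIndex();
        }
        throw new IllegalArgumentException(
            PROPERTY + " must be 'default' or 'small', got: " + value);
    }
}
```

With this sketch, starting the JVM with -Dorg.apache.lucene.index.TermInfosReader=default would make fromSystemProperty() return the array-based stand-in, and an unrecognized value fails fast instead of silently picking one.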

      I have also written a blog post about this patch; here is the link.



        1. LUCENE-2205.patch
          90 kB
          Robert Muir
        2. LUCENE-2205.patch
          89 kB
          Michael McCandless
        3. lowmemory_w_utf8_encoding.v4.patch
          92 kB
          Aaron McCurry
        4. lowmemory_w_utf8_encoding.patch
          20 kB
          Aaron McCurry
        5. TermInfosReaderIndexSmall.java
          9 kB
          Aaron McCurry
        6. TermInfosReaderIndexDefault.java
          2 kB
          Aaron McCurry
        7. TermInfosReaderIndex.java
          0.5 kB
          Aaron McCurry
        8. TermInfosReader.java
          9 kB
          Aaron McCurry
        9. rawoutput.txt
          8 kB
          Aaron McCurry
        10. RandomAccessTest.java
          4 kB
          Aaron McCurry
        11. patch-final.txt
          18 kB
          Aaron McCurry



            Assignee: Michael McCandless (mikemccand)
            Reporter: Aaron McCurry (amccurry)
            Votes: 5
            Watchers: 6