  1. Lucene - Core
  2. LUCENE-2205

Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Environment:

      Java5

    • Lucene Fields:
      New, Patch Available

      Description

      The patch packs those three arrays into a single byte array, with an int array holding the offset of each entry within it.
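      The packing idea can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class name PackedTermIndex, the entry layout (length-prefixed UTF-8 term followed by one long pointer), and the single pointer per term are all assumptions made to keep the example short. The real TermInfos entries carry more fields, and the sketch uses StandardCharsets, which needs a newer JDK than the Java5 listed in this issue.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not the class from the LUCENE-2205 patch. */
public class PackedTermIndex {
    private final byte[] data;    // every entry serialized back to back
    private final int[] offsets;  // offsets[i] = start of entry i inside data

    /** Assumes sortedTerms is sorted by String.compareTo and matches pointers in length. */
    public PackedTermIndex(String[] sortedTerms, long[] pointers) {
        int n = sortedTerms.length;
        byte[][] utf8 = new byte[n][];
        int total = 0;
        for (int i = 0; i < n; i++) {
            utf8[i] = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
            total += 4 + utf8[i].length + 8; // int length + term bytes + long pointer
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        offsets = new int[n];
        for (int i = 0; i < n; i++) {
            offsets[i] = buf.position();
            buf.putInt(utf8[i].length);
            buf.put(utf8[i]);
            buf.putLong(pointers[i]);
        }
        data = buf.array();
    }

    /** Decodes the term text of entry i on demand. */
    public String term(int i) {
        int len = ByteBuffer.wrap(data).getInt(offsets[i]);
        return new String(data, offsets[i] + 4, len, StandardCharsets.UTF_8);
    }

    /** Reads the pointer stored after the term bytes of entry i. */
    public long pointer(int i) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int len = buf.getInt(offsets[i]);
        return buf.getLong(offsets[i] + 4 + len);
    }

    /** Binary search over the packed entries; returns the entry index or -1. */
    public int indexOf(String target) {
        int lo = 0;
        int hi = offsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = term(mid).compareTo(target);
            if (cmp == 0) {
                return mid;
            } else if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }
}
```

      The saving in a layout like this comes from replacing one heap object per term (plus per-object headers and references) with two primitive arrays, which also gives the garbage collector far fewer objects to trace.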

      The performance benefits are staggering. On my test index (6.2 GB, with ~1,000,000 documents and ~175,000,000 terms), the memory needed to load the terminfos was reduced to 17% of its original size, from 291.5 MB to 49.7 MB. Random access speed improved by 1-2%, segment load time is ~40% faster, and full GCs on my JVM became 7 times faster.

      I have already done the work and am offering this code as a patch. Currently all tests in the trunk pass with this new code enabled. I also wrote a system property switch so that the original implementation can still be used:

      -Dorg.apache.lucene.index.TermInfosReader=default or small
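      A switch of this kind could be read along the following lines. Only the property name and the two values "default" and "small" come from this issue; the class TermIndexSwitch, the Mode enum, and the explicit fallback argument are hypothetical, since the issue does not say which implementation is used when the property is unset.

```java
/** Hypothetical sketch of a system property switch; not the code from the patch. */
public class TermIndexSwitch {
    /** Property name as given in the issue description. */
    public static final String PROPERTY = "org.apache.lucene.index.TermInfosReader";

    /** The two values named in the issue. */
    public enum Mode { DEFAULT, SMALL }

    /** Maps a raw property value to a mode; the caller supplies the fallback for an unset property. */
    public static Mode parse(String value, Mode fallback) {
        if (value == null) {
            return fallback;
        }
        if (value.equalsIgnoreCase("default")) {
            return Mode.DEFAULT;
        }
        if (value.equalsIgnoreCase("small")) {
            return Mode.SMALL;
        }
        throw new IllegalArgumentException(
            "Unknown value for " + PROPERTY + ": " + value);
    }

    /** Reads the JVM-wide setting, e.g. one passed as -D on the command line. */
    public static Mode current(Mode fallback) {
        return parse(System.getProperty(PROPERTY), fallback);
    }
}
```

      Rejecting unknown values, rather than silently falling back, is a design choice of this sketch: it makes a mistyped -D flag fail fast instead of quietly selecting the other implementation.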

      I have also written a blog post about this patch:

      http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html

        Attachments

        1. LUCENE-2205.patch
          90 kB
          Robert Muir
        2. LUCENE-2205.patch
          89 kB
          Michael McCandless
        3. lowmemory_w_utf8_encoding.v4.patch
          92 kB
          Aaron McCurry
        4. lowmemory_w_utf8_encoding.patch
          20 kB
          Aaron McCurry
        5. TermInfosReaderIndexSmall.java
          9 kB
          Aaron McCurry
        6. TermInfosReaderIndexDefault.java
          2 kB
          Aaron McCurry
        7. TermInfosReaderIndex.java
          0.5 kB
          Aaron McCurry
        8. TermInfosReader.java
          9 kB
          Aaron McCurry
        9. rawoutput.txt
          8 kB
          Aaron McCurry
        10. RandomAccessTest.java
          4 kB
          Aaron McCurry
        11. patch-final.txt
          18 kB
          Aaron McCurry

            People

            • Assignee:
              mikemccand Michael McCandless
              Reporter:
              amccurry Aaron McCurry
            • Votes:
              5
              Watchers:
              7

              Dates

              • Created:
                Updated:
                Resolved: