Lucene - Core
LUCENE-2205

Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Environment:

      Java5

    • Lucene Fields:
      New, Patch Available

      Description

      Basically, packing those three arrays into a byte array with an int array as an index of offsets.

      The performance benefits are staggering on my test index (6.2 GB, ~1,000,000 documents, ~175,000,000 terms): the memory needed to load the term infos was reduced to 17% of its original size, from 291.5 MB to 49.7 MB. Random access speed improved by 1-2%, segment load time is ~40% faster as well, and full GCs on my JVM are 7 times faster.

      I have already performed the work and am offering this code as a patch. Currently all tests in trunk pass with this new code enabled. I also wrote a system property switch to allow the original implementation to be used as well.

      -Dorg.apache.lucene.index.TermInfosReader=default or small

      I have also written a blog post about this patch; here is the link.

      http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html
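
      A rough, hypothetical sketch of the idea (not the patch code itself): per-term entries are packed into one byte[], an int[] records the start offset of each entry, and lookups binary-search by seeking into the byte[] instead of walking arrays of Term/TermInfo objects. All names below are illustrative.

        import java.io.ByteArrayOutputStream;
        import java.nio.charset.StandardCharsets;

        // Sketch only: entries are [1 length byte][UTF-8 term bytes]; sortedTerms must
        // already be sorted in raw UTF-8 byte order for the binary search to work.
        class PackedTermIndexSketch {
          private final byte[] data;    // packed entries
          private final int[] offsets;  // start offset of each entry within data

          PackedTermIndexSketch(String[] sortedTerms) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            offsets = new int[sortedTerms.length];
            for (int i = 0; i < sortedTerms.length; i++) {
              offsets[i] = out.size();
              byte[] utf8 = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
              out.write(utf8.length);           // assumes terms < 256 bytes; a real impl writes a vInt
              out.write(utf8, 0, utf8.length);
              // a real entry would also append docFreq, freq/prox pointers, etc. as vInts/vLongs
            }
            data = out.toByteArray();
          }

          /** Binary search over the packed entries; returns the entry index, or -1 if absent. */
          int find(String term) {
            byte[] target = term.getBytes(StandardCharsets.UTF_8);
            int lo = 0, hi = offsets.length - 1;
            while (lo <= hi) {
              int mid = (lo + hi) >>> 1;
              int cmp = compareAt(mid, target);
              if (cmp < 0) lo = mid + 1;
              else if (cmp > 0) hi = mid - 1;
              else return mid;
            }
            return -1;
          }

          // Compares in raw UTF-8 byte order; Lucene 3.x additionally has to preserve
          // UTF-16 sort order, which is discussed later in this issue.
          private int compareAt(int entry, byte[] target) {
            int off = offsets[entry];
            int len = data[off] & 0xff;
            int n = Math.min(len, target.length);
            for (int i = 0; i < n; i++) {
              int diff = (data[off + 1 + i] & 0xff) - (target[i] & 0xff);
              if (diff != 0) return diff;
            }
            return len - target.length;
          }
        }

      The actual patch also packs docFreq and file pointers into each entry and, in later iterations, replaces the single byte[] with PagedBytes and the int[] with packed ints, as discussed in the comments below.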

      Attachments

      1. lowmemory_w_utf8_encoding.patch
        20 kB
        Aaron McCurry
      2. lowmemory_w_utf8_encoding.v4.patch
        92 kB
        Aaron McCurry
      3. LUCENE-2205.patch
        90 kB
        Robert Muir
      4. LUCENE-2205.patch
        89 kB
        Michael McCandless
      5. patch-final.txt
        18 kB
        Aaron McCurry
      6. RandomAccessTest.java
        4 kB
        Aaron McCurry
      7. rawoutput.txt
        8 kB
        Aaron McCurry
      8. TermInfosReader.java
        9 kB
        Aaron McCurry
      9. TermInfosReaderIndex.java
        0.5 kB
        Aaron McCurry
      10. TermInfosReaderIndexDefault.java
        2 kB
        Aaron McCurry
      11. TermInfosReaderIndexSmall.java
        9 kB
        Aaron McCurry

        Activity

        Uwe Schindler added a comment -

        Bulk close after release of 3.5

        Michael McCandless added a comment -

        Finally resolved; thanks Aaron!

        Michael McCandless added a comment -

        New patch looks great, thanks Robert!

        When committing can we also add the packed ints test to 3.x, and also merge this improvement forward to 4.x's preflex codec?

        Yup I'll do that. I'll also merge forward the improvements to PagedBytes.

        I also ran Test2BTerms (it passed); I think this is ready to go in!

        Robert Muir added a comment -

        +1, nice work.

        just some tiny tweaks: added a missing license header, randomized the test, and, when binary searching, reused bytesref.grow() instead of making new ones in each comparison (this seems to help the pk-lookup perf test on my computer).

        When committing can we also add the packed ints test to 3.x, and also merge this improvement forward to 4.x's preflex codec?

        Michael McCandless added a comment -

        Well thank you Aaron for doing all the hard parts, and, persisting! Sorry this took so long.

        This will be an enormous improvement for 3.5.0.

        Aaron McCurry added a comment -

        Awesome! Good job!

        Thank you for working on this with me!

        Michael McCandless added a comment -

        New patch, iterated from Aaron's last patch.

        I moved the DataInput/Output impls into PagedBytes, so they can directly operate on the byte[] blocks. I also don't write skipOffset unless df >= skipInterval.

        I think this is ready!
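
        As a minimal sketch of that conditional write (a hypothetical helper, not the committed code, which writes these values as deltas through PagedBytes):

          import java.io.IOException;
          import org.apache.lucene.store.DataOutput;

          // skipOffset is only appended when docFreq reaches skipInterval, so the vast
          // majority of terms (df < skipInterval) save that byte entirely.
          final class TermEntryWriterSketch {
            static void write(DataOutput out, int docFreq, long freqPointer, long proxPointer,
                              int skipOffset, int skipInterval) throws IOException {
              out.writeVInt(docFreq);
              out.writeVLong(freqPointer);  // the real code writes deltas from the previous entry
              out.writeVLong(proxPointer);
              if (docFreq >= skipInterval) {
                out.writeVInt(skipOffset);  // omitted when there is no skip data
              }
            }
          }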

        Aaron McCurry added a comment -

        Wow GrowableWriter is very cool! Thanks for explaining!

        Michael McCandless added a comment -

        But I may not understand how the GrowableWriter will help. I understand that it allows me to append more values as I go, but when I start writing them I have no idea what bit size to choose for the packing. Can you explain?

        It actually grows in "both" dimensions – it tracks the max value so far and internally will "upgrade" to a bigger bits-per-value as needed. So eg you could start with small bitsPerValue (maybe 4 or something) and then let it grow itself.

        Also if you would like to finish this patch that would be fine with me. Let me know if you want me to continue or if you are going to work on it. Thanks!

        OK thanks Aaron... I'll take a crack at the next iteration.
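
        A conceptual sketch of that "grow in both dimensions" behavior (this is not PackedInts.GrowableWriter itself, and it does not actually bit-pack; it only shows how the bits-per-value can be upgraded as larger values arrive):

          // Sketch only: stores plain longs, but tracks the widest value seen so far and
          // upgrades bitsPerValue, so a caller can start at e.g. 4 bits and just keep adding.
          class GrowablePackedWriterSketch {
            private long[] values;       // a real implementation stores these bit-packed
            private int bitsPerValue;
            private int count;

            GrowablePackedWriterSketch(int startBitsPerValue, int initialCapacity) {
              this.bitsPerValue = startBitsPerValue;
              this.values = new long[initialCapacity];
            }

            void add(long value) {
              int neededBits = Math.max(1, 64 - Long.numberOfLeadingZeros(value));
              if (neededBits > bitsPerValue) {
                bitsPerValue = neededBits;  // a real impl re-packs the already-written values here
              }
              if (count == values.length) {
                values = java.util.Arrays.copyOf(values, values.length * 2);
              }
              values[count++] = value;
            }

            int getBitsPerValue() { return bitsPerValue; }
            long get(int index) { return values[index]; }
          }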

        Aaron McCurry added a comment -

        Sorry that I haven't gotten back to you yet, life has a way of slowing down development.

        As for your comments from a couple of weeks ago: I agree with the PagedBytes comments, I knew there was going to be more work there. But I may not understand how the GrowableWriter will help. I understand that it allows me to append more values as I go, but when I start writing them I have no idea what bit size to choose for the packing. Can you explain?

        Also if you would like to finish this patch that would be fine with me. Let me know if you want me to continue or if you are going to work on it. Thanks!

        Michael McCandless added a comment -

        Hi Aaron, any progress here? If you want I can iterate from your last patch and do the remaining changes... I think we are close here, and this will be an awesome improvement.

        Michael McCandless added a comment -

        Looking great Aaron!

        It's spooky that PagedBytesDataInput calls fillSlice for every
        .readByte – can't we have it hold the current block and then only
        switch to a new block in .readByte() if it's at the end of current
        block?

        Same for PagedBytesDataOutput?

        We can have these DataInput/Output impls be private to PagedBytes (so
        they can access the pages directly)?

        You should be able to use PackedInts.GrowableWriter, to append the
        ints directly, instead of first writing to the indexToTermsArray and
        then separately to the packed ints? Saves the added transient RAM
        usage and 2nd pass.

        I don't think you need to write the indexToTerms packed ints into a
        PagedBytesDataOutput (if you use GrowableWriter it just uses a byte[]
        under the hood, and resizes as needed)? This array will be small
        enough, since it's the packed int byte address of every 128th term, I
        think (but dataOutput does need to be paged bytes).
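
        A hypothetical sketch of that suggestion: a reader over fixed-size pages that holds on to the current page and only moves to the next one at a block boundary, rather than re-slicing on every readByte().

          import java.util.List;

          // Sketch only; assumes every page is exactly blockSize bytes and ignores EOF handling.
          class PagedBytesInputSketch {
            private final List<byte[]> pages;
            private final int blockSize;
            private int pageIndex;
            private byte[] currentPage;
            private int posInPage;

            PagedBytesInputSketch(List<byte[]> pages, int blockSize) {
              this.pages = pages;
              this.blockSize = blockSize;
              this.currentPage = pages.get(0);
            }

            byte readByte() {
              if (posInPage == blockSize) {            // switch pages only at a block boundary
                currentPage = pages.get(++pageIndex);
                posInPage = 0;
              }
              return currentPage[posInPage++];
            }

            void seek(long position) {                 // random access: pick page + offset once
              pageIndex = (int) (position / blockSize);
              posInPage = (int) (position % blockSize);
              currentPage = pages.get(pageIndex);
            }
          }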

        Aaron McCurry added a comment -

        This patch includes fixes for all the previous comments; however, in my small tests it seems to produce more memory overhead than I would have expected. I will be on vacation for the next week, so I wanted to attach the current state of the patch in case someone else sees the mistake causing the extra memory use. Also, I have not done any performance testing with this patch.

        Michael McCandless added a comment -

        I have done the PagedBytes back port already. It was a simple copy of the class (assuming that's what you want me to do).

        Excellent, I'm glad it was straightforward. We also need a DataInput impl that reads from the PagedBytes... I can help on that if you want.

        As for the oal.util.packed package for the packed ints, I think they should be modified to work against the DataInput and DataOutput instead of the IndexInput and IndexOutput.

        I agree – I committed this to trunk.

        Aaron McCurry added a comment -

        I have done the PagedBytes back port already. It was a simple copy of the class (assuming that's what you want me to do). As for the oal.util.packed package for the packed ints, I think they should be modified to work against the DataInput and DataOutput instead of the IndexInput and IndexOutput. If so, should a separate issue in JIRA be opened to modify them in the trunk and then I will back port that version? What do you think?

        Also I will probably work on this patch again in the next few days and get a final patch (hopefully) at some point this week.

        Michael McCandless added a comment -

        Aaron, if it would help, I can backport PagedBytes to 3.x?

        Michael McCandless added a comment -

        I ported luceneutil's PKLookupTest to 3.x (see
        http://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/perf/PKLookupPerfTest3X.java),
        and ran a test w/ 100M docs, spread across 25 segs, doing 100K PK
        lookups. I temporarily disabled the terms lookup cache.

        3.x:

        Reader=ReadOnlyDirectoryReader(segments_1 _d6(3.5):C18018000 _3x(3.5):C18018000 _70(3.5):C18018000 _a3(3.5):C18018000 _bh(3.5):C1801800 _g9(3.5):C18018000 _fp(3.5):C1801800 _gk(3.5):C1801800 _g1(3.5):C180180 _g2(3.5):C180180 _gu(3.5):C1801800 _g7(3.5):C180180 _gd(3.5):C180180 _ge(3.5):C180180 _gj(3.5):C180180 _gl(3.5):C180180 _gp(3.5):C180180 _gv(3.5):C180180 _gw(3.5):C180180 _gx(3.5):C180180 _gy(3.5):C180180 _gz(3.5):C180180 _h0(3.5):C180180 _h1(3.5):C180180 _h2(3.5):C100)
        Cycle: warm
          Lookup...
          WARM: 10428 msec for 100000 lookups (104.28 us per lookup)
        Cycle: test
          Lookup...
          10309 msec for 100000 lookups (103.09 us per lookup)
        Cycle: test
          Lookup...
          10333 msec for 100000 lookups (103.33 us per lookup)
        Cycle: test
          Lookup...
          10333 msec for 100000 lookups (103.33 us per lookup)
        Cycle: test
          Lookup...
          10506 msec for 100000 lookups (105.06 us per lookup)
        Cycle: test
          Lookup...
          10499 msec for 100000 lookups (104.99 us per lookup)
        Cycle: test
          Lookup...
          10297 msec for 100000 lookups (102.97 us per lookup)
        Cycle: test
          Lookup...
          10345 msec for 100000 lookups (103.45 us per lookup)
        Cycle: test
          Lookup...
          10396 msec for 100000 lookups (103.96 us per lookup)
        Cycle: test
          Lookup...
          10302 msec for 100000 lookups (103.02 us per lookup)
        

        Patch:

        Reader=ReadOnlyDirectoryReader(segments_1 _d6(3.5):C18018000 _3x(3.5):C18018000 _70(3.5):C18018000 _a3(3.5):C18018000 _bh(3.5):C1801800 _g9(3.5):C18018000 _fp(3.5):C1801800 _gk(3.5):C1801800 _g1(3.5):C180180 _g2(3.5):C180180 _gu(3.5):C1801800 _g7(3.5):C180180 _gd(3.5):C180180 _ge(3.5):C180180 _gj(3.5):C180180 _gl(3.5):C180180 _gp(3.5):C180180 _gv(3.5):C180180 _gw(3.5):C180180 _gx(3.5):C180180 _gy(3.5):C180180 _gz(3.5):C180180 _h0(3.5):C180180 _h1(3.5):C180180 _h2(3.5):C100)
        Cycle: warm
          Lookup...
          WARM: 11164 msec for 100000 lookups (111.64 us per lookup)
        Cycle: test
          Lookup...
          10838 msec for 100000 lookups (108.38 us per lookup)
        Cycle: test
          Lookup...
          10882 msec for 100000 lookups (108.82 us per lookup)
        Cycle: test
          Lookup...
          10873 msec for 100000 lookups (108.73 us per lookup)
        Cycle: test
          Lookup...
          10871 msec for 100000 lookups (108.71 us per lookup)
        Cycle: test
          Lookup...
          10870 msec for 100000 lookups (108.7 us per lookup)
        Cycle: test
          Lookup...
          10896 msec for 100000 lookups (108.96 us per lookup)
        Cycle: test
          Lookup...
          10840 msec for 100000 lookups (108.4 us per lookup)
        Cycle: test
          Lookup...
          10860 msec for 100000 lookups (108.6 us per lookup)
        Cycle: test
          Lookup...
          10847 msec for 100000 lookups (108.47 us per lookup)
        

        So net/net patch is a bit (~5%) slower, as expected since PKLookup is
        the worst case here, but I think the enormous gains in RAM reduction /
        startup time / GC load make this tradeoff acceptable.

        Michael McCandless added a comment -

        However, due to the growing array, the likelihood of it landing on 2.1 exactly is low. So it would probably error out sometime before that.

        Actually ArrayUtil.grow is careful about this limit: on that final
        grow() it'll go right up to Java's max allowed array size.

        I'm also building up a 2B terms index (using Test2BTerms), and then I'll compare patch/3.x on that index.

        OK this finished – the test passed with the patch (good news!), and
        3.x (phew!).

        With 3.x, IR.open takes 43.69 seconds and uses 2955 MB of heap.

        With the patch, IR.open takes 9.94 seconds (4.4X faster) and uses 505
        MB of heap (5.9X less): AWESOME!

        The test then does a lookup of a random set of terms. 3.x does this
        in 51.2 sec; patch does it in 48.5 sec, good! (Same set of terms).

        I can back port PagedBytes instead if you think it's really needed.

        I think we should cutover to PagedBytes. Today the number of terms we
        can support is 2.1B times index interval (default 128), so ~274.9 B
        terms.

        But with the current patch, we can roughly estimate bytes per indexed
        term:

        • 1 byte for fieldCounter
        • 15 bytes for term UTF8 bytes (non-English content)
        • 1 byte for docFreq (vast majority of terms are < 128 df)
        • 1 byte for skipOffset (vast majority of terms have no skip).
        • 5 bytes for freqOffset
        • 5 bytes for proxOffset
        • 5 bytes for indexOffset
        • 4 bytes for indexToTerms entry

        So total ~37 bytes per indexed term, which means ~58.0 M indexed terms
        can fit in the 2.1B byte[] limit, or 7.4 B total terms at the default
        128 index interval. This makes me a little nervous... we've already
        seen apps that are well over 2.1 B terms.
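
        For reference, the arithmetic behind those estimates, using the 2,147,483,647-byte limit of a Java byte[]:

          1 + 15 + 1 + 1 + 5 + 5 + 5 + 4                   = 37 bytes per indexed term
          2,147,483,647 bytes / 37 bytes per indexed term  ~= 58.0 M indexed terms
          58.0 M indexed terms * 128 (index interval)      ~= 7.4 B total terms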

        Even before the 2.1B limit, it makes me nervous relying on the JRE to
        allocate such a large contiguous chunk of RAM.

        A couple other random things I noticed:

        • When we estimate the initial size of the byte[] (based on .tii
          file size), I think we should divide by indexDivisor?
        • We should conditionally write the skipOffset, only when docFreq is
          >= skipInterval. Since most terms won't have skip data we can
          save 1 byte for them...
        Aaron McCurry added a comment -

        OK, retested with the same test as before, except this time I used a standard analyzer.

        During my functional testing I created a test index with small but very diverse terms. Roughly 50 terms per document with 5 million documents. So there are approximately 250 million terms in this index.

        The current 3x branch produces:
        5000000 documents at a heap size of 259581864.

        The patched version produces:
        5000000 documents at a heap size of 48991336.

        The random access performance of this index goes to the patch. Running 1000 passes of a collection of randomly sampled queries (queries change each time) produces the following:

        The current 3x branch produces:
        113.485454 avg response time in ms per collection

        The patched version produces:
        118.926365 avg response time in ms per collection

        Each collection is 200 queries. Also, it seems the JVM is slower to HotSpot-compile the patched version. In a server implementation this shouldn't be a big concern.

        Aaron McCurry added a comment -

        Memory fragmentation aside, if an index segment contained 2.1 B terms and a default term index interval of 128, each term would need to be about 16M for the byte[] to run out of space. However, due to the growing array, the likelihood of it landing on 2.1 exactly is low. So it would probably error out sometime before that.

        I can back port PagedBytes instead if you think it's really needed.

        I'm working on the PackedInts suggestion now, all the others have already been corrected.

        Michael McCandless added a comment -

        I was using keyword analyzer instead of whitespace or standard

        Aha! Good catch

        I'm also building up a 2B terms index (using Test2BTerms), and then I'll compare patch/3.x on that index.

        Michael McCandless added a comment -

        Patch looks great Aaron! Very much simplified... some comments:

        • Instead of separate build method, could we have
          TermInfosReaderIndex's ctor take all the args? Then we can make
          its private fields final?
        • I think the index and indexLength can be final, in
          TermInfosReader?
        • Can you put the GrowableByteArrayDataOutput as a separate source
          file in oal.store? Seems useful!
        • Hmm should indexToTermsArray be a long[]...? I wonder how large
          your index would have to be to overflow 2.1GB of the byte[]
          format...
        • We could further reduce the RAM usage by using packed ints
          (oal.util.packed) for the indexToTerms array; this way each
          indexed term would only use as many bits are actually required to
          address the byte[] (and, this would solve the int[]/long[] problem
          since packed ints are logically a long[]).
        • I think we should just always trim? (Ie we don't need the
          private boolean trim)
        • Could you add comment "Just for testing" to
          TermInfosReaderIndex.getTerm?
        • For the compareTo methods, can you add to the jdocs that this
          "compares term to index term", ie it returns negative N when term
          is less than index term?
        • Hmm... I wonder if memory fragmentation will cause problems for
          the allocating/growing the single byte[]. Also, a single byte[]
          can "only" address 2.1B bytes (the same overflow problem as
          above). Maybe we should port back PagedBytes (from trunk
          oal.util) and use that instead? If we did that, then we could
          create a simple DataInput impl that reads from that.
        • Could you please remove the @author tags? Thanks. It's Apache's
          policy (or at least strongly discouraged) not to commit author tags...
        Aaron McCurry added a comment - - edited

        I found a major bug in my test. I was using the keyword analyzer instead of whitespace or standard, thus it was turning every one of my sentences that contained 50 randomly generated words into 1 huge token. This helps explain why the heap space results are not that stellar, because the fewer terms there are (and the larger they are), the less the patch helps reduce space. I'm retesting now.

        Aaron McCurry added a comment -

        I would agree on the heap size; I will do more analysis on that tonight. As for the speed, it took a bit of time to get the performance basically the same. I had to change a few methods inside TermInfosReader to reuse resources.

        The random access test sampled 100,000 terms from the index and stored them in a file. Then when I run the test it pulls all of the terms into memory and randomly selects terms to use in TermQueries. The test times the searches in nanotime and averages them. I will attach my test programs tonight if you want. While running an MMapDirectory on a small index of ~1,000,000 documents, the performance is basically the same between the patch and no patch; if there is a difference, the current implementation (no patch) is slightly faster, as you would expect.
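
        A hypothetical sketch of that timing loop (not the actual RandomAccessTest.java attached to this issue):

          import java.util.List;
          import java.util.Random;
          import org.apache.lucene.index.Term;
          import org.apache.lucene.search.IndexSearcher;
          import org.apache.lucene.search.TermQuery;

          final class RandomAccessTimingSketch {
            // Picks random terms from a pre-sampled list, runs a TermQuery for each,
            // and averages the per-search time measured with System.nanoTime().
            static double avgLookupMillis(IndexSearcher searcher, List<Term> sampledTerms,
                                          int lookups, Random random) throws Exception {
              long totalNanos = 0;
              for (int i = 0; i < lookups; i++) {
                Term term = sampledTerms.get(random.nextInt(sampledTerms.size()));
                long start = System.nanoTime();
                searcher.search(new TermQuery(term), 10);  // top-10; the hit count is illustrative
                totalNanos += System.nanoTime() - start;
              }
              return (totalNanos / (double) lookups) / 1000000.0;
            }
          }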

        Michael McCandless added a comment -

        Thanks Aaron, I'll have a look at the patch.

        It's interesting that the patch was faster in your testing (I thought it should be somewhat slower, while using much less RAM) – when you picked randomly sampled queries, was it deterministically random for patch & 3.x? Ie, patch and 3.x ran the same randomly picked set of queries?

        Also, I would expect the RAM reduction to be even more than you saw in the heap sizes above, since you entirely avoid object overhead (headers, pointers), ints become vInts, etc.

        Aaron McCurry added a comment -

        I have reimplemented the patch using the UTF8SortedAsUTF16Comparator as well as ByteArrayDataInput. The patch also contains a unit test and I have run all the current tests of the core plus the contribs and everything passes. As a plus the code has gotten much simpler.

        During my functional testing I created a test index with small but very diverse terms. Roughly 50 terms per document with 50 million documents. So there are approximately 2.5 billion terms in this index.

        The current 3x branch produces:
        50000000 documents at a heap size of 598902872.

        The patched version produces:
        50000000 documents at a heap size of 282526224.

        The random access performance of this index goes to the patch. Running 200 passes of a collection of randomly sampled queries (queries change each time) produces the following:

        The current 3x branch produces:
        4186.0225 avg response time in ms

        The patched version produces:
        2930.1371 avg response time in ms

        NOTE: The hard drive I was using is a very slow drive. With smaller indexes the patch and the current branch are very close to the same performance; depending on the pass, either one was faster.

        Michael McCandless added a comment -

        Thanks Aaron, and sorry this process has taken so long! This is a great improvement.

        Aaron McCurry added a comment -

        I agree with the back porting of ByteArrayDataInput and the UTF8 storing of string data. I will work on reworking in the changes that have been discussed here and submit a new (single) patch.

        Michael McCandless added a comment -

        OK, thinking more here... the fact that this won't change the index
        format, and only replaces the low-level representation & methods for
        how indexed terms are held in RAM and accessed, means that the risk
        here is actually quite low.

        And the gains are tremendous (much lower RAM usage; much less GC load;
        faster IR init time). Users shouldn't have to wait for 4.0 to get
        these improvements.

        Seek cost will go up, but this likely doesn't often matter (3.x
        doesn't have any super-seek-intensive queries). Maybe the primary-key
        lookup case is the worst-case, so we should measure that?

        I think we can port back some help from trunk to support this, eg
        ByteArrayDataInput (to get readVInt/readVLong/etc. on a byte[]).

        I don't think we need to make this switchable with a system prop;
        let's just do a hard cutover to the new impl?

        Instead of writing the term data as vLong, can we just write the UTF8
        bytes? On seek we can convert incoming term's text to UTF8, and then
        use trunk's UTF8SortedAsUTF16Comparator to do the compares in the
        binary search (so we keep 3.x's UTF16 term sort order).

        We should also remember on commit merge this to 4.0's preflex codec...

        Aaron, on future iterations, could you use "svn diff" to produce a
        single patch file (instead of separate files as attachments)? This
        way I (and others) can easily apply it to a local checkout for testing...

        Doug Cutting added a comment -

        Michael, thanks for responding. Great to hear that this is likely greatly improved in 4.0!

        This patch is a minor refactoring to add an extension point and a new implementation of that extension point that's not used by default. There appear to be both refactorings (LUCENE-3201, LUCENE-3360) and new features scheduled for 3.5. Do you feel this refactoring is riskier than those?

        It should certainly also be demonstrated that this provides significant advantages over termInfosIndexDivisor before it is committed.

        Michael McCandless added a comment -

        Realistically I don't think we should commit such a large change to how the terms dict/index works, on the 3.x (stable) branch.

        In trunk, with flex indexing, we've substantially improved how indexed terms are maintained; we now use an FST (a single byte[]) to hold the prefix trie for all terms. This is very compact: I recently checked on a large Wikipedia index segment and it was ~0.35 bytes of RAM in the terms index, per term in the terms dict, with the default allowed 25 - 48 terms per block.

        Aaron McCurry added a comment -

        I will update my patch taking your comments into account and re-submit.

        I have tried the termInfosIndexDivisor and it does help with memory consumption, but it typically costs access time (if I remember correctly from when I last tried changing the values to tune the index). Since I started running with the patch above I haven't had any memory issues relating to index size, so I can't really comment on its effect.

        Doug Cutting added a comment -

        Also have you tried specifying termInfosIndexDivisor? I added that feature many years ago to address the memory footprint of the terms index.

        http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/index/IndexReader.html#open(org.apache.lucene.store.Directory, org.apache.lucene.index.IndexDeletionPolicy, boolean, int)

        If this is 2 then the memory use is halved, but the compute cost of looking up each search term is doubled. It would be interesting to compare the performance of the two approaches, since the approach of this patch probably increases lookup cost somewhat too.
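
        For illustration, a small program opening a read-only reader with termInfosIndexDivisor = 2 through the overload linked above (the index path is an assumption; null for the deletion policy, which only matters if the reader is used to perform deletes):

          import java.io.File;
          import org.apache.lucene.index.IndexReader;
          import org.apache.lucene.store.Directory;
          import org.apache.lucene.store.FSDirectory;

          public class DivisorExample {
            public static void main(String[] args) throws Exception {
              Directory dir = FSDirectory.open(new File(args[0]));
              // Only every 2nd indexed term is loaded into RAM, roughly halving
              // terms-index memory at the cost of a longer scan per term lookup.
              IndexReader reader = IndexReader.open(dir, null, true, 2);
              try {
                System.out.println("maxDoc=" + reader.maxDoc());
              } finally {
                reader.close();
              }
            }
          }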

        Doug Cutting added a comment -

        A few comments on the patch:

        • It'd probably be better not to make TermInfosReaderIndex and its subclasses public, to reduce the APIs that must be supported long-term.
        • Could you use BufferedIndexInput directly instead of re-implementing readVInt, readVLong, etc?
        • The code uses tabs for indentation. Lucene's standard is 2-spaces per level, no tabs. http://wiki.apache.org/lucene-java/HowToContribute
        • It would be good to add some tests, perhaps running some existing set of test searches with a reader configured to use the new TermInfosReaderIndex implementation.

        Probably the "Fix-version" of this patch should be 3.5, since it's not fixing a regression.

        Aaron McCurry added a comment -

        Yes

        Michael McCandless added a comment -

        Aaron, are you also working on a version for the flex branch?

        Aaron McCurry added a comment - - edited

        Here's the last file. I have also back-patched 3.0.0 and 2.9.1 and placed them on my blog in case you want a drop-in replacement to try out.

        http://www.nearinfinity.com/blogs/aaron_mccurry/low_memory_patch_for_lucene.html

        Aaron McCurry added a comment -

        The patch as it exists now. It no longer needs any mods to the Term.java file.

        Deepak added a comment -

        Hi Aaron

        I hope you will be able to post the files today

        Regards
        D

        Aaron McCurry added a comment -

        Sure, I will post it all in an hour or so.

        Aaron

        Deepak added a comment -

        Hi Aaron

        I would like to test this patch in our environment. It seems the patch you have supplied does not contain the full version of Term.java.

        Can you please supply this class, Term.java?

        Regards
        D

        Aaron McCurry added a comment -

        Well, to be honest, I spent a lot of time making the Unicode UTF-8/16/32 compare work, and work faster than the default implementation of TermInfosReader. I thought the same thing, but it didn't seem to work faster. I think that now that I have a working version as a baseline, I will go back and try some different things in the term.text compare.

        As far as "Another benefit doing this with flex is you can also change the index file format, ie write the vints to disk (so "build" is done at index time, not reader startup time), so the init time would be even faster."

        I actually have that implemented in our production system to give us an "instant on" capability when our huge indexes have to be reloaded. But I thought I would start simple for my contribution.

        Michael McCandless added a comment -

        Another benefit doing this with flex is you can also change the index file format, ie write the vints to disk (so "build" is done at index time, not reader startup time), so the init time would be even faster.

        Hmm... it's surprising you're seeing faster decode time – it looks like you read a vint per character of each index term compared, during the binary search? Vs String.compareTo done by trunk. (Though, if those characters are simple ascii, then the vint is always a single byte read).

        Actually, couldn't you simply compare the utf8 bytes (plus a "fixup", to match UTF16 sort order), which would require no per-character vint decode? (flex does this, since it holds the term data as utf8 bytes in memory).

        Aaron McCurry added a comment -

        Sure, I can rework things for that. Not sure if you read through my blog, but it
        seems as if the packing of the bytes as vints and such is actually faster
        than doing the Java reference lookups, at least for the test cases that I
        have come up with. Plus the JVM's GC is much happier without having to
        traverse the object graph of terms.

        Over the next few days I will rework the api to StandardTermsIndexReader and
        repost a patch. Thanks.

        Michael McCandless added a comment -

        That's very interesting – I like that approach. So it presumably has higher index/enumeration CPU cost but lower RAM cost. I think this is an OK tradeoff with terms dict, since "typically" the cost of the term lookup is dwarfed by the cost of iterating the postings (though, for TermRangeQuery on many low-freq terms, this doesn't hold true).

        In flex you could just implement StandardTermsIndexReader – the purpose of that abstract class is to allow pluggability of the terms index. Also, implementing this new approach there would be a good test case [that it's usable, generic enough, etc.]. You should be able to plug it in (temporarily make it the default, for the standard codec), then pass all unit tests...

        Aaron McCurry added a comment -

        My in-memory implementation uses vints and vlongs, so they aren't that bad for storing the fileOffsets and blockPointers.

        Michael McCandless added a comment -

        I don't think anyone's run specific numbers on the terms dict – that'd be great if you get a chance to do so!

        Note that we plan also to cutover those arrays to packed ints (LUCENE-1990) since eg using a long for fileOffset and blockPointer is very wasteful in general. Even a short to record the length of each term is wasteful.

        Aaron McCurry added a comment -

        I took a look at that class, and it does look somewhat similar. Has anyone run any numbers against it to see its savings or performance? I'm at work now, but I may try it out in my tests when I get home.

        Michael McCandless added a comment -

        We've done something very similar to this, on the "flex" branch at https://svn.apache.org/repos/asf/lucene/java/branches/flex_1458 (which should land for Lucene 3.1), eg have a look at the oal.index.codecs.standard.SimpleStandardTermsIndexReader, in the inner CoreFieldIndex class, where we use single arrays to track the value for each indexed term.

        Aaron McCurry added a comment -

        This is the program that I used to get performance metrics. I have also attached the raw output of the test.

        Aaron McCurry added a comment -

        All unit tests passed.


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Aaron McCurry
          • Votes:
            5
            Watchers:
            11
