Lucene - Core
  1. Lucene - Core
  2. LUCENE-2556

(Char)TermAttribute cloning memory consumption

    Details

    • Lucene Fields:
      New, Patch Available

      Description

      The memory consumption problem with cloning a (Char)TermAttributeImpl object was raised on thread http://markmail.org/thread/bybuerugbk5w2u6z

      1. CharTermAttributeMemoryConsumptionDemo.java
        1 kB
        Adriano Crestani
      2. lucene_2556_adriano_crestani_07_23_2010.patch
        0.9 kB
        Adriano Crestani
      3. LUCENE-2556.patch
        0.9 kB
        Uwe Schindler
      4. lucene_2556_adriano_crestani_07_24_2010.patch
        0.8 kB
        Adriano Crestani
      5. LUCENE-2556-3.0.patch
        4 kB
        Uwe Schindler

        Activity

        Hide
        Adriano Crestani added a comment -

        This java application demonstrates how much memory CharTermAttributeImpl.clone() might consume in some scenarios.

        Show
        Adriano Crestani added a comment - This java application demonstrates how much memory CharTermAttributeImpl.clone() might consume in some scenarios.
        Hide
        Adriano Crestani added a comment -

        This patch optimizes the cloning of the CharTermAttributeImpl internal buffer. It keeps using clone() to clone the internal buffer when CharTermAttribute.length() is at least 150 and at least 75% and of the internal buffer length, otherwise, it uses System.arrayCopy(...) to clone it using CharTermAttribute.length() as the new internal buffer size.

        It's performing the optimization, because in some scenarios, like cloning long arrays, clone() is usually faster than System.arrayCopy(...).

        Show
        Adriano Crestani added a comment - This patch optimizes the cloning of the CharTermAttributeImpl internal buffer. It keeps using clone() to clone the internal buffer when CharTermAttribute.length() is at least 150 and at least 75% and of the internal buffer length, otherwise, it uses System.arrayCopy(...) to clone it using CharTermAttribute.length() as the new internal buffer size. It's performing the optimization, because in some scenarios, like cloning long arrays, clone() is usually faster than System.arrayCopy(...).
        Hide
        Uwe Schindler added a comment -

        Here the patch, I see no problem with applying it to 3.x and trunk.

        Show
        Uwe Schindler added a comment - Here the patch, I see no problem with applying it to 3.x and trunk.
        Hide
        Uwe Schindler added a comment -

        This patch optimizes the cloning of the CharTermAttributeImpl internal buffer. It keeps using clone() to clone the internal buffer when CharTermAttribute.length() is at least 150 and at least 75% and of the internal buffer length, otherwise, it uses System.arrayCopy(...) to clone it using CharTermAttribute.length() as the new internal buffer size.
        It's performing the optimization, because in some scenarios, like cloning long arrays, clone() is usually faster than System.arrayCopy(...).

        Haven't seen your patch yet. I dont know if the two extra calculations rectify the barnching, because terms are mostly short...

        If we take your patch, the allocations should in all cases be done with ArrayUtils.oversize() to be consistent with the allocation strategy of the rest of CTA.

        Show
        Uwe Schindler added a comment - This patch optimizes the cloning of the CharTermAttributeImpl internal buffer. It keeps using clone() to clone the internal buffer when CharTermAttribute.length() is at least 150 and at least 75% and of the internal buffer length, otherwise, it uses System.arrayCopy(...) to clone it using CharTermAttribute.length() as the new internal buffer size. It's performing the optimization, because in some scenarios, like cloning long arrays, clone() is usually faster than System.arrayCopy(...). Haven't seen your patch yet. I dont know if the two extra calculations rectify the barnching, because terms are mostly short... If we take your patch, the allocations should in all cases be done with ArrayUtils.oversize() to be consistent with the allocation strategy of the rest of CTA.
        Hide
        Adriano Crestani added a comment -

        Hi Uwe,

        Thanks for the quick reply, your patch looks good enough for me. I just added that optimization part in case somebody complains about poor performance of arrayCopy for long arrays, but I agree with you, mostly terms are short and wouldn't require such optimization.

        +1 to also apply the patch to trunk

        Show
        Adriano Crestani added a comment - Hi Uwe, Thanks for the quick reply, your patch looks good enough for me. I just added that optimization part in case somebody complains about poor performance of arrayCopy for long arrays, but I agree with you, mostly terms are short and wouldn't require such optimization. +1 to also apply the patch to trunk
        Hide
        Adriano Crestani added a comment -

        I was checking State.clone() method usage and it's just used store the current AttributeSource state for later use, when it's restored by invoking AttributeSource.restoreState(), which only copies the valid chars from the stored state to the current CharTermAttribute object the AttributeSource holds. In the end, I see no reason for stored states (the cloned ones) to hold an internal buffer greater than it needs too, once it will never be actually used/changed by any AttributeSource user.

        Please, let me know if I'm missing something.

        Thanks,
        Adriano Crestani

        Show
        Adriano Crestani added a comment - I was checking State.clone() method usage and it's just used store the current AttributeSource state for later use, when it's restored by invoking AttributeSource.restoreState(), which only copies the valid chars from the stored state to the current CharTermAttribute object the AttributeSource holds. In the end, I see no reason for stored states (the cloned ones) to hold an internal buffer greater than it needs too, once it will never be actually used/changed by any AttributeSource user. Please, let me know if I'm missing something. Thanks, Adriano Crestani
        Hide
        Adriano Crestani added a comment -

        So, if you guys think my thoughts above are OK, this is a simple patch with code that creates a new internal buffer strictly equals to the term length

        Show
        Adriano Crestani added a comment - So, if you guys think my thoughts above are OK, this is a simple patch with code that creates a new internal buffer strictly equals to the term length
        Hide
        Robert Muir added a comment -

        Uwe, whats happening with this issue?

        Indexing terms are typically small, I think we should commit Adriano's last patch.
        I have problems with clone() being slow with -client on my jvm, I think this is a good improvement.

        And, can prevent some memory issues (the original intent of the issue it seems)

        Show
        Robert Muir added a comment - Uwe, whats happening with this issue? Indexing terms are typically small, I think we should commit Adriano's last patch. I have problems with clone() being slow with -client on my jvm, I think this is a good improvement. And, can prevent some memory issues (the original intent of the issue it seems)
        Hide
        Uwe Schindler added a comment -

        I have no problem with it.

        The only thing: It still thin it should use an array size calculated by ArrayUtils.oversize() on the clone?

        Show
        Uwe Schindler added a comment - I have no problem with it. The only thing: It still thin it should use an array size calculated by ArrayUtils.oversize() on the clone?
        Hide
        Robert Muir added a comment -

        The only thing: It still thin it should use an array size calculated by ArrayUtils.oversize() on the clone?

        I don't think we should. the clone might never be used again:

        So if the clone is never reused, there is no evidence the array will ever grow (most analysis processes, stemming and folding, etc actually shorten the term text)

        Show
        Robert Muir added a comment - The only thing: It still thin it should use an array size calculated by ArrayUtils.oversize() on the clone? I don't think we should. the clone might never be used again: So if the clone is never reused, there is no evidence the array will ever grow (most analysis processes, stemming and folding, etc actually shorten the term text)
        Hide
        Adriano Crestani added a comment -

        Robert is right, in the actual use case, a cloned TermAttribute is never modified, it's at max copied back to the TokenStream's TermAttribute (when using TokenStream.restoreState(State)), so the only TermAttribute instance that needs to grow is the TokenStream's.

        Show
        Adriano Crestani added a comment - Robert is right, in the actual use case, a cloned TermAttribute is never modified, it's at max copied back to the TokenStream's TermAttribute (when using TokenStream.restoreState(State)), so the only TermAttribute instance that needs to grow is the TokenStream's.
        Hide
        Uwe Schindler added a comment -

        There is also no problem with reusing the cloned attribute, as ArrayUtils grows the array correctly also in that case. So you can still apend stuff as usual, with a small perf penalty maybe.

        I think I will commit this later this evening to trunk and 3.x. Merging is little harder because of sophisticated backwards layer in 3.x.

        Show
        Uwe Schindler added a comment - There is also no problem with reusing the cloned attribute, as ArrayUtils grows the array correctly also in that case. So you can still apend stuff as usual, with a small perf penalty maybe. I think I will commit this later this evening to trunk and 3.x. Merging is little harder because of sophisticated backwards layer in 3.x.
        Hide
        Uwe Schindler added a comment -

        Committed trunk revision: 1024408
        Committed 3.x revision: 1024409

        Thanks Adriano!

        Show
        Uwe Schindler added a comment - Committed trunk revision: 1024408 Committed 3.x revision: 1024409 Thanks Adriano!
        Hide
        Adriano Crestani added a comment -

        Thank you for committing the patch

        Show
        Adriano Crestani added a comment - Thank you for committing the patch
        Hide
        Uwe Schindler added a comment -

        Backport to 2.9 and 3.0.

        Show
        Uwe Schindler added a comment - Backport to 2.9 and 3.0.
        Hide
        Uwe Schindler added a comment -

        Chaning title

        Show
        Uwe Schindler added a comment - Chaning title
        Hide
        Uwe Schindler added a comment -

        Patch for 3.0 for reference

        Show
        Uwe Schindler added a comment - Patch for 3.0 for reference
        Hide
        Uwe Schindler added a comment - - edited

        Committed branch 3.0 revision: 1029022
        Committed branch 2.9 revision: 1029027

        Show
        Uwe Schindler added a comment - - edited Committed branch 3.0 revision: 1029022 Committed branch 2.9 revision: 1029027

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Adriano Crestani
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development