Lucene - Core
  1. Lucene - Core
  2. LUCENE-5836

BytesRef.copyBytes and copyChars don't oversize

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      When copying data from another BytesRef/CharSequence, these methods don't oversize. This is not an issue if this method is used only once per BytesRef instance but I just reviewed the usage of these methods and they are very frequently used in loops to do things like:

      • keep track of the top values in comparators
      • keep track of the previous terms in various loops over a terms enum (lucene49 DV consumer, BlockTreeTermsWriter)
      • etc.

      Although unlikely, it might be possible to hit a worst-case and to resize the underlying byte[] on every call to copyBytes? Should we oversize the underlying array in these methods?

      1. LUCENE-5836.patch
        10 kB
        Adrien Grand

        Activity

        Hide
        Robert Muir added a comment -

        I'm not sure we should encourage the stringbuffer usage of these things. Maybe copyBytes should just be a front-end for system.arraycopy and the user has to ensure allocation themself.

        Show
        Robert Muir added a comment - I'm not sure we should encourage the stringbuffer usage of these things. Maybe copyBytes should just be a front-end for system.arraycopy and the user has to ensure allocation themself.
        Hide
        Adrien Grand added a comment -

        So the idea would be to change copyBytes to just copy bytes and fix call sites to call BytesRef.grow before copyBytes if necessary?

        Show
        Adrien Grand added a comment - So the idea would be to change copyBytes to just copy bytes and fix call sites to call BytesRef.grow before copyBytes if necessary?
        Hide
        Robert Muir added a comment -

        The problem is not unique to copyBytes.

        copyBytes/append/grow/copyChars are all stringbuffer-type methods.

        I really think we should remove/discourage these: because bytesref is really crap for a stringbuffer type object since it has no safety. You cant be a "reference to array with offset" and also be this: its just horrible software design!!!!

        Show
        Robert Muir added a comment - The problem is not unique to copyBytes. copyBytes/append/grow/copyChars are all stringbuffer-type methods. I really think we should remove/discourage these: because bytesref is really crap for a stringbuffer type object since it has no safety. You cant be a "reference to array with offset" and also be this: its just horrible software design!!!!
        Hide
        Adrien Grand added a comment -

        I'd like to fix it as well but this would be a very big change. In the mean time would you agree to fix copyBytes to oversize the destination array to make sure that we don't hit the worst-case?

        Show
        Adrien Grand added a comment - I'd like to fix it as well but this would be a very big change. In the mean time would you agree to fix copyBytes to oversize the destination array to make sure that we don't hit the worst-case?
        Hide
        Robert Muir added a comment -

        No, because there is no indication it would ever be reused: it could just be creating waste.

        Show
        Robert Muir added a comment - No, because there is no indication it would ever be reused: it could just be creating waste.
        Hide
        ASF subversion and git services added a comment -

        Commit 1611970 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1611970 ]

        LUCENE-5836: when prefix-coding variable length terms, preallocate lastTerm to the correct size

        Show
        ASF subversion and git services added a comment - Commit 1611970 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1611970 ] LUCENE-5836 : when prefix-coding variable length terms, preallocate lastTerm to the correct size
        Hide
        ASF subversion and git services added a comment -

        Commit 1611971 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1611971 ]

        LUCENE-5836: when prefix-coding variable length terms, preallocate lastTerm to the correct size

        Show
        ASF subversion and git services added a comment - Commit 1611971 from Robert Muir in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1611971 ] LUCENE-5836 : when prefix-coding variable length terms, preallocate lastTerm to the correct size
        Hide
        Robert Muir added a comment -

        keep track of the previous terms in various loops over a terms enum (lucene49 DV consumer,

        Thanks for pointing this out, i fixed this one (it knows the maximum size before the loop)

        Show
        Robert Muir added a comment - keep track of the previous terms in various loops over a terms enum (lucene49 DV consumer, Thanks for pointing this out, i fixed this one (it knows the maximum size before the loop)
        Hide
        Adrien Grand added a comment -

        And what if we move the StringBuilder-like methods to a different class? (see eg. attached patch which doesn't compile but should give an idea of the change) Would that work for you?

        Show
        Adrien Grand added a comment - And what if we move the StringBuilder-like methods to a different class? (see eg. attached patch which doesn't compile but should give an idea of the change) Would that work for you?
        Hide
        Robert Muir added a comment -

        that would be a fantastic improvement IMO.

        Show
        Robert Muir added a comment - that would be a fantastic improvement IMO.
        Hide
        Adrien Grand added a comment -

        OK, I'll give it a try.

        Show
        Adrien Grand added a comment - OK, I'll give it a try.
        Hide
        Michael McCandless added a comment -

        +1 for BytesRefBuilder!

        Show
        Michael McCandless added a comment - +1 for BytesRefBuilder!
        Hide
        Adrien Grand added a comment -

        BytesRef is not a buffer.

        Show
        Adrien Grand added a comment - BytesRef is not a buffer.

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development