
AVRO-1045: deepCopy of BYTES underflow exception

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.2
    • Fix Version/s: 1.7.0
    • Component/s: java
    • Labels:
      None

      Description

      In org.apache.avro.generic.GenericData.deepCopy, the code for copying a ByteBuffer is:

          ByteBuffer byteBufferValue = (ByteBuffer) value;
          byte[] bytesCopy = new byte[byteBufferValue.capacity()];
          byteBufferValue.rewind();
          byteBufferValue.get(bytesCopy);
          byteBufferValue.rewind();
          return ByteBuffer.wrap(bytesCopy);

      I think this is problematic because it will cause a BufferUnderflowException to be thrown if the ByteBuffer's limit is less than its capacity: get(bytesCopy) requests capacity() bytes, but only limit() bytes remain after the rewind.

      My use case is as follows. I have ByteBuffers backed by large arrays so I can avoid resizing the array every time I write data, so limit < capacity. I think Avro should respect this when data is written or copied. When data is serialized, Avro should automatically use the minimum number of bytes.
      When an object is copied, I think it makes sense to preserve the capacity of the underlying buffer as opposed to compacting it.

      So I think the code could be fixed by replacing the get call with:

          byteBufferValue.get(bytesCopy, 0, byteBufferValue.limit());
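
      A minimal standalone sketch (hypothetical class name, not part of the issue) that reproduces the failure when limit < capacity:

          import java.nio.ByteBuffer;

          public class DeepCopyUnderflowDemo {
              public static void main(String[] args) {
                  // Back the buffer with a large array but only fill part of it,
                  // so limit (3) < capacity (1024) after flip().
                  ByteBuffer value = ByteBuffer.allocate(1024);
                  value.put(new byte[] {1, 2, 3});
                  value.flip();

                  // The copy logic quoted above:
                  byte[] bytesCopy = new byte[value.capacity()]; // asks for 1024 bytes
                  value.rewind();
                  value.get(bytesCopy); // throws BufferUnderflowException: only 3
                                        // bytes remain between position and limit
              }
          }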

      Attachments

      1. AVRO-1045.patch
        2 kB
        Jeremy Lewi
      2. AVRO-1045.patch
        2 kB
        Doug Cutting

        Activity

        Doug Cutting added a comment -

        I committed this.
        Jeremy Lewi added a comment -

        Thanks for fixing the patch Doug. This will work for me.

        J
        Scott Carey added a comment -

        +1 Looks good. The other test should not worry about the capacity being the same.

        If users need a copy that does not compact, I'd be fine accepting a patch that added that as an option or extensibility point.

        Doug Cutting added a comment -

        Looks like Jeremy already answered my question: compaction is okay with him.

        Here's a patch that implements Scott's second proposal. The new test fails without the patch but passes with it. One existing test, TestSpecificRecordBuilder, failed with the patch because it explicitly checked that deepCopy() didn't change a buffer's capacity. My inclination is to call this a bad test and remove it.

        Doug Cutting added a comment -

        In the second case capacity==limit, right?

        Would the second be acceptable to you, Jeremy, or do you require that the capacity, etc. be preserved too?

        Scott Carey added a comment -

        There are two choices that make sense to me:

        • Copy the whole buffer and set all positions in the destination to the same values as the source (limit, position, mark, capacity), being cognizant of arrayOffset in case the source buffer was a slice of a larger array.
        • Assume that the data of interest is between position and limit, and copy that into a new byte buffer starting at index 0 with the new limit set to (limit - position).

        In both cases, the original buffer needs to be returned to its original state.

        Avro isn't currently doing either.
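
        A sketch of the second option (hypothetical helper, not the committed patch), using duplicate() so the source buffer's state is left untouched:

            import java.nio.ByteBuffer;

            public class CompactingCopy {
                // Copy only the bytes between position and limit into a new
                // buffer starting at index 0, with limit = (limit - position).
                // duplicate() shares content but has independent cursors, so
                // reading from it does not disturb the original buffer.
                static ByteBuffer copy(ByteBuffer source) {
                    ByteBuffer view = source.duplicate();
                    byte[] bytes = new byte[view.remaining()]; // limit - position
                    view.get(bytes);
                    return ByteBuffer.wrap(bytes); // position = 0, limit = length
                }
            }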

        Jeremy Lewi added a comment -

        I think the issue is less my use case than maintaining compatibility with existing code. In my patch, I decided to preserve the capacity on copy rather than compact because this seemed slightly less likely to break existing code.
        The current implementation creates a new buffer of equal capacity, so in theory it's not compacting the buffer. However, if the buffer isn't full, i.e. limit < capacity, then the current code throws an exception. So based on the current implementation, it's unclear whether compacting or not compacting is the expected behavior.

        With regards to my use case, compacting on copy is fine, so long as deepCopy doesn't expect limit == capacity like it currently does.
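
        For concreteness, one way the preserve-the-capacity choice could look (hypothetical helper; the actual patch may differ, and arrayOffset handling for sliced buffers is omitted):

            import java.nio.ByteBuffer;

            public class PreservingCopy {
                static ByteBuffer copy(ByteBuffer source) {
                    ByteBuffer copy = ByteBuffer.allocate(source.capacity());
                    ByteBuffer view = source.duplicate(); // independent cursors
                    view.clear();                         // position = 0, limit = capacity
                    copy.put(view);                       // copy the whole backing range
                    copy.limit(source.limit());           // mirror the source's state
                    copy.position(source.position());
                    return copy;
                }
            }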

        Doug Cutting added a comment -

        It's a little odd to require that deepCopy() preserve more than is checked by equals(). Some folks might reasonably expect deepCopy() to compact large ByteBuffers. We could perhaps add a 'protected ByteBuffer GenericData#copyBytes(ByteBuffer)' method that could be overridden in a subclass? Would that work in your case? Am I being overly cautious?
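
        As a sketch of that extensibility point (all names hypothetical; no such hook exists at the time of this comment), a base class could compact by default while a subclass overrides the hook to preserve capacity:

            import java.nio.ByteBuffer;

            class CopyingData {
                // Default: compact to the bytes between position and limit.
                protected ByteBuffer copyBytes(ByteBuffer source) {
                    ByteBuffer view = source.duplicate();
                    byte[] bytes = new byte[view.remaining()];
                    view.get(bytes);
                    return ByteBuffer.wrap(bytes);
                }
            }

            class CapacityPreservingData extends CopyingData {
                // Override: keep capacity, limit, and position identical
                // to the source, as in the preserving sketch above.
                @Override
                protected ByteBuffer copyBytes(ByteBuffer source) {
                    ByteBuffer copy = ByteBuffer.allocate(source.capacity());
                    ByteBuffer view = source.duplicate();
                    view.clear();
                    copy.put(view);
                    copy.limit(source.limit());
                    copy.position(source.position());
                    return copy;
                }
            }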

        Jeremy Lewi added a comment -

        The patch includes a new test case in the unit test.


          People

          • Assignee: Doug Cutting
          • Reporter: Jeremy Lewi
          • Votes: 0
          • Watchers: 0