[AVRO-1041] Utf8 allocates new byte array unnecessarily - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.6.2
Fix Version/s: 1.6.3
Component/s: java
Labels:
None

Description

When a Utf8 instance is about to receive new data (e.g. in BinaryDecoder), Utf8::setByteLength is invoked, essentially to ensure that the backing byte array has enough capacity.
However, the required size is compared against the logical length of the current instance rather than against the size of the existing byte array.
This causes needless allocations of a new backing byte array: if you read a 10-byte string, then an 8-byte string, then a 9-byte string, the third read allocates a new backing array even though the instance already has a 10-byte array at its disposal.
At a minimum we should replace:

  public Utf8 setByteLength(int newLength) {
    if (this.length < newLength) {
      byte[] newBytes = new byte[newLength];
      System.arraycopy(bytes, 0, newBytes, 0, this.length);
      this.bytes = newBytes;
    }
    ...
  }

with:

  public Utf8 setByteLength(int newLength) {
    if (this.bytes.length < newLength) {
      byte[] newBytes = new byte[newLength];
      System.arraycopy(bytes, 0, newBytes, 0, this.length);
      this.bytes = newBytes;
    }
    ...
  }
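
To make the difference concrete, here is a minimal sketch that counts allocations for the 10/8/9 sequence under both checks. Utf8Sketch, its compareCapacity flag, the allocations counter and allocationsFor are hypothetical names for illustration only, not part of Avro, and the sketch assumes the elided part of setByteLength simply records the new logical length.

```java
// Hypothetical stand-in for Avro's Utf8, reduced to the resize logic only.
final class Utf8Sketch {
  private byte[] bytes = new byte[0];
  private int length;                    // logical length of the current value
  private final boolean compareCapacity; // true = proposed check, false = current check
  int allocations;                       // backing arrays allocated by setByteLength

  Utf8Sketch(boolean compareCapacity) {
    this.compareCapacity = compareCapacity;
  }

  Utf8Sketch setByteLength(int newLength) {
    // Current code compares the logical length; the proposal compares the capacity.
    int current = compareCapacity ? bytes.length : length;
    if (current < newLength) {
      byte[] newBytes = new byte[newLength];
      System.arraycopy(bytes, 0, newBytes, 0, length);
      bytes = newBytes;
      allocations++;
    }
    length = newLength; // assumption: the elided part stores the logical length
    return this;
  }

  // Replays a sequence of string sizes and reports how many arrays were allocated.
  static int allocationsFor(boolean compareCapacity, int... sizes) {
    Utf8Sketch s = new Utf8Sketch(compareCapacity);
    for (int size : sizes) {
      s.setByteLength(size);
    }
    return s.allocations;
  }
}
```

With these assumptions, allocationsFor(false, 10, 8, 9) counts two allocations (the first read and the third), while allocationsFor(true, 10, 8, 9) counts only the first.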

We may also wish to consider a maximum retained size for the Utf8 backing array: if an allocation exceeds this limit, we drop the oversized array the next time a resize is requested for a data length below the limit, so that we are not forced to keep memory sized for the largest Utf8 value ever encountered.
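
One possible shape for such a limit is sketched below. This is only an illustration of the idea, not the attached patch: BoundedUtf8Sketch, MAX_RETAINED and capacity() are hypothetical names, and the 1024-byte threshold is an arbitrary assumption.

```java
// Hypothetical sketch of a size-capped backing array; not the actual Avro change.
final class BoundedUtf8Sketch {
  static final int MAX_RETAINED = 1024; // assumed threshold, chosen arbitrarily
  private byte[] bytes = new byte[0];
  private int length;

  BoundedUtf8Sketch setByteLength(int newLength) {
    // The array is oversized if it exceeds the cap and the new value fits under the cap.
    boolean oversized = bytes.length > MAX_RETAINED && newLength <= MAX_RETAINED;
    if (bytes.length < newLength || oversized) {
      byte[] newBytes = new byte[newLength];
      // When shrinking, only the bytes that still fit are carried over.
      System.arraycopy(bytes, 0, newBytes, 0, Math.min(length, newLength));
      bytes = newBytes;
    }
    length = newLength;
    return this;
  }

  // Exposes the current backing array size for inspection.
  int capacity() {
    return bytes.length;
  }
}
```

In this sketch a 4096-byte value grows the array to 4096 bytes, a following 16-byte value replaces it with a 16-byte array, and a later 8-byte value reuses that array. The trade-off is one extra allocation after each oversized value in exchange for not retaining the large array.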

Attachments

AVRO-1041.patch
02/Mar/12 09:28
2 kB
dave irving

People

Assignee: dave irving
Reporter: dave irving
Votes: 0
Watchers: 0

Dates

Created: 01/Mar/12 21:32
Updated: 19/Mar/12 16:34
Resolved: 02/Mar/12 21:50