[AVRO-2048] Avro Binary Decoding - Gracefully Handle Long Strings - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.7.7, 1.8.2
Fix Version/s: 1.9.0
Component/s: java
Labels:
None

Flags:

Patch

Description

According to the Avro specification:

a string is encoded as a long followed by that many bytes of UTF-8 encoded character data.
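To make this wire format concrete, here is a minimal Java sketch of the encoding the spec describes. It assumes the variable-length zig-zag coding that the Avro spec uses for all int and long values. The class and method names (AvroStringEncoding, zigZag, writeLong, encodeString) are hypothetical illustrations, not part of the Avro Java API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the Avro Java API.
class AvroStringEncoding {

    // Zig-zag maps signed longs to unsigned ones so small magnitudes use few bytes:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Writes a long as a base-128 varint, low-order 7 bits first,
    // setting the high bit of each byte when more bytes follow.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long v = zigZag(n);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // A string is a long byte count followed by that many bytes of UTF-8.
    // Note the count is in bytes, not characters.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = encodeString("foo");
        StringBuilder sb = new StringBuilder();
        for (byte b : encoded) {
            sb.append(String.format("%02x ", b));
        }
        System.out.println(sb.toString().trim()); // prints "06 66 6f 6f"
    }
}
```

The output matches the spec's own example for "foo": the long 3 (zig-zag encoded as 0x06) followed by the bytes 66 6f 6f.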

However, the Java decoder does not currently adhere to this part of the spec:

org.apache.avro.io.BinaryDecoder

  @Override
  public Utf8 readString(Utf8 old) throws IOException {
    int length = readInt(); // the spec encodes the length as a long
    Utf8 result = (old != null ? old : new Utf8());
    result.setByteLength(length);
    if (0 != length) {
      doReadBytes(result.getBytes(), 0, length);
    }
    return result;
  }

The first thing the code does here is read the length as an int, not a long. Because the spec encodes both int and long values with the same variable-length zig-zag coding, this works for any length that fits in an int. However, there may be edge cases where the serializer writes a large length value, whether erroneously or maliciously, such as one that does not fit in an int. Let us detect such scenarios gracefully and adhere more closely to the spec.
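One way to adhere more closely to the spec is to read the length as a long and validate it before allocating a buffer. The sketch below is hypothetical and is not the content of the attached patches: CheckedStringDecoder, readLong, readString and MAX_STRING_LENGTH are illustrative names, and the cap value (just under Integer.MAX_VALUE) is an assumption. It rejects negative lengths and lengths above the cap with an IOException instead of narrowing the value to an int.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the Avro Java API.
class CheckedStringDecoder {

    // Hypothetical cap, assumed here to sit just below the largest array size
    // most JVMs allow. A real decoder might make this configurable.
    static final long MAX_STRING_LENGTH = Integer.MAX_VALUE - 8;

    // Reads a zig-zag varint long; more than 10 bytes is an invalid encoding.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            if (shift >= 64) {
                throw new IOException("Invalid long encoding");
            }
            b = in.read();
            if (b < 0) {
                throw new EOFException();
            }
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
    }

    // Reads a string per the spec: a long length, then that many UTF-8 bytes.
    // The length is validated before any buffer is allocated.
    static String readString(InputStream in) throws IOException {
        long length = readLong(in);
        if (length < 0) {
            throw new IOException("Malformed data. Length is negative: " + length);
        }
        if (length > MAX_STRING_LENGTH) {
            throw new IOException("String length " + length + " exceeds maximum " + MAX_STRING_LENGTH);
        }
        byte[] bytes = new byte[(int) length];
        int off = 0;
        while (off < bytes.length) {
            int n = in.read(bytes, off, bytes.length - off);
            if (n < 0) {
                throw new EOFException();
            }
            off += n;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "foo": length 3 zig-zag encoded as 0x06, then 'f' 'o' 'o'.
        byte[] ok = {0x06, 0x66, 0x6f, 0x6f};
        System.out.println(readString(new ByteArrayInputStream(ok))); // prints "foo"

        // Length 2^31, one past Integer.MAX_VALUE (zig-zag value 2^32).
        byte[] huge = {(byte) 0x80, (byte) 0x80, (byte) 0x80, (byte) 0x80, 0x10};
        try {
            readString(new ByteArrayInputStream(huge));
        } catch (IOException e) {
            // prints "String length 2147483648 exceeds maximum 2147483639"
            System.out.println(e.getMessage());
        }
    }
}
```

Reading the length into a long keeps the full value the writer encoded, so an oversized length is reported as an error instead of having to pass through an int-sized read. A production decoder might also compare the length against the bytes actually remaining in the input before allocating the buffer.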

Attachments

AVRO-2048.1.patch
14/Jul/17 21:21
3 kB
David Mollitor
AVRO-2048.2.patch
21/Jul/17 12:55
3 kB
David Mollitor
AVRO-2048.3.patch
26/Jul/17 15:19
3 kB
David Mollitor

People

Assignee: David Mollitor

Reporter: David Mollitor

Votes: 0

Watchers: 4

Dates

Created: 14/Jul/17 21:20

Updated: 28/Jul/17 07:18

Resolved: 28/Jul/17 07:18