Description
According to the Avro specification:
a string is encoded as a long followed by that many bytes of UTF-8 encoded character data.
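To make the rule concrete, here is a minimal Java sketch of that encoding. It is illustrative only: the class `AvroStringEncoding` and its methods are hypothetical names, not part of Avro's API. The length is written as a zig-zag variable-length long, followed by the raw UTF-8 bytes. For example, "foo" has length 3, which zig-zags to 6, so the encoding is the bytes 06 66 6f 6f.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the spec's string encoding:
// a zig-zag varint long length, then that many UTF-8 bytes.
final class AvroStringEncoding {
    // Zig-zag maps signed values to unsigned ones so small magnitudes stay short:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);
        // Emit 7 bits at a time, low-order group first; the high bit means "more bytes follow".
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(utf8.length, out);     // the length is written as a long
        out.write(utf8, 0, utf8.length); // followed by that many bytes of UTF-8 data
        return out.toByteArray();
    }
}
```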
However, the current implementation does not adhere to this specification:
org.apache.avro.io.BinaryDecoder
  @Override
  public Utf8 readString(Utf8 old) throws IOException {
    int length = readInt();
    Utf8 result = (old != null ? old : new Utf8());
    result.setByteLength(length);
    if (0 != length) {
      doReadBytes(result.getBytes(), 0, length);
    }
    return result;
  }
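The first thing this method does is read the length as an int, not a long. Because Avro ints and longs share the same variable-length zig-zag encoding, any length that fits in 32 bits decodes identically either way, so this mostly works. However, there may be edge cases where a serializer writes a very large length value, whether erroneously or maliciously. The decoder should read the length as a long, as the spec requires, and gracefully detect such values (for example, negative lengths or lengths too large for a Java array) with a clear error.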
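A minimal sketch of the suggested behavior, under explicit assumptions: `CheckedStringReader` is a hypothetical standalone class reading from an in-memory byte array. It is not a patch to Avro's `BinaryDecoder`, and the `Integer.MAX_VALUE - 8` cap is an assumed conservative limit on Java array size. The sketch reads the length as a long and rejects negative, oversized, or truncated lengths with an `IOException` before allocating the string.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Avro's BinaryDecoder: read a string length as a
// long (per the spec) and validate it before using it.
final class CheckedStringReader {
    // Assumed cap: JVMs typically cannot allocate arrays right at Integer.MAX_VALUE.
    static final long MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    private final byte[] buf;
    private int pos;

    CheckedStringReader(byte[] buf) {
        this.buf = buf;
    }

    // Decode a zig-zag varint long (at most 10 bytes).
    long readLong() throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            if (pos >= buf.length) throw new EOFException("Truncated varint");
            int b = buf[pos++] & 0xFF;
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
            if (shift > 63) throw new IOException("Invalid long encoding");
        }
        return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
    }

    String readString() throws IOException {
        long length = readLong(); // a long, as the spec says, not an int
        if (length < 0) {
            throw new IOException("Malformed data: negative string length " + length);
        }
        if (length > MAX_ARRAY_LENGTH) {
            throw new IOException("String length " + length + " exceeds maximum array size");
        }
        if (length > buf.length - pos) {
            throw new EOFException("String length " + length + " exceeds remaining input");
        }
        int n = (int) length; // safe: bounded by the checks above
        String s = new String(buf, pos, n, StandardCharsets.UTF_8);
        pos += n;
        return s;
    }
}
```

Checking the length against the remaining input before allocating means a forged length cannot force a huge allocation. A streaming decoder that cannot see how much input remains would need some other bound, such as a configurable maximum.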