Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 1.8.2, 1.9.2
None
None
Description
When decoding a byte array with the Avro BinaryDecoder and SpecificDatumReader, is it possible to use the schema to check whether the input matches the definition before allocating a memory buffer to process the data?
One bug we hit in production involves a payload type that consists of two parts: the first part is a fixed-size byte array and the second part is a record of variable-length strings. During deserialization, we extract the byte array first (using schema A) and then read the strings (using schema B). However, we accidentally created a malformed payload that left out the byte array part. We assumed Avro would throw some kind of RuntimeException when decoding this malformed payload, but it instead allocated a huge memory buffer, scratchUtf8, to read the string, which eventually caused a JVM OOM error on our end.
fixed MD5(16); // fixed length

record A {
  MD5 hash;
}

record B {
  string name1;
  string name2;
  union {null, string} name3 = null;
}
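To make the failure mode concrete, here is a minimal sketch of the kind of check being asked for. It does not use the Avro library; it hand-decodes the binary layout of the schemas above (a 16-byte fixed, then strings prefixed with a zig-zag varint length, then a union branch index) and rejects any length prefix that exceeds the bytes remaining in the input before allocating anything. The class name `LengthGuardSketch`, the helper names, and the sample string values are all hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: not Avro's API, just the wire format decoded by hand.
class LengthGuardSketch {
    static final int MD5_SIZE = 16;

    // Avro encodes int/long values (including length prefixes) as zig-zag varints.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Builds record B, optionally preceded by the 16-byte fixed from record A.
    static byte[] payload(boolean includeMd5) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (includeMd5) {
            out.write(new byte[MD5_SIZE], 0, MD5_SIZE); // placeholder hash bytes
        }
        writeString(out, "first-string-value-one");
        writeString(out, "second");
        writeLong(out, 0); // union branch 0 = null
        return out.toByteArray();
    }

    static final class Cursor {
        final byte[] buf;
        int pos;
        Cursor(byte[] buf) { this.buf = buf; }
        int remaining() { return buf.length - pos; }
    }

    static long readLong(Cursor c) {
        long z = 0;
        for (int shift = 0; ; shift += 7) {
            if (shift > 63) throw new IllegalStateException("varint too long");
            if (c.remaining() == 0) throw new IllegalStateException("truncated varint");
            int b = c.buf[c.pos++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
        }
        return (z >>> 1) ^ -(z & 1);
    }

    // The guard: validate the length prefix against the input before allocating.
    static String readStringChecked(Cursor c) {
        long len = readLong(c);
        if (len < 0 || len > c.remaining()) {
            throw new IllegalStateException(
                "string length " + len + " exceeds remaining " + c.remaining() + " bytes");
        }
        String s = new String(c.buf, c.pos, (int) len, StandardCharsets.UTF_8);
        c.pos += (int) len;
        return s;
    }

    // Reads record A (skipping the hash), then record B.
    static String[] decode(byte[] payload) {
        Cursor c = new Cursor(payload);
        if (c.remaining() < MD5_SIZE) throw new IllegalStateException("truncated fixed");
        c.pos += MD5_SIZE;
        String name1 = readStringChecked(c);
        String name2 = readStringChecked(c);
        String name3 = readLong(c) == 1 ? readStringChecked(c) : null;
        return new String[] { name1, name2, name3 };
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", decode(payload(true))[0], decode(payload(true))[1]));
        try {
            decode(payload(false)); // byte array part left out
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the byte array part missing, the reader consumes the first 16 bytes of record B as the hash and then interprets a character from the middle of name1 as a length prefix. In this sketch the bogus length is caught because it is larger than the bytes left in the input. A decoder that trusts the prefix would try to allocate a buffer of that size, and when the misread bytes are multi-byte UTF-8 the decoded length can be very large.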