Description
When deserializing Avro records, if we do not reuse a shared Schema instance, memory usage keeps growing as the number of deserializations increases.
Code with shared schema:
public void myTest() throws Exception {
    Schema schema = new Schema.Parser().parse(schemaString);
    final AvroEntity avroEntity = buildAvroEntity();

    // Serialize one record to a byte array.
    final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    final BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
    final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
    writer.write(avroEntity, encoder);
    encoder.flush();
    final byte[] data = outputStream.toByteArray();

    // Deserialize the same bytes repeatedly, always with the same Schema instance.
    DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);
    int count = 0;
    while (count < 100000) {
        final Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        //final Schema mySchema = new Schema.Parser().parse(schemaString);
        reader.setSchema(schema);
        reader.read(null, decoder);
        count++;
        if (count % 1000 == 0) {
            System.gc();
            System.out.println("test" + count);
        }
    }
    System.out.println("test" + count);
}
Code without shared schema:
public void myTest() throws Exception {
    Schema schema = new Schema.Parser().parse(schemaString);
    final AvroEntity avroEntity = buildAvroEntity();

    // Serialize one record to a byte array.
    final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    final BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
    final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
    writer.write(avroEntity, encoder);
    encoder.flush();
    final byte[] data = outputStream.toByteArray();

    // Deserialize the same bytes repeatedly, parsing a fresh Schema instance each time.
    DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);
    int count = 0;
    while (count < 100000) {
        final Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        final Schema mySchema = new Schema.Parser().parse(schemaString);
        reader.setSchema(mySchema);
        reader.read(null, decoder);
        count++;
        if (count % 1000 == 0) {
            System.gc();
            System.out.println("test" + count);
        }
    }
    System.out.println("test" + count);
}
The number of ConcurrentHashMap$Node instances is 5,000 with the shared schema versus 1,500,000 without it.
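Since the shared-schema variant does not show the growth, one possible workaround on the caller side is to make sure each distinct schema string is parsed only once and the resulting instance is reused. The following is a minimal sketch of that idea, not a fix inside Avro. ParsedSchemaCache is a hypothetical helper name, and it is written generically (the parse step is passed in as a function) so that it compiles without Avro on the classpath.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Avro): parses each distinct schema string
 * once and returns the same parsed instance on every later call.
 */
final class ParsedSchemaCache<S> {
    // Keyed by the schema text, so equal strings map to one shared instance.
    private final ConcurrentMap<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> parser;

    ParsedSchemaCache(Function<String, S> parser) {
        this.parser = parser;
    }

    /** Returns the cached instance, invoking the parser only on first use. */
    S get(String schemaString) {
        return cache.computeIfAbsent(schemaString, parser);
    }

    /** Number of distinct schema strings parsed so far. */
    int size() {
        return cache.size();
    }
}
```

With Avro, the cache could be created as new ParsedSchemaCache<Schema>(s -> new Schema.Parser().parse(s)), and the loop in the second test would call reader.setSchema(cache.get(schemaString)) instead of parsing on every iteration. The cache is unbounded, so this only suits applications with a small, stable set of schema strings.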