[AVRO-3524] Memory leak when not reusing avro schema instance - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.9.2, 1.10.2
Fix Version/s: None
Component/s: java
Labels: None
Environment:
- OpenJDK 8
- Tested with Avro 1.9.2 and 1.10.2

Language:
- Java

Description

When deserializing an Avro record without reusing a shared Schema instance, memory usage keeps growing as the number of deserializations grows.

Code with shared schema:

public void myTest() throws Exception {
    Schema schema = new Schema.Parser().parse(schemaString);
    final AvroEntity avroEntity = buildAvroEntity();
    final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    final BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
    final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
    writer.write(avroEntity, encoder);
    encoder.flush();
    final byte[] data = outputStream.toByteArray();
    DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);

    int count = 0;
    while (count < 100000) {
        final Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        //final Schema mySchema = new Schema.Parser().parse(schemaString);
        reader.setSchema(schema);
        reader.read(null, decoder);
        count++;
        if (count % 1000 == 0) {
            System.gc();
            System.out.println("test" + count);
        }
    }
    System.out.println("test" + count);
}

Code without shared schema:

public void myTest() throws Exception {
    Schema schema = new Schema.Parser().parse(schemaString);
    final AvroEntity avroEntity = buildAvroEntity();
    final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    final BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
    final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
    writer.write(avroEntity, encoder);
    encoder.flush();
    final byte[] data = outputStream.toByteArray();
    DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);

    int count = 0;
    while (count < 100000) {
        final Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        // Parse a new Schema instance on every iteration instead of reusing `schema`.
        final Schema mySchema = new Schema.Parser().parse(schemaString);
        reader.setSchema(mySchema);
        reader.read(null, decoder);
        count++;
        if (count % 1000 == 0) {
            System.gc();
            System.out.println("test" + count);
        }
    }
    System.out.println("test" + count);
}
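
To make the growth visible in the console output as well, the loops above could log the used heap at each checkpoint instead of only the counter. The following is a minimal sketch: HeapProbe and its methods are hypothetical names, not part of Avro or of the original test, and the GC request is only a hint to the JVM, so the numbers are approximate.

```java
// Hypothetical helper (not part of Avro or the original test).
final class HeapProbe {
    private HeapProbe() {}

    /** Requests a GC (a hint only) and returns the approximate heap in use, in bytes. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Builds one checkpoint line: iteration count plus used heap in KiB. */
    static String format(int count, long usedBytes) {
        return "count=" + count + " usedHeap=" + (usedBytes / 1024) + " KiB";
    }
}
```

In either loop, the checkpoint could then print HeapProbe.format(count, HeapProbe.usedHeapBytes()) in place of "test" + count.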

After the run, the number of ConcurrentHashMap$Node instances on the heap is 5,000 with the shared schema versus 1,500,000 without it (see the attached screenshots).
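
A possible caller-side workaround, assuming the growth is tied to creating a fresh Schema instance per read (the comparison above suggests this, but the root cause is not established in this report), is to parse each distinct schema string once and reuse the resulting instance. The sketch below is generic so that it runs without Avro on the classpath; ParsedSchemaCache is a hypothetical name, not an Avro class.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical helper: caches one parsed instance per distinct schema string.
final class ParsedSchemaCache<S> {
    private final ConcurrentMap<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> parser;

    ParsedSchemaCache(Function<String, S> parser) {
        this.parser = parser;
    }

    /** Returns the cached instance for this schema string, parsing it on first use. */
    S get(String schemaJson) {
        return cache.computeIfAbsent(schemaJson, parser);
    }

    /** Number of distinct schema strings seen so far. */
    int size() {
        return cache.size();
    }
}
```

With Avro, the cache could be created as new ParsedSchemaCache<Schema>(s -> new Schema.Parser().parse(s)), and the loop would call reader.setSchema(cache.get(schemaString)). The cache itself is unbounded, so this only fits applications with a limited set of distinct schemas.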

Attachments

- jira-shared.png (60 kB), 24/May/22 23:08, Yu-Wu Chu
- jira-not-share.png (64 kB), 24/May/22 23:08, Yu-Wu Chu

People

Assignee: Unassigned

Reporter: Yu-Wu Chu

Votes: 0

Watchers: 2

Dates

Created: 24/May/22 23:05

Updated: 14/Nov/22 12:05