Apache Avro / AVRO-3524

Memory leak when not reusing an Avro Schema instance


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.9.2, 1.10.2
    • Fix Version/s: None
    • Component/s: java
    • Labels: None
    • Environment:
      • OpenJDK 8
      • Tested in Avro 1.9.2 and 1.10.2

    Description

      When deserializing Avro records without a shared Schema instance, memory usage keeps growing as the number of deserializations increases.

      Code with a shared schema:

      // schemaString, buildAvroEntity() and the generated class AvroEntity are
      // defined elsewhere in the reporter's test class (not shown).
      public void myTest() throws Exception {
          // Parse the schema once; this instance is reused for every read.
          final Schema schema = new Schema.Parser().parse(schemaString);
          final AvroEntity avroEntity = buildAvroEntity();

          // Serialize a single record to bytes.
          final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
          final BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
          final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
          writer.write(avroEntity, encoder);
          encoder.flush();
          final byte[] data = outputStream.toByteArray();
          final DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);

          // Deserialize the same bytes 100,000 times, always with the shared schema.
          int count = 0;
          while (count < 100000) {
              final Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
              reader.setSchema(schema);
              reader.read(null, decoder);
              count++;
              if (count % 1000 == 0) {
                  System.gc();
                  System.out.println("test" + count);
              }
          }
          System.out.println("test" + count);
      }


      Code without a shared schema (a new Schema is parsed on every iteration):

      // schemaString, buildAvroEntity() and the generated class AvroEntity are
      // defined elsewhere in the reporter's test class (not shown).
      public void myTest() throws Exception {
          final Schema schema = new Schema.Parser().parse(schemaString);
          final AvroEntity avroEntity = buildAvroEntity();

          // Serialize a single record to bytes.
          final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
          final BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
          final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
          writer.write(avroEntity, encoder);
          encoder.flush();
          final byte[] data = outputStream.toByteArray();
          final DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);

          // Deserialize the same bytes 100,000 times, parsing a fresh Schema
          // instance (identical content) on every iteration.
          int count = 0;
          while (count < 100000) {
              final Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
              final Schema mySchema = new Schema.Parser().parse(schemaString);
              reader.setSchema(mySchema);
              reader.read(null, decoder);
              count++;
              if (count % 1000 == 0) {
                  System.gc();
                  System.out.println("test" + count);
              }
          }
          System.out.println("test" + count);
      }
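      The second variant parses a new Schema on every iteration. Where an application receives schema strings at runtime, one workaround consistent with the shared-schema result is to parse each distinct string once and reuse that instance. The sketch below assumes a small, fixed set of schemas; SchemaCache is a hypothetical helper, not part of Avro, and it takes the parse step as a function so it compiles without Avro on the classpath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper (not an Avro API): parses each distinct schema string
 * once and returns the same instance on every later lookup.
 */
public final class SchemaCache<T> {
    // Unbounded: assumes the application only ever sees a few distinct schemas.
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> parser;

    public SchemaCache(Function<String, T> parser) {
        this.parser = parser;
    }

    /** Returns the cached instance, invoking the parser only on first use of this string. */
    public T get(String schemaString) {
        return cache.computeIfAbsent(schemaString, parser);
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        // Stand-in parser that allocates a new object per call, the way
        // s -> new Schema.Parser().parse(s) would with Avro available.
        SchemaCache<Object> schemas = new SchemaCache<>(s -> new Object());
        String schemaString = "{\"type\":\"string\"}";
        Object first = schemas.get(schemaString);
        for (int i = 0; i < 100_000; i++) {
            // Every lookup returns the instance parsed on the first call.
            if (schemas.get(schemaString) != first) {
                throw new AssertionError("expected the cached instance");
            }
        }
        System.out.println("distinct schemas parsed: " + schemas.size());
    }
}
```

      With Avro available, the cache would plausibly be built as new SchemaCache<Schema>(s -> new Schema.Parser().parse(s)). A fresh Schema.Parser is used per string because a single parser rejects a second definition of a named type it has already seen.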


      The number of ConcurrentHashMap$Node instances is 5,000 with the shared schema versus 1,500,000 without it.
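      The issue does not identify which cache holds these nodes. As a toy model only, the sketch below shows how a map keyed by object identity grows when each read supplies a new key object with the same content, while a reused instance adds a single entry. FakeSchema and CACHE are hypothetical stand-ins, not Avro classes; FakeSchema deliberately does not override equals/hashCode, and the map holds strong references, so nothing is evicted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IdentityCacheDemo {
    // Hypothetical stand-in for a parsed schema. Without equals/hashCode
    // overrides, two instances built from the same text are distinct keys.
    static final class FakeSchema {
        final String text;

        FakeSchema(String text) {
            this.text = text;
        }
    }

    // Hypothetical per-schema cache keyed by the instance (not Avro's code).
    static final Map<FakeSchema, Object> CACHE = new ConcurrentHashMap<>();

    // Simulates a read that looks up (or creates) per-schema state.
    static void read(FakeSchema schema) {
        CACHE.computeIfAbsent(schema, s -> new Object());
    }

    public static void main(String[] args) {
        String schemaString = "{\"type\":\"string\"}";

        // Shared instance: the same key on every iteration.
        FakeSchema shared = new FakeSchema(schemaString);
        for (int i = 0; i < 1000; i++) {
            read(shared);
        }
        System.out.println("shared: " + CACHE.size());

        // Fresh instance per iteration: a new key, and a new entry, each time.
        CACHE.clear();
        for (int i = 0; i < 1000; i++) {
            read(new FakeSchema(schemaString));
        }
        System.out.println("not shared: " + CACHE.size());
    }
}
```

      Running main prints "shared: 1" and then "not shared: 1000".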

      Attachments

        1. jira-shared.png (60 kB, Yu-Wu Chu)
        2. jira-not-share.png (64 kB, Yu-Wu Chu)


          People

            Assignee: Unassigned
            Reporter: Yu-Wu Chu (ywc999)
            Votes: 0
            Watchers: 2
