Apache Avro / AVRO-1818

Avoid buffer copy in DeflateCodec.compress and decompress


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: java
    • Labels: None

    Description

      One of our jobs that reads Avro files hit an OutOfMemoryError because of the buffer copy in the compress and decompress methods of DeflateCodec, which is very inefficient.

      https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/file/DeflateCodec.java#L71-L86

      java.lang.OutOfMemoryError: Java heap space
      	at java.util.Arrays.copyOf(Arrays.java:3236)
      	at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
      	at org.apache.avro.file.DeflateCodec.decompress(DeflateCodec.java:84)
      

      I would suggest using a class that extends ByteArrayOutputStream and exposes its internal buffer, like https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java#L51-L53

      and then wrapping that buffer directly instead of calling toByteArray():
      ByteBuffer result = ByteBuffer.wrap(buf.getData(), 0, buf.getLength());
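
      A minimal sketch of the suggestion, not a patch against DeflateCodec. The class names NoCopyDeflateSketch and ExposedByteArrayOutputStream are hypothetical, and the use of raw deflate (nowrap) and of heap-backed input buffers are assumptions of this sketch. The point is only that the result wraps the stream's backing array, so no second copy of the data is allocated through toByteArray():

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterOutputStream;

// Hypothetical class for illustration only; not part of Avro.
final class NoCopyDeflateSketch {

  // Hypothetical helper: exposes the protected fields of ByteArrayOutputStream
  // so callers can wrap the backing array instead of copying it.
  static final class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    ExposedByteArrayOutputStream(int size) { super(size); }
    byte[] getData() { return buf; }    // backing array; may be longer than the valid data
    int getLength() { return count; }   // number of valid bytes in the backing array
  }

  // Assumes 'data' is heap-backed (data.hasArray() is true).
  static ByteBuffer compress(ByteBuffer data) throws IOException {
    ExposedByteArrayOutputStream buf = new ExposedByteArrayOutputStream(data.remaining());
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // true = raw deflate (assumption)
    try (DeflaterOutputStream out = new DeflaterOutputStream(buf, deflater)) {
      out.write(data.array(), data.arrayOffset() + data.position(), data.remaining());
    } finally {
      deflater.end(); // release native resources after the stream has been closed
    }
    // Wrap the backing array directly; no Arrays.copyOf via toByteArray().
    return ByteBuffer.wrap(buf.getData(), 0, buf.getLength());
  }

  // Assumes 'data' is heap-backed and was produced with raw deflate.
  static ByteBuffer decompress(ByteBuffer data) throws IOException {
    ExposedByteArrayOutputStream buf = new ExposedByteArrayOutputStream(data.remaining());
    Inflater inflater = new Inflater(true); // true = raw deflate (assumption)
    try (InflaterOutputStream out = new InflaterOutputStream(buf, inflater)) {
      out.write(data.array(), data.arrayOffset() + data.position(), data.remaining());
    } finally {
      inflater.end();
    }
    return ByteBuffer.wrap(buf.getData(), 0, buf.getLength());
  }
}
```

      The returned ByteBuffer shares the stream's array, so its limit, not the array length, marks the end of the valid data. The stream must not be written to again while the buffer is in use.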

            People

              Assignee: Nándor Kollár (nkollar)
              Reporter: Rohini Palaniswamy (rohini)
              Votes: 0
              Watchers: 5
