Details

Type: Bug
Status: Open
Priority: Blocker
Resolution: Unresolved
Affects Version/s: 1.10.1, 1.12.1
Fix Version/s: None
Component/s: None
Description
Hi team,
We are getting an OutOfMemoryError ("Direct buffer memory") when trying to write data to a Parquet file. Please see the stack trace below:
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
    at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
    at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
    at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
    at org.apache.parquet.hadoop.codec.SnappyCompressor.setInput(SnappyCompressor.java:97)
    at org.apache.parquet.hadoop.codec.NonBlockedCompressorStream.write(NonBlockedCompressorStream.java:48)
    at org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeToOutput(CapacityByteArrayOutputStream.java:227)
    at org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeTo(CapacityByteArrayOutputStream.java:247)
    at org.apache.parquet.bytes.BytesInput$CapacityBAOSBytesInput.writeAllTo(BytesInput.java:405)
    at org.apache.parquet.bytes.BytesInput$SequenceBytesIn.writeAllTo(BytesInput.java:296)
    at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:164)
    at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:95)
    at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:147)
    at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
    at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:148)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:130)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at com.fivetran.warehouses.common.parquet.AvroBasedParquetWriterAdapter.write(AvroBasedParquetWriterAdapter.java:39)
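For context, here is a minimal sketch of how the writer behind AvroBasedParquetWriterAdapter is built. The class name, output path, and schema variable are placeholders rather than our actual code, but the Snappy codec matches the SnappyCompressor frames in the trace above:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class ParquetWriterSketch {
    // Illustrative only: opens an Avro-backed ParquetWriter with the Snappy
    // codec, which is the code path that allocates direct buffers in
    // SnappyCompressor.setInput in the trace above.
    static ParquetWriter<GenericRecord> openWriter(Path outputPath, Schema schema) throws java.io.IOException {
        return AvroParquetWriter.<GenericRecord>builder(outputPath)
                .withSchema(schema)
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .build();
    }
}

Records are then written one at a time through ParquetWriter.write, as in the bottom frames of the stack trace.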
We believe that most of the memory is being consumed by slabs. From the warning below we can see that the content column's PLAIN fallback writer alone has accumulated 108 slabs, 162,188,576 bytes (roughly 155 MB):
[content] optional binary content (UTF8) {
  r: 0
  d: RunLengthBitPackingHybrid 64 bytes
  data: FallbackValuesWriter {
    initial: DictionaryValuesWriter
    fallback: PLAIN CapacityByteArrayOutputStream 108 slabs, 162,188,576 bytes
  }
  pages: ColumnChunkPageWriter ConcatenatingByteArrayCollector 0 slabs, 0 bytes
  total: 162,188,590/162,188,640
}
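For what it is worth, below is a hedged sketch of the settings we understand bound this buffering; the values are illustrative, not our production configuration, and we would appreciate confirmation on whether tuning them is the right direction:

import org.apache.hadoop.conf.Configuration;

public class ParquetMemoryTuningSketch {
    // Illustrative values only. A smaller row group target makes
    // InternalParquetRecordWriter flush the column writers (and their slabs)
    // sooner; the page size caps how much data is handed to the compressor
    // per page. Defaults are 128 MB and 1 MB respectively.
    static Configuration tunedConf() {
        Configuration conf = new Configuration();
        conf.setLong("parquet.block.size", 64L * 1024 * 1024); // row group target
        conf.setInt("parquet.page.size", 512 * 1024);          // page size
        return conf;
    }
    // JVM side: -XX:MaxDirectMemorySize=<size> raises the limit that
    // java.nio.Bits.reserveMemory enforces for direct ByteBuffers.
}

The Configuration would then be passed to the writer via the builder's withConf(conf).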
Could you please help us resolve this issue?
Thanks