PARQUET-2166

Parquet writer runs into OOM during writing


Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 1.10.1, 1.12.1
    • Fix Version/s: None
    • Component/s: parquet-avro
    • Labels: None

    Description

      Hi team,
      We are getting an OOM error when trying to write data to a Parquet file. Please see the stack trace below:

      Caused by: java.lang.OutOfMemoryError: Direct buffer memory
          at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
          at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
          at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
          at org.apache.parquet.hadoop.codec.SnappyCompressor.setInput(SnappyCompressor.java:97)
          at org.apache.parquet.hadoop.codec.NonBlockedCompressorStream.write(NonBlockedCompressorStream.java:48)
          at org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeToOutput(CapacityByteArrayOutputStream.java:227)
          at org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeTo(CapacityByteArrayOutputStream.java:247)
          at org.apache.parquet.bytes.BytesInput$CapacityBAOSBytesInput.writeAllTo(BytesInput.java:405)
          at org.apache.parquet.bytes.BytesInput$SequenceBytesIn.writeAllTo(BytesInput.java:296)
          at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:164)
          at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:95)
          at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:147)
          at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
          at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
          at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
          at org.apache.parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:148)
          at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:130)
          at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
          at com.fivetran.warehouses.common.parquet.AvroBasedParquetWriterAdapter.write(AvroBasedParquetWriterAdapter.java:39)
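
      For context, the direct buffers in this trace are allocated by SnappyCompressor.setInput, which copies each buffered page into a direct ByteBuffer before compression, so direct memory demand grows with the size of the pages being flushed. Below is a minimal sketch of how our writer is constructed (the schema, path, and class name are illustrative, not our actual code):

          import org.apache.avro.Schema;
          import org.apache.avro.generic.GenericRecord;
          import org.apache.hadoop.fs.Path;
          import org.apache.parquet.avro.AvroParquetWriter;
          import org.apache.parquet.hadoop.ParquetWriter;
          import org.apache.parquet.hadoop.metadata.CompressionCodecName;

          public class WriterSketch {
              public static void main(String[] args) throws Exception {
                  // Illustrative schema: a single nullable UTF-8 "content" field,
                  // matching the column shown in the memory dump further below.
                  Schema schema = new Schema.Parser().parse(
                      "{\"type\":\"record\",\"name\":\"Doc\",\"fields\":["
                      + "{\"name\":\"content\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

                  try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                          .<GenericRecord>builder(new Path("/tmp/out.parquet")) // illustrative path
                          .withSchema(schema)
                          .withCompressionCodec(CompressionCodecName.SNAPPY)    // codec from the stack trace
                          .build()) {
                      // Each writer.write(record) eventually reaches
                      // InternalParquetRecordWriter.checkBlockSizeReached, and the row-group
                      // flush compresses every buffered page through SnappyCompressor.
                  }
              }
          }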

      We believe that most of the memory is being consumed by slabs. From the warning below, we can see that the content column has acquired 108 slabs (about 162 MB):

      [content] optional binary content (UTF8) {
        r: 0
        d: RunLengthBitPackingHybrid 64 bytes
        data: FallbackValuesWriter {
          initial: DictionaryValuesWriter { data }
          fallback: PLAIN CapacityByteArrayOutputStream 108 slabs, 162,188,576 bytes
        }
        pages: ColumnChunkPageWriter ConcatenatingByteArrayCollector 0 slabs, 0 bytes
        total: 162,188,590/162,188,640
      }
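
      In case it helps with triage: the mitigations we are experimenting with are shrinking the row-group and page sizes (which bounds how many slabs a column can accumulate before it is flushed and compressed) and raising the JVM's direct-memory ceiling. A sketch continuing the example above, with assumed sizes for illustration only, not recommendations:

          // JVM side: the ceiling enforced by Bits.reserveMemory can be raised with
          // -XX:MaxDirectMemorySize=1g (the value is an assumption, tune per workload).

          // Writer side, reusing the schema from the sketch above:
          ParquetWriter<GenericRecord> writer = AvroParquetWriter
              .<GenericRecord>builder(new Path("/tmp/out.parquet"))
              .withSchema(schema)
              .withCompressionCodec(CompressionCodecName.SNAPPY)
              .withRowGroupSize(32 * 1024 * 1024) // flush at 32 MB instead of the 128 MB default
              .withPageSize(1024 * 1024)          // smaller pages => smaller direct buffers per compress call
              .build();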

      Could you please help us resolve this issue?
      Thanks

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Ketki Bukkawar (ketki.bukkawar)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: