Parquet / PARQUET-282

OutOfMemoryError in job commit / ParquetMetadataConverter


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: parquet-mr
    • Labels: None
    • Environment: CentOS, MapR, Scalding

    Description

      We're trying to write some 14 billion rows (about 3.6 TB of Parquet output) to Parquet files. When our ETL job finishes, it throws the exception below, and the job status is "died in job commit".

      2015-05-14 09:24:28,158 FATAL [CommitterEvent Processor #4] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[CommitterEvent Processor #4,5,main] threw an Error. Shutting down now...
      java.lang.OutOfMemoryError: GC overhead limit exceeded
      at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
      at java.nio.ByteBuffer.wrap(ByteBuffer.java:396)
      at parquet.format.Statistics.setMin(Statistics.java:237)
      at parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:243)
      at parquet.format.converter.ParquetMetadataConverter.addRowGroup(ParquetMetadataConverter.java:167)
      at parquet.format.converter.ParquetMetadataConverter.toParquetMetadata(ParquetMetadataConverter.java:79)
      at parquet.hadoop.ParquetFileWriter.serializeFooter(ParquetFileWriter.java:405)
      at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:433)
      at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:423)
      at parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
      at parquet.hadoop.mapred.MapredParquetOutputCommitter.commitJob(MapredParquetOutputCommitter.java:43)
      at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:259)
      at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:253)
      at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:216)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      This seems to be related to the creation of the _metadata summary file, as the Parquet data files themselves are perfectly fine and usable. I'm also not sure how to alleviate this (e.g. by adding more heap space), since the crash happens outside the map/reduce tasks themselves and instead in the job's application master.
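
      From the trace, the failure is in the footer aggregation: ParquetMetadataConverter builds one in-memory FileMetaData object that carries the min/max statistics for every column chunk of every row group across all output files, and at this output size that object apparently no longer fits in the application master's heap. One workaround we are considering, assuming the parquet-mr build on the cluster honors the parquet.enable.summary-metadata flag, is to skip the summary file entirely; readers can still use the per-file footers. A minimal sketch of the driver (class and job names are only illustrative):

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.mapreduce.Job;

          public class ParquetEtlJob {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();

              // Skip the aggregated _metadata summary in commitJob so that
              // ParquetOutputCommitter never materializes the combined footer
              // in the application master. Assumes the parquet-mr version in
              // use checks this flag before calling writeMetaDataFile.
              conf.setBoolean("parquet.enable.summary-metadata", false);

              Job job = Job.getInstance(conf, "etl-to-parquet");
              // ... mappers/reducers, input/output paths, output format as usual ...
              System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
          }

      With Scalding the same property can be passed as a Hadoop option on the command line, e.g. -Dparquet.enable.summary-metadata=false, again assuming the flag is honored by the version in use.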
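
      On the heap question: the commit runs inside the MapReduce ApplicationMaster container, so the task-level settings (mapreduce.map.java.opts / mapreduce.reduce.java.opts) don't apply. What could be raised instead is the AM container size and its JVM heap via yarn.app.mapreduce.am.resource.mb and yarn.app.mapreduce.am.command-opts. A sketch with illustrative values (they have to stay within the cluster's maximum container allocation):

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.mapreduce.Job;

          public class BiggerAppMasterHeap {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();

              // The OOM is in the MRAppMaster, not in a task JVM, so raise the
              // AM container memory and its -Xmx. 8192 MB / 6 GB are only
              // illustrative; keep the heap below the container size.
              conf.setInt("yarn.app.mapreduce.am.resource.mb", 8192);
              conf.set("yarn.app.mapreduce.am.command-opts", "-Xmx6144m");

              Job job = Job.getInstance(conf, "etl-to-parquet");
              // ... rest of the job wiring ...
              System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
          }

      These two properties can also be passed with -D when launching the job, which is probably the easier route from Scalding.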

    Attachments

    Activity

    People

      Assignee: Unassigned
      Reporter: hy5446
      Votes: 0
      Watchers: 4

    Dates

      Created:
      Updated: