Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.8.1
Component/s: parquet-mr
Labels: None
org.apache.parquet.hadoop.ParquetWriter implements java.util.Closeable, but its close() method does not follow the interface contract. The contract states "If the stream is already closed then invoking this method has no effect.", yet a second call to ParquetWriter.close() throws a NullPointerException instead.
The source of the problem is quite obvious: columnStore is set to null at the end of the first close and then accessed again on the next one. There is no "if already closed" guard to prevent it.
java.lang.NullPointerException: null
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:157) ~[parquet-hadoop-1.8.1.jar:1.8.1]
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113) ~[parquet-hadoop-1.8.1.jar:1.8.1]
    at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297) ~[parquet-hadoop-1.8.1.jar:1.8.1]
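A minimal reproducer sketch that triggers this trace. It assumes the example Group writer from parquet-hadoop is available; the class name, schema and output path below are arbitrary choices for illustration only.

import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class ParquetDoubleCloseRepro {
  public static void main(String[] args) throws Exception {
    // Arbitrary single-column schema and output path, just to obtain a writer.
    MessageType schema = MessageTypeParser.parseMessageType(
        "message repro { required int32 id; }");
    ParquetWriter<Group> writer = ExampleParquetWriter
        .builder(new Path("/tmp/parquet-double-close.parquet"))
        .withType(schema)
        .build();

    writer.write(new SimpleGroupFactory(schema).newGroup().append("id", 1));

    writer.close(); // first close: flushes the row group and finishes the file
    writer.close(); // second close: NullPointerException in flushRowGroupToStore()
  }
}

The failing method is InternalParquetRecordWriter.flushRowGroupToStore(), which nulls out columnStore at the end of the first close():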
private void flushRowGroupToStore()
    throws IOException {
  LOG.info(format("Flushing mem columnStore to file. allocated memory: %,d", columnStore.getAllocatedSize()));
  if (columnStore.getAllocatedSize() > (3 * rowGroupSizeThreshold)) {
    LOG.warn("Too much memory used: " + columnStore.memUsageString());
  }

  if (recordCount > 0) {
    parquetFileWriter.startBlock(recordCount);
    columnStore.flush();
    pageStore.flushToFileWriter(parquetFileWriter);
    recordCount = 0;
    parquetFileWriter.endBlock();
    this.nextRowGroupSize = Math.min(
        parquetFileWriter.getNextRowGroupSize(),
        rowGroupSizeThreshold);
  }

  columnStore = null;
  pageStore = null;
}
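For contrast, here is a minimal sketch of the idempotent close() pattern that java.io.Closeable asks for. The class and field names are hypothetical and only illustrate the kind of guard that is missing; this is not the code shipped in parquet-mr.

import java.io.Closeable;
import java.io.IOException;

// Illustrative only: a close() that is safe to call more than once.
class IdempotentResource implements Closeable {
  private Object buffer = new Object(); // stands in for columnStore/pageStore
  private boolean closed;

  @Override
  public void close() throws IOException {
    if (closed) {
      return;        // already closed: no effect, as the contract requires
    }
    closed = true;
    flush();         // safe: buffer is still non-null on the first call
    buffer = null;   // release the buffer exactly once
  }

  private void flush() throws IOException {
    buffer.toString(); // would NPE if this ran after buffer was nulled out
  }
}

A guard of this shape could live in ParquetWriter.close() or in InternalParquetRecordWriter.close(); either placement would turn the second call into a no-op.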
A known workaround is to guard against the second and subsequent close() calls explicitly in the application code.
private final ParquetWriter<V> writer;
private boolean closed;

private void closeWriterOnlyOnce() throws IOException {
  if (!closed) {
    closed = true;
    writer.close();
  }
}
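With this guard in place, closeWriterOnlyOnce() can be called from both the normal shutdown path and an error-handling path (for example, a finally block) without hitting the NullPointerException. The closed flag has to be kept next to the writer it protects, since ParquetWriter itself does not report whether it has already been closed.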