  1. Parquet
  2. PARQUET-544

ParquetWriter.close() throws NullPointerException on second call, improper implementation of Closeable contract

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.8.1
    • Fix Version/s: 1.9.0, 1.8.2
    • Component/s: parquet-mr
    • Labels:
      None

      Description

      org.apache.parquet.hadoop.ParquetWriter implements java.util.Closeable, but its close() method doesn't follow its contract properly. The interface defines "If the stream is already closed then invoking this method has no effect.", but ParquetWriter instead throws NullPointerException.

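      Its cause is easy to see: flushRowGroupToStore() sets columnStore to null at the end of the first close(), and the second close() calls flushRowGroupToStore() again, which dereferences columnStore. There is no "already closed" check to prevent this.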

      java.lang.NullPointerException: null
      	at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:157) ~[parquet-hadoop-1.8.1.jar:1.8.1]
      	at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113) ~[parquet-hadoop-1.8.1.jar:1.8.1]
      	at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297) ~[parquet-hadoop-1.8.1.jar:1.8.1]
      
        private void flushRowGroupToStore()
            throws IOException {
          LOG.info(format("Flushing mem columnStore to file. allocated memory: %,d", columnStore.getAllocatedSize()));
          if (columnStore.getAllocatedSize() > (3 * rowGroupSizeThreshold)) {
            LOG.warn("Too much memory used: " + columnStore.memUsageString());
          }
      
          if (recordCount > 0) {
            parquetFileWriter.startBlock(recordCount);
            columnStore.flush();
            pageStore.flushToFileWriter(parquetFileWriter);
            recordCount = 0;
            parquetFileWriter.endBlock();
            this.nextRowGroupSize = Math.min(
                parquetFileWriter.getNextRowGroupSize(),
                rowGroupSizeThreshold);
          }
      
          columnStore = null;
          pageStore = null;
        }
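
The missing "already closed" check can be sketched as follows. This is a minimal, self-contained illustration and not the actual Parquet code or the committed fix: `SketchWriter`, `buffer`, `closed`, and `flushCount` are hypothetical names, with `buffer` standing in for `columnStore`.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-in for a writer whose close() releases its buffer. */
class SketchWriter implements Closeable {
    private StringBuilder buffer = new StringBuilder(); // plays the role of columnStore
    private boolean closed;                             // assumed guard flag
    int flushCount;                                     // counts real flushes, for illustration only

    void write(String record) {
        if (closed) {
            throw new IllegalStateException("writer is closed");
        }
        buffer.append(record);
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // already closed: no effect, as the Closeable contract requires
        }
        closed = true;   // mark closed first so a failing flush is not retried
        flushCount++;    // the final flush would happen here
        buffer = null;   // releasing the buffer is safe because close() cannot re-enter
    }
}
```

With the guard in place, calling `close()` twice performs a single flush and the second call returns immediately instead of dereferencing the released buffer.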
      

      A known workaround is to guard against the second and any subsequent close() calls explicitly in the application code.

          private final ParquetWriter<V> writer;
          private boolean closed;
      
          private void closeWriterOnlyOnce() throws IOException {
              if (!closed) {
                  closed = true;
                  writer.close();
              }
          }
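
The same workaround can be packaged as a reusable wrapper, which may help when the writer is closed from several places (for example, try-with-resources plus an explicit shutdown hook). This is a sketch under assumptions: `CloseOnce` is a hypothetical helper, not part of Parquet, and it relies only on the fact stated above that ParquetWriter implements java.io.Closeable.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: forwards close() to the delegate at most once. */
final class CloseOnce implements Closeable {
    private final Closeable delegate;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    CloseOnce(Closeable delegate) {
        this.delegate = delegate;
    }

    @Override
    public void close() throws IOException {
        // compareAndSet succeeds for exactly one caller, even across threads
        if (closed.compareAndSet(false, true)) {
            delegate.close();
        }
    }
}
```

Application code would wrap the writer once, e.g. `Closeable safeClose = new CloseOnce(writer);`, and route every close through `safeClose`. The AtomicBoolean is a design choice for thread safety; the plain boolean flag shown above is sufficient when the writer is confined to a single thread.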
      

            People

            • Assignee:
              turek@avast.com Michal Turek
              Reporter:
              turek@avast.com Michal Turek
            • Votes:
              0
              Watchers:
              2
