Parquet / PARQUET-860

ParquetWriter.getDataSize NullPointerException after closed


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.9.0
    • Fix Version/s: None
    • Component/s: parquet-mr
    • Labels: None
    • Environment: Linux prim 4.8.13-1-ARCH #1 SMP PREEMPT Fri Dec 9 07:24:34 CET 2016 x86_64 GNU/Linux

      openjdk version "1.8.0_112"
      OpenJDK Runtime Environment (build 1.8.0_112-b15)
      OpenJDK 64-Bit Server VM (build 25.112-b15, mixed mode)

    Description

      ParquetWriter.getDataSize() works normally while the writer is open. But after I call ParquetWriter.close(), subsequent calls to ParquetWriter.getDataSize() throw a NullPointerException:

      java.lang.NullPointerException
      	at org.apache.parquet.hadoop.InternalParquetRecordWriter.getDataSize(InternalParquetRecordWriter.java:132)
      	at org.apache.parquet.hadoop.ParquetWriter.getDataSize(ParquetWriter.java:314)
      	at FileBufferState.getFileSizeInBytes(FileBufferState.scala:83)
      

      The reason for the NPE appears to be that InternalParquetRecordWriter.getDataSize assumes columnStore is non-null.

      But the close() method calls flushRowGroupToStore(), which sets columnStore = null.

      I'm guessing that once the file is closed, getDataSize can just return lastRowGroupEndPos, since there should be no more buffered data, but I don't fully understand how this class works.
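      The guard suggested above could look like the following minimal sketch. This is a simplified stand-in class (field and method names mirror InternalParquetRecordWriter, but the class itself is hypothetical and not the actual parquet-mr code), showing how getDataSize could fall back to lastRowGroupEndPos once close() has released the column store:

      ```java
      // Hypothetical simplification of InternalParquetRecordWriter's size accounting.
      class RecordWriterSketch {
          private Object columnStore = new Object(); // non-null while the writer is open
          private long lastRowGroupEndPos = 0;       // bytes already flushed to the file
          private long bufferedSize = 0;             // bytes buffered in the column store

          void write(long recordBytes) {
              bufferedSize += recordBytes;
          }

          void close() {
              // flushRowGroupToStore(): buffered bytes move into the file...
              lastRowGroupEndPos += bufferedSize;
              bufferedSize = 0;
              // ...and the column store is released, as reported in this issue.
              columnStore = null;
          }

          long getDataSize() {
              // Proposed guard: after close(), nothing is buffered, so the
              // end position of the last row group is the whole data size.
              if (columnStore == null) {
                  return lastRowGroupEndPos;
              }
              return lastRowGroupEndPos + bufferedSize;
          }
      }
      ```

      With this guard, calling getDataSize() after close() returns the flushed size instead of dereferencing the null columnStore.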


    People

      Assignee: Unassigned
      Reporter: Mike Mintz (mikemintz)
      Votes: 0
      Watchers: 5
