Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.9.0
Fix Version/s: None
Component/s: None
Environment:
Linux prim 4.8.13-1-ARCH #1 SMP PREEMPT Fri Dec 9 07:24:34 CET 2016 x86_64 GNU/Linux
openjdk version "1.8.0_112"
OpenJDK Runtime Environment (build 1.8.0_112-b15)
OpenJDK 64-Bit Server VM (build 25.112-b15, mixed mode)
Description
Calling ParquetWriter.getDataSize() works normally while the writer is open, but after I call ParquetWriter.close(), any subsequent call to getDataSize() throws a NullPointerException:
java.lang.NullPointerException
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.getDataSize(InternalParquetRecordWriter.java:132)
    at org.apache.parquet.hadoop.ParquetWriter.getDataSize(ParquetWriter.java:314)
    at FileBufferState.getFileSizeInBytes(FileBufferState.scala:83)
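For reference, a minimal sketch of the sequence that triggers this, using the example writer shipped in parquet-hadoop (the output path and schema are placeholders):

    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.example.data.Group;
    import org.apache.parquet.example.data.simple.SimpleGroupFactory;
    import org.apache.parquet.hadoop.ParquetWriter;
    import org.apache.parquet.hadoop.example.ExampleParquetWriter;
    import org.apache.parquet.schema.MessageType;
    import org.apache.parquet.schema.MessageTypeParser;

    public class GetDataSizeAfterClose {
      public static void main(String[] args) throws Exception {
        MessageType schema = MessageTypeParser.parseMessageType(
            "message example { required int32 id; }");

        ParquetWriter<Group> writer = ExampleParquetWriter
            .builder(new Path("/tmp/repro.parquet"))  // placeholder path
            .withType(schema)
            .build();

        writer.write(new SimpleGroupFactory(schema).newGroup().append("id", 1));
        System.out.println(writer.getDataSize());  // works while the file is open

        writer.close();
        System.out.println(writer.getDataSize());  // throws NullPointerException
      }
    }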
The NPE appears to come from InternalParquetRecordWriter.getDataSize, which assumes that columnStore is not null. But close() calls flushRowGroupToStore(), which sets columnStore to null.
I'm guessing that once the file is closed, we can just return lastRowGroupEndPos since there should be no more buffered data, but I don't fully understand how this class works.
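If that guess is right, the fix might be as small as a null check in InternalParquetRecordWriter.getDataSize. A sketch of the idea, untested and assuming the method currently sums lastRowGroupEndPos with the column store's buffered size:

    public long getDataSize() {
      // close() -> flushRowGroupToStore() sets columnStore to null; at that
      // point everything has been flushed, so lastRowGroupEndPos is the
      // total data size
      if (columnStore == null) {
        return lastRowGroupEndPos;
      }
      return lastRowGroupEndPos + columnStore.getBufferedSize();
    }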
Issue Links
- relates to: PARQUET-308 Add accessor to ParquetWriter to get current data size (Resolved)