Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
A followup of PARQUET-1947, after the fix, when the first sub-split is empty in CombineFileInputFormat, there's a NPE:
Caused by: java.lang.NullPointerException at org.apache.parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.next(DeprecatedParquetInputFormat.java:154) at org.apache.parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.next(DeprecatedParquetInputFormat.java:73) at cascading.tap.hadoop.io.CombineFileRecordReaderWrapper.next(CombineFileRecordReaderWrapper.java:70) at org.apache.hadoop.mapred.lib.CombineFileRecordReader.next(CombineFileRecordReader.java:58) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at cascading.tap.hadoop.util.MeasuredRecordReader.next(MeasuredRecordReader.java:61) at org.apache.parquet.cascading.ParquetTupleScheme.source(ParquetTupleScheme.java:160) at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:163) at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:136) ... 10 more
The reason is CombineFileInputFormat will use the result of createValue of the first sub-split as the value container. Since the first sub-split is empty, the value container is null.