Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
1.12.0
None
Description
I have Parquet files created using 1.11.1. As part of my process I merge two files (with the same schema) into a single output file. I create a Hadoop writer:
val hadoopWriter = new ParquetFileWriter(
  HadoopOutputFile.fromPath(new Path(outputPath.toString), new Configuration()),
  outputSchema,
  Mode.OVERWRITE,
  8 * 1024 * 1024,
  2097152,
  DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH,
  DEFAULT_STATISTICS_TRUNCATE_LENGTH,
  DEFAULT_PAGE_WRITE_CHECKSUM_ENABLED
)
hadoopWriter.start()
and then try to append an existing file to it:
hadoopWriter.appendFile(HadoopInputFile.fromPath(new Path(file), new Configuration()))
Everything works on 1.11.1, but after switching to 1.12.0 it fails with this error:
STDERR: Exception in thread "main" java.io.IOException: can not read class org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! Struct: org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@b91d8c4
    at org.apache.parquet.format.Util.read(Util.java:365)
    at org.apache.parquet.format.Util.readPageHeader(Util.java:132)
    at org.apache.parquet.format.Util.readPageHeader(Util.java:127)
    at org.apache.parquet.hadoop.Offsets.readDictionaryPageSize(Offsets.java:75)
    at org.apache.parquet.hadoop.Offsets.getOffsets(Offsets.java:58)
    at org.apache.parquet.hadoop.ParquetFileWriter.appendRowGroup(ParquetFileWriter.java:998)
    at org.apache.parquet.hadoop.ParquetFileWriter.appendRowGroups(ParquetFileWriter.java:918)
    at org.apache.parquet.hadoop.ParquetFileReader.appendTo(ParquetFileReader.java:888)
    at org.apache.parquet.hadoop.ParquetFileWriter.appendFile(ParquetFileWriter.java:895)
    at [...]
Caused by: shaded.parquet.org.apache.thrift.protocol.TProtocolException: Required field 'uncompressed_page_size' was not found in serialized data! Struct: org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@b91d8c4
    at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1108)
    at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1019)
    at org.apache.parquet.format.PageHeader.read(PageHeader.java:896)
    at org.apache.parquet.format.Util.read(Util.java:362)
    ... 14 more