Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Description
The ColumnChunkPageWriteStore casts both the compressed and uncompressed size of a page from a long to an int. If the uncompressed size of a page exceeds Integer.MAX_VALUE, the write does not fail, but it produces bad metadata:
chunk1: BINARY GZIP DO:0 FPO:4 SZ:267184096/-2143335445/-8.02 VC:41 ENC:BIT_PACKED,PLAIN
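For illustration, a minimal sketch of the narrowing cast, not the actual writer code; the 2,151,631,851-byte size is an assumption, back-computed so that it truncates to the -2143335445 shown in the SZ field above:

public class PageSizeTruncation {
    public static void main(String[] args) {
        // Hypothetical uncompressed page size just over Integer.MAX_VALUE (2147483647)
        long uncompressedSize = 2151631851L;

        // The writer narrows the long to an int when recording page metadata;
        // instead of failing, the value wraps around and becomes negative.
        int truncated = (int) uncompressedSize;

        System.out.println(truncated); // prints -2143335445, the value in the SZ field above
    }
}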
At read time, BytesInput tries to allocate a byte array for the uncompressed data and fails:
Caused by: parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://nameservice1/OUTPUT/part-m-00000.gz.parquet
at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:177)
at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:95)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
... 16 more
Caused by: java.lang.NegativeArraySizeException
at parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:183)
at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:521)
at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:493)
at parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:544)
at parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:339)
at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:63)
at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:58)
at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:265)
at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:59)
at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:73)
at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:110)
at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
... 21 more
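The NegativeArraySizeException follows directly from the truncated size: the reader takes the (now negative) uncompressed size from the page metadata and asks the JVM for a buffer of that length. A minimal sketch of the failure, not the actual BytesInput code:

public class NegativeAllocation {
    public static void main(String[] args) {
        // Uncompressed page size as read back from the corrupted metadata
        int sizeFromMetadata = -2143335445;

        // Allocating a byte array with a negative length throws
        // java.lang.NegativeArraySizeException, as in the trace above.
        byte[] buffer = new byte[sizeFromMetadata];
    }
}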