Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 1.5.0, 1.6.0, 1.7.0, 1.8.0
- None
Description
The problematic Avro and Thrift schemas are:
record AvroArrayOfArray { array<array<int>> int_arrays_column; }
and
struct ThriftListOfList { 1: list<list<i32>> intListsColumn; }
They are converted to the following structurally equivalent Parquet schemas by parquet-avro 1.7.0 and parquet-thrift 1.7.0 respectively:
message AvroArrayOfArray {
  required group int_arrays_column (LIST) {
    repeated group array (LIST) {
      repeated int32 array;
    }
  }
}
and
message ParquetSchema {
  required group intListsColumn (LIST) {
    repeated group intListsColumn_tuple (LIST) {
      repeated int32 intListsColumn_tuple_tuple;
    }
  }
}
AvroIndexedRecordConverter cannot decode such records correctly. The second-level repeated group (array in the Avro-generated schema, intListsColumn_tuple in the Thrift-generated one) does not pass the AvroIndexedRecordConverter.isElementType() check, so it is not treated as the element type of the outer list. To fix this, isElementType() should also check for the field name "array" and the field name suffix "_tuple".
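The proposed name check might look roughly like the sketch below. It is not the actual parquet-avro code: Node is a hypothetical stand-in for a Parquet schema node, the class name ElementTypeSketch is made up for illustration, and the first two conditions (primitive type, more than one field) are assumptions about what the existing check already covers. Only the "array" name and "_tuple" suffix test comes from the proposal above.

```java
import java.util.Arrays;
import java.util.List;

public class ElementTypeSketch {
    /** Hypothetical stand-in for a Parquet schema node; NOT the parquet-column Type API. */
    static final class Node {
        final String name;
        final List<Node> fields; // empty list models a primitive type

        Node(String name, Node... fields) {
            this.name = name;
            this.fields = Arrays.asList(fields);
        }

        boolean isPrimitive() {
            return fields.isEmpty();
        }
    }

    /**
     * Returns true if the repeated type directly under a LIST-annotated group
     * should be read as the list element itself rather than as a wrapper
     * around the element.
     */
    static boolean isElementType(Node repeated) {
        // Assumption: a repeated primitive or a multi-field group is always the element.
        if (repeated.isPrimitive() || repeated.fields.size() > 1) {
            return true;
        }
        // Proposed addition: single-field groups written by parquet-avro ("array")
        // or parquet-thrift ("<field>_tuple") are the element type as well.
        return repeated.name.equals("array") || repeated.name.endsWith("_tuple");
    }

    public static void main(String[] args) {
        // Second-level repeated groups from the two schemas in the description.
        Node avro = new Node("array", new Node("array"));
        Node thrift = new Node("intListsColumn_tuple",
                new Node("intListsColumn_tuple_tuple"));
        // A standard three-level list wrapper, which must still be unwrapped.
        Node standard = new Node("list", new Node("element"));

        System.out.println("avro:     " + isElementType(avro));
        System.out.println("thrift:   " + isElementType(thrift));
        System.out.println("standard: " + isElementType(standard));
    }
}
```

With this check, both problematic inner groups are classified as element types, while a group named list with a single element field is still treated as a wrapper.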
Attachments
Issue Links
- blocks: PARQUET-212 Implement nested type read rules in parquet-thrift (Resolved)
- relates to: SPARK-10136 Parquet support fail to decode Avro/Thrift arrays of primitive array (e.g. array<array<int>>) (Resolved)
- links to