Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
We're unable to fully leverage the ThreeLevelListWriter functionality when writing Avro lists to Parquet through AvroParquetOutputFormat.
The following record is used for testing:
Schema:
{
  "type": "record",
  "name": "NullLists",
  "namespace": "com.test",
  "fields": [
    { "name": "KeyID", "type": "string" },
    {
      "name": "NullableList",
      "type": [ "null", { "type": "array", "items": [ "null", "string" ] } ],
      "default": null
    }
  ]
}
Record (plain JSON, for display purposes only):
{ "KeyID": "0", "NullableList": [ "foo", null, "baz" ] }
During testing, we see the following exception:
Caused by: java.lang.ClassCastException: repeated binary array (STRING) is not a group
    at org.apache.parquet.schema.Type.asGroupType(Type.java:250)
    at org.apache.parquet.avro.AvroWriteSupport$ThreeLevelListWriter.writeCollection(AvroWriteSupport.java:612)
    at org.apache.parquet.avro.AvroWriteSupport$ListWriter.writeList(AvroWriteSupport.java:397)
    at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:355)
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:278)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
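On review, we found that the option AvroWriteSupport uses to select ThreeLevelListWriter, parquet.avro.write-old-list-structure set to false, was never passed to AvroSchemaConverter. The converter therefore kept its default two-level list layout (the repeated binary array field named in the exception), while the writer expected the three-level layout, in which the repeated field is a group, and failed when calling asGroupType on it.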
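To make the mismatch behind the exception concrete, here is a small self-contained Java model. The classes are hypothetical stand-ins that only borrow names such as asGroupType and getType from parquet-mr; they are not the library's classes. It assumes the two-level layout uses a repeated primitive named array (as in the exception message) and the three-level layout a repeated group named list wrapping an optional element, and shows that a writer expecting three levels fails on the two-level form with the same message.

```java
import java.util.List;

// Hypothetical mini model of Parquet schema types; these are NOT the parquet-mr classes.
public class ListLayoutDemo {

    static abstract class Type {
        final String repetition;
        final String name;

        Type(String repetition, String name) {
            this.repetition = repetition;
            this.name = name;
        }

        abstract boolean isPrimitive();

        // A primitive type cannot be viewed as a group, so the cast fails.
        GroupType asGroupType() {
            if (isPrimitive()) {
                throw new ClassCastException(this + " is not a group");
            }
            return (GroupType) this;
        }
    }

    static final class PrimitiveType extends Type {
        final String primitive;
        final String annotation;

        PrimitiveType(String repetition, String primitive, String name, String annotation) {
            super(repetition, name);
            this.primitive = primitive;
            this.annotation = annotation;
        }

        @Override
        boolean isPrimitive() { return true; }

        @Override
        public String toString() {
            return repetition + " " + primitive + " " + name + " (" + annotation + ")";
        }
    }

    static final class GroupType extends Type {
        final List<Type> fields;

        GroupType(String repetition, String name, List<Type> fields) {
            super(repetition, name);
            this.fields = fields;
        }

        @Override
        boolean isPrimitive() { return false; }

        Type getType(int index) { return fields.get(index); }
    }

    // Builds the layout for an optional array whose items are nullable strings.
    static GroupType convertList(String fieldName, boolean writeOldListStructure) {
        Type repeated = writeOldListStructure
                // Two-level: the repeated field is the element itself, named "array".
                ? new PrimitiveType("repeated", "binary", "array", "STRING")
                // Three-level: a repeated group "list" wrapping an optional "element".
                : new GroupType("repeated", "list",
                        List.of(new PrimitiveType("optional", "binary", "element", "STRING")));
        return new GroupType("optional", fieldName, List.of(repeated));
    }

    // A three-level writer needs the repeated field to be a group holding the element.
    static String threeLevelElementPath(GroupType listType) {
        GroupType repeated = listType.getType(0).asGroupType(); // throws for two-level
        return listType.name + "." + repeated.name + "." + repeated.getType(0).name;
    }

    public static void main(String[] args) {
        System.out.println(threeLevelElementPath(convertList("NullableList", false)));
        try {
            threeLevelElementPath(convertList("NullableList", true));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

Running main prints NullableList.list.element for the three-level layout, then ClassCastException: repeated binary array (STRING) is not a group for the two-level layout, matching the first line of the reported trace.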
After we made this change (passing the option through to AvroSchemaConverter) and tested locally, the record with nulls in the array was written successfully by AvroParquetOutputFormat.
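The fix amounts to resolving the list-structure flag from one shared configuration, so the schema converter and the writer cannot disagree. The following is a minimal, hypothetical Java sketch: a plain Map stands in for the Hadoop Configuration, the key string comes from this report, and the default of true mirrors parquet-avro's default for this option. The helper names are illustrative only, not parquet-avro APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one resolver for the flag, used by every component.
public class SharedListConfig {
    static final String WRITE_OLD_LIST_STRUCTURE = "parquet.avro.write-old-list-structure";
    // Assumed to match parquet-avro's default (write the old two-level structure).
    static final boolean WRITE_OLD_LIST_STRUCTURE_DEFAULT = true;

    // Single place that reads the flag, falling back to the default when unset.
    static boolean writeOldListStructure(Map<String, String> conf) {
        String value = conf.get(WRITE_OLD_LIST_STRUCTURE);
        return value == null ? WRITE_OLD_LIST_STRUCTURE_DEFAULT : Boolean.parseBoolean(value);
    }

    static String layout(boolean writeOld) {
        return writeOld ? "two-level" : "three-level";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(WRITE_OLD_LIST_STRUCTURE, "false");

        // Before: the writer saw the job config, the converter only saw defaults.
        String writer = layout(writeOldListStructure(conf));
        String converterBefore = layout(writeOldListStructure(new HashMap<>()));
        // After: the converter is handed the same config as the writer.
        String converterAfter = layout(writeOldListStructure(conf));

        System.out.println("writer=" + writer + ", converter before=" + converterBefore
                + ", converter after=" + converterAfter);
    }
}
```

This prints writer=three-level, converter before=two-level, converter after=three-level: the layouts only agree once both sides read the same configuration.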
Issue Links
- is depended upon by: PARQUET-2145 Release 1.12.3 (Resolved)