Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Currently AvroSchemaConverter#convert does not support Parquet-to-Avro conversions where the Parquet schema contains a non-grouped repeated type. For example, this operation:
    new AvroSchemaConverter()
        .convert(MessageTypeParser.parseMessageType(
            "message MySchema { repeated int32 repeatedField; }"
        ));
triggers an UnsupportedOperationException("REPEATED not supported outside LIST or MAP"): https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.1/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java#L292
However, if I'm interpreting the format spec correctly (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#nested-types), a repeated field that is neither annotated with nor contained in a LIST or MAP group should be treated as a required list of required elements:
> This does not affect repeated fields that are not annotated: A repeated field that is neither contained by a LIST- or MAP-annotated group nor annotated by LIST or MAP should be interpreted as a required list of required elements where the element type is the type of the field.
If this interpretation is correct, can we update AvroSchemaConverter to handle this use case? I'll put up a PR demonstrating it.
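To make the proposed mapping concrete, here is a minimal stand-alone sketch of the rule as I read it from the spec. It is not the parquet-avro implementation and uses none of its classes: RepeatedFieldSketch, its enums, and toAvroFieldJson are hypothetical names invented for illustration, and only primitive fields are modeled. The point is that a bare repeated field becomes an Avro array with non-null items and is not wrapped in a union with null.

```java
// Hypothetical stand-alone model of the spec rule; none of these types come
// from parquet-mr or Avro. It only covers primitive, unannotated fields.
final class RepeatedFieldSketch {

    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    enum Primitive { INT32, INT64, BOOLEAN, FLOAT, DOUBLE, BINARY }

    // Avro primitive type name corresponding to each Parquet primitive.
    static String avroTypeName(Primitive type) {
        switch (type) {
            case INT32:   return "int";
            case INT64:   return "long";
            case BOOLEAN: return "boolean";
            case FLOAT:   return "float";
            case DOUBLE:  return "double";
            case BINARY:  return "bytes";
            default: throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    // Renders one Avro record field (as schema JSON) for a Parquet field that
    // is neither annotated with nor contained in a LIST or MAP group.
    static String toAvroFieldJson(String name, Repetition repetition, Primitive type) {
        String element = "\"" + avroTypeName(type) + "\"";
        String avroType;
        switch (repetition) {
            case REQUIRED:
                avroType = element;
                break;
            case OPTIONAL:
                // Optional fields become a union with null.
                avroType = "[\"null\"," + element + "]";
                break;
            case REPEATED:
                // Required list of required elements: a plain array whose
                // items are non-null, with no surrounding null union.
                avroType = "{\"type\":\"array\",\"items\":" + element + "}";
                break;
            default:
                throw new IllegalArgumentException("Unsupported repetition: " + repetition);
        }
        return "{\"name\":\"" + name + "\",\"type\":" + avroType + "}";
    }
}
```

Under these assumptions, the repeatedField from the example schema above would be rendered as a field whose type is an array of int, rather than causing an exception.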