Parquet / PARQUET-2425

AvroSchemaConverter doesn't support non-grouped repeated fields


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.14.0
    • Component/s: None
    • Labels: None

    Description

      Currently AvroSchemaConverter#convert does not support Parquet-to-Avro conversions where the Parquet schema contains a non-grouped repeated type. For example, this operation:
       

      new AvroSchemaConverter()
         .convert(MessageTypeParser.parseMessageType(
           "message MySchema { repeated int32 repeatedField; }"
         ))
       

      triggers an UnsupportedOperationException("REPEATED not supported outside LIST or MAP"): https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.1/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java#L292
       

      However, if I'm interpreting the format spec correctly (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#nested-types), an unannotated repeated field should be treated as a required list of required elements:

      > This does not affect repeated fields that are not annotated: A repeated field that is neither contained by a LIST- or MAP-annotated group nor annotated by LIST or MAP should be interpreted as a required list of required elements where the element type is the type of the field.

      If this interpretation is correct, can we update AvroSchemaConverter to handle this use case? I'll put up a PR demonstrating it.
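
      To make the proposed mapping concrete, here is a minimal, JDK-only sketch of the rule quoted above. It is not the parquet-mr or Avro API: the class RepeatedFieldSketch, its Repetition enum, the avroTypeFor helper, and the small primitive-name table are all hypothetical names introduced for illustration, and the handling of OPTIONAL as a union with null is an assumption about the usual convention. It only shows the Avro schema shape the spec text implies for an unannotated repeated primitive field.

```java
import java.util.Map;

// Hypothetical, simplified model for illustration; NOT the parquet-mr or Avro API.
final class RepeatedFieldSketch {
    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    // Parquet primitive name -> Avro primitive name (small illustrative subset).
    private static final Map<String, String> AVRO_PRIMITIVES = Map.of(
        "int32", "int",
        "int64", "long",
        "float", "float",
        "double", "double",
        "boolean", "boolean",
        "binary", "bytes");

    /** Returns the Avro schema, as JSON text, for one unannotated primitive field. */
    static String avroTypeFor(Repetition repetition, String parquetPrimitive) {
        String element = AVRO_PRIMITIVES.get(parquetPrimitive);
        if (element == null) {
            throw new IllegalArgumentException("Unmapped primitive: " + parquetPrimitive);
        }
        String quoted = "\"" + element + "\"";
        switch (repetition) {
            case REQUIRED:
                return quoted;
            case OPTIONAL:
                // Assumption: optional fields become a union with null.
                return "[\"null\"," + quoted + "]";
            case REPEATED:
                // Spec rule: a required list of required elements, so neither
                // the array nor its items are wrapped in a union with null.
                return "{\"type\":\"array\",\"items\":" + quoted + "}";
            default:
                throw new AssertionError(repetition);
        }
    }

    public static void main(String[] args) {
        // Mirrors "repeated int32 repeatedField" from the example schema.
        System.out.println(avroTypeFor(Repetition.REPEATED, "int32"));
        // prints {"type":"array","items":"int"}
    }
}
```

      Under this reading, the repeatedField from the example schema would map to a non-nullable Avro array of int rather than raising an exception.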


            People

              Assignee: Claire McGinty (clairemcginty)
              Reporter: Claire McGinty (clairemcginty)