Parquet / PARQUET-2051

AvroWriteSupport does not pass Configuration to AvroSchemaConverter on Creation

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.3
    • Component/s: None
    • Labels: None

    Description

      AvroWriteSupport does not pass its Configuration to the AvroSchemaConverter it creates. Because of this, we cannot fully use the ThreeLevelListWriter when writing Avro lists to Parquet through AvroParquetOutputFormat.

      The following record is used for testing:

      Schema:

      {
        "type": "record",
        "name": "NullLists",
        "namespace": "com.test",
        "fields": [
          { "name": "KeyID", "type": "string" },
          {
            "name": "NullableList",
            "type": [ "null", { "type": "array", "items": [ "null", "string" ] } ],
            "default": null
          }
        ]
      }

      Record (using basic JSON just for display purposes):

      { "KeyID": "0", "NullableList": [ "foo", null, "baz" ] }

      During testing, we see the following exception:

      Caused by: java.lang.ClassCastException: repeated binary array (STRING) is not a group
          at org.apache.parquet.schema.Type.asGroupType(Type.java:250)
          at org.apache.parquet.avro.AvroWriteSupport$ThreeLevelListWriter.writeCollection(AvroWriteSupport.java:612)
          at org.apache.parquet.avro.AvroWriteSupport$ListWriter.writeList(AvroWriteSupport.java:397)
          at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:355)
          at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:278)
          at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
          at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
          at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)

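      Upon review, we found that the option that selects the ThreeLevelListWriter in AvroWriteSupport, parquet.avro.write-old-list-structure set to false, was never shared with the AvroSchemaConverter. The converter therefore appears to fall back to its default and produce the old list structure (a repeated primitive rather than a repeated group), which the ThreeLevelListWriter then fails to cast to a group.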

      After changing AvroWriteSupport to pass the Configuration to the AvroSchemaConverter and testing locally, we observed the record with nulls in the array being written successfully by AvroParquetOutputFormat.
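
      The mismatch and the fix described above can be modeled with a small, self-contained sketch. This is a toy model, not the parquet-avro source: SketchConfig, SketchSchemaConverter, SketchWriteSupport, and the passConfToConverter flag are hypothetical stand-ins, the schema strings are simplified, and the converter's default of "old list structure" is an assumption inferred from the exception above.

```java
// Toy model only: hypothetical stand-ins, not the real parquet-avro classes.
final class SketchConfig {
    // Stands in for the parquet.avro.write-old-list-structure setting.
    final boolean writeOldListStructure;

    SketchConfig(boolean writeOldListStructure) {
        this.writeOldListStructure = writeOldListStructure;
    }
}

final class SketchSchemaConverter {
    private final boolean writeOldListStructure;

    // No configuration supplied: fall back to the (assumed) default of true.
    SketchSchemaConverter() {
        this.writeOldListStructure = true;
    }

    SketchSchemaConverter(SketchConfig conf) {
        this.writeOldListStructure = conf.writeOldListStructure;
    }

    // Simplified description of the repeated field chosen for a list of strings.
    String convertStringList() {
        return writeOldListStructure
            ? "repeated binary array (STRING)"                            // old two-level layout
            : "repeated group list { optional binary element (STRING) }"; // three-level layout
    }
}

final class SketchWriteSupport {
    private final SketchConfig conf;
    private final SketchSchemaConverter converter;

    SketchWriteSupport(SketchConfig conf, boolean passConfToConverter) {
        this.conf = conf;
        // passConfToConverter == false models the reported bug.
        this.converter = passConfToConverter
            ? new SketchSchemaConverter(conf)
            : new SketchSchemaConverter();
    }

    // The three-level writer needs the repeated field to be a group.
    String writeList() {
        String repeated = converter.convertStringList();
        boolean threeLevelWriter = !conf.writeOldListStructure;
        if (threeLevelWriter && !repeated.startsWith("repeated group")) {
            throw new ClassCastException(repeated + " is not a group");
        }
        return repeated;
    }
}

final class ListStructureSketch {
    public static void main(String[] args) {
        SketchConfig conf = new SketchConfig(false); // request the three-level writer
        try {
            new SketchWriteSupport(conf, false).writeList();
        } catch (ClassCastException e) {
            System.out.println("without conf: " + e.getMessage());
        }
        System.out.println("with conf: " + new SketchWriteSupport(conf, true).writeList());
    }
}
```

      In this model, constructing the converter without the configuration reproduces the shape of the reported failure (the writer expects a group, the schema holds a repeated primitive), while passing the same configuration to both sides makes them agree on the list layout.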

            People

              Assignee: Andreas Hailu (ahailu)
              Reporter: Andreas Hailu (ahailu)