Parquet / PARQUET-465

Parquet-Avro does not support field removal

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.0
    • Fix Version/s: None
    • Component/s: parquet-avro
    • Labels: None

      Description

      parquet-avro does not support removing fields when it is used with the new compatibility layer.

      Given a Parquet file written with parquet-avro using schema version v1:

      record FooBar {
        long foo;
        string bar;
      }
      

      And the following configuration settings:

      job.getConfiguration.setBoolean(AvroReadSupport.AVRO_COMPATIBILITY, false)
      AvroParquetInputFormat.setAvroReadSchema(job, avroReaderSchema)
      

      A job fails when it tries to read that file using schema version v2, which no longer contains foo:

      record FooBar {
        string bar;
      }
      

      With the error:

      org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'foo' not found
      	at org.apache.parquet.avro.AvroRecordConverter.getAvroField(AvroRecordConverter.java:159)
      

      It looks like the converter sees the field in the original (writer) schema and assumes the new (reader) schema must also contain it, but in this case the field was simply removed. Avro schema resolution says that a writer field with no counterpart in the reader schema should be ignored, since it is not relevant in the new version.
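
To make the expected behaviour concrete, here is a minimal sketch of the resolution rule described above. It is not the parquet-avro or Avro implementation: records are modelled as plain maps, and the class name FieldRemovalSketch and the method resolve are hypothetical names chosen for illustration. Reader-side defaults are not modelled, so a reader field that the writer never wrote is treated as an error here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of parquet-avro or Avro.
public class FieldRemovalSketch {

    // Projects a record written with the writer schema onto the reader schema.
    // Writer-only fields (such as 'foo' in v1) are silently dropped.
    public static Map<String, Object> resolve(Map<String, Object> writerRecord,
                                              List<String> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String name : readerFields) {
            if (!writerRecord.containsKey(name)) {
                // Assumption: defaults are not modelled, so this is an error.
                throw new IllegalArgumentException(
                        "Reader field '" + name + "' was not written and has no default");
            }
            result.put(name, writerRecord.get(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // A record as written with schema v1 (foo and bar).
        Map<String, Object> v1Record = new LinkedHashMap<>();
        v1Record.put("foo", 42L);
        v1Record.put("bar", "hello");

        // Schema v2 keeps only bar.
        List<String> v2Fields = Arrays.asList("bar");

        System.out.println(resolve(v1Record, v2Fields)); // prints {bar=hello}
    }
}
```

The loop iterates over the reader's fields rather than the writer's, so a removed field is never looked up. The failing lookup in the stack trace above appears to go the other way round, starting from a field of the file's schema and searching for it in the Avro read schema.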

            People

            • Assignee: Unassigned
            • Reporter: Thomas Omans (eggsby)