Parquet / PARQUET-1679

InvalidSchemaException for UUID while using AvroParquetWriter

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.10.1
    • Fix Version/s: None
    • Component/s: parquet-avro
    • Labels: None

    Description

      Hi,

      I am getting org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: optional group id {} when I include a UUID field in my POJO. Without the UUID field everything works fine. I have seen that Parquet supports UUID as part of PR-71 in the 2.4 release.
      However, I still get InvalidSchemaException for the UUID field. Is there anything I am missing, or is this a known issue?
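For reference, the Parquet format specification describes the UUID logical type as a 16-byte fixed-length binary in big-endian order. The sketch below shows only that byte layout, using nothing but the JDK. UuidBytes and its methods are hypothetical names chosen for illustration; they are not part of parquet-avro or Avro, and this sketch does not by itself make AvroParquetWriter accept a UUID field.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper for illustration; not part of parquet-avro or Avro.
final class UuidBytes {
    private UuidBytes() {}

    // Encode a UUID as 16 bytes, most significant bits first.
    // ByteBuffer writes in big-endian order by default.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Decode 16 big-endian bytes back into a UUID.
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long mostSignificant = buffer.getLong();
        long leastSignificant = buffer.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        byte[] encoded = toBytes(id);
        System.out.println(encoded.length + " bytes, round trip ok: " + id.equals(fromBytes(encoded)));
    }
}
```

Whether a byte[] field produced this way ends up annotated as a UUID in the written file depends on the schema passed to the writer, which this sketch does not address.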

      My setup details:

      Gradle dependencies:

      dependencies {
          compile group: 'org.springframework.boot', name: 'spring-boot-starter'
          compile group: 'org.projectlombok', name: 'lombok', version: '1.16.6'
          compile group: 'com.amazonaws', name: 'aws-java-sdk-bundle', version: '1.11.271'
          compile group: 'org.apache.parquet', name: 'parquet-avro', version: '1.10.1'
          compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.1.1'
          compile group: 'org.apache.hadoop', name: 'hadoop-aws', version: '3.1.1'
          compile group: 'org.apache.hadoop', name: 'hadoop-client', version: '3.1.1'
          compile group: 'joda-time', name: 'joda-time'
          compile group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.6.5'
          compile group: 'com.fasterxml.jackson.datatype', name: 'jackson-datatype-joda', version: '2.6.5'
      }

      Model used:

      @Data
      public class Employee {
          private UUID id;
          private String name;
          private int age;
          private Address address;
      }

      @Data
      public class Address {
          private String streetName;
          private String city;
          private Zip zip;
      }

      @Data
      public class Zip {
          private int zip;
          private int ext;
      }
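One possible workaround, offered as an untested assumption rather than a confirmed fix for this issue, is to keep the identifier as a String field (which Avro reflection maps to its string type) and to convert to and from UUID at the edges of the model. EmployeeWithStringId and idAsUuid are hypothetical names for illustration, and the address field is left out to keep the sketch short.

```java
import java.util.UUID;

// Hypothetical variant of the Employee model: the id is stored as text
// so that the persisted state contains no java.util.UUID field.
final class EmployeeWithStringId {
    private String id;    // canonical UUID text, as produced by UUID.toString()
    private String name;
    private int age;

    // Accept a UUID from callers, but keep only its string form as state.
    void setId(UUID value) {
        this.id = (value == null) ? null : value.toString();
    }

    // Raw stored value; this is the field a reflection-based schema would see.
    String getId() {
        return id;
    }

    // Parse the stored text back into a UUID; returns null when no id is set.
    UUID idAsUuid() {
        return (id == null) ? null : UUID.fromString(id);
    }

    void setName(String name) { this.name = name; }
    String getName() { return name; }
    void setAge(int age) { this.age = age; }
    int getAge() { return age; }

    public static void main(String[] args) {
        EmployeeWithStringId employee = new EmployeeWithStringId();
        employee.setId(UUID.randomUUID());
        employee.setName("Test0");
        employee.setAge(20);
        System.out.println(employee.getId() + " parses back: "
                + employee.idAsUuid().toString().equals(employee.getId()));
    }
}
```

The trade-off is size: the text form takes 36 characters per value instead of 16 bytes.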

       

      My Serializer Code:

      public void serialize(List<D> inputDataToSerialize, CompressionCodecName compressionCodecName) throws IOException {
          Path path = new Path("s3a://parquetpoc/data_" + compressionCodecName + ".parquet");
          Class clazz = inputDataToSerialize.get(0).getClass();

          try (ParquetWriter<D> writer = AvroParquetWriter.<D>builder(path)
                  .withSchema(ReflectData.AllowNull.get().getSchema(clazz)) // generate nullable fields
                  .withDataModel(ReflectData.get())
                  .withConf(parquetConfiguration)
                  .withCompressionCodec(compressionCodecName)
                  .withWriteMode(OVERWRITE)
                  .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_2_0)
                  .build()) {
              for (D input : inputDataToSerialize) {
                  writer.write(input);
              }
          }
      }

      private List<Employee> getInputDataToSerialize() {
          Address address = new Address();
          address.setStreetName("Murry Ridge Dr");
          address.setCity("Murrysville");
          Zip zip = new Zip();
          zip.setZip(15668);
          zip.setExt(1234);

          address.setZip(zip);

          List<Employee> employees = new ArrayList<>();

          IntStream.range(0, 100000).forEach(i -> {
              Employee employee = new Employee();
              // employee.setId(UUID.randomUUID());
              employee.setAge(20);
              employee.setName("Test" + i);
              employee.setAddress(address);
              employees.add(employee);
          });
          return employees;
      }

      Note: the generic type D is Employee.

          People

            Assignee: Unassigned
            Reporter: Felix Kizhakkel Jose (FelixKJose)
            Votes: 0
            Watchers: 2
