Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.10.1
Fix Version/s: None
Component/s: None
Description
Hi,
I am getting org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: optional group id {} when I include a UUID field in my POJO. Without the UUID field, everything works fine. I have seen that Parquet supports UUID as of PR-71 in the 2.4 release.
But I am still getting InvalidSchemaException on the UUID field. Is there anything I am missing, or is this a known issue?
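For reference, the UUID logical type in the Parquet format specification annotates a 16-byte FIXED_LEN_BYTE_ARRAY column whose bytes hold the UUID in big-endian order. The sketch below shows only that encoding, using plain JDK calls; it does not touch parquet-avro, and `UuidBytes`, `toBytes` and `fromBytes` are hypothetical names. Whether parquet-avro 1.10.1 can produce such a column from a reflected POJO is the open question of this report, so this is background rather than a fix.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper (not part of parquet-avro): converts a java.util.UUID
// to and from the 16-byte, big-endian layout described for the Parquet UUID
// logical type.
final class UuidBytes {
    private UuidBytes() {}

    static byte[] toBytes(UUID uuid) {
        // ByteBuffer uses big-endian order by default, so the most
        // significant byte of the UUID is written first.
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return buf.array();
    }

    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes); // also big-endian by default
        long mostSigBits = buf.getLong();
        long leastSigBits = buf.getLong();
        return new UUID(mostSigBits, leastSigBits);
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("00112233-4455-6677-8899-aabbccddeeff");
        byte[] raw = toBytes(id);
        StringBuilder hex = new StringBuilder();
        for (byte b : raw) {
            hex.append(String.format("%02x", b & 0xff));
        }
        System.out.println(hex);                       // 00112233445566778899aabbccddeeff
        System.out.println(fromBytes(raw).equals(id)); // true
    }
}
```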
My setup details:
Gradle dependencies:
dependencies {
    compile group: 'org.springframework.boot', name: 'spring-boot-starter'
    compile group: 'org.projectlombok', name: 'lombok', version: '1.16.6'
    compile group: 'com.amazonaws', name: 'aws-java-sdk-bundle', version: '1.11.271'
    compile group: 'org.apache.parquet', name: 'parquet-avro', version: '1.10.1'
    compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.1.1'
    compile group: 'org.apache.hadoop', name: 'hadoop-aws', version: '3.1.1'
    compile group: 'org.apache.hadoop', name: 'hadoop-client', version: '3.1.1'
    compile group: 'joda-time', name: 'joda-time'
    compile group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.6.5'
    compile group: 'com.fasterxml.jackson.datatype', name: 'jackson-datatype-joda', version: '2.6.5'
}
Model used:
@Data
public class Employee
@Data
public class Address
@Data
public class Zip
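The class bodies are not shown above. A plausible reconstruction is sketched below, inferred only from the setters called in getInputDataToSerialize() and from the field name "id" in the error message. The field types (int, String, UUID) are assumptions, and Lombok's @Data is replaced by hand-written accessors so the sketch compiles without Lombok.

```java
import java.util.UUID;

// Hypothetical reconstruction of the model; field types are assumptions.
class Zip {
    private int zip;
    private int ext;

    public int getZip() { return zip; }
    public void setZip(int zip) { this.zip = zip; }
    public int getExt() { return ext; }
    public void setExt(int ext) { this.ext = ext; }
}

class Address {
    private String streetName;
    private String city;
    private Zip zip;

    public String getStreetName() { return streetName; }
    public void setStreetName(String streetName) { this.streetName = streetName; }
    public String getCity() { return city; }
    public void setCity(String city) { this.city = city; }
    public Zip getZip() { return zip; }
    public void setZip(Zip zip) { this.zip = zip; }
}

class Employee {
    // Per the report, including this java.util.UUID field leads to the
    // InvalidSchemaException; the message shows "id" as an empty group.
    private UUID id;
    private int age;
    private String name;
    private Address address;

    public UUID getId() { return id; }
    public void setId(UUID id) { this.id = id; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Address getAddress() { return address; }
    public void setAddress(Address address) { this.address = address; }
}
```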
My Serializer Code:
public void serialize(List<D> inputDataToSerialize, CompressionCodecName compressionCodecName) throws IOException {
    Path path = new Path("s3a://parquetpoc/data_" + compressionCodecName + ".parquet");
    Class clazz = inputDataToSerialize.get(0).getClass();
    try (ParquetWriter<D> writer = AvroParquetWriter.<D>builder(path)
            .withSchema(ReflectData.AllowNull.get().getSchema(clazz)) // generate nullable fields
            .withDataModel(ReflectData.get())
            .withConf(parquetConfiguration)
            .withCompressionCodec(compressionCodecName)
            .withWriteMode(OVERWRITE)
            .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_2_0)
            .build()) {
        for (D input : inputDataToSerialize) {
            writer.write(input);
        }
    }
}
private List<Employee> getInputDataToSerialize() {
    Address address = new Address();
    address.setStreetName("Murry Ridge Dr");
    address.setCity("Murrysville");
    Zip zip = new Zip();
    zip.setZip(15668);
    zip.setExt(1234);
    address.setZip(zip);
    List<Employee> employees = new ArrayList<>();
    IntStream.range(0, 100000).forEach(i -> {
        Employee employee = new Employee();
        // employee.setId(UUID.randomUUID());
        employee.setAge(20);
        employee.setName("Test" + i);
        employee.setAddress(address);
        employees.add(employee);
    });
    return employees;
}
Here the generic type D is Employee.
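One workaround that may be worth trying (an assumption, not something confirmed in this issue) is to keep the identifier in the POJO as a String. Avro's ReflectData builds the schema from declared fields, and a String field maps to an Avro string, which parquet-avro writes as a binary (UTF8) column instead of a group. The sketch shows only the POJO side; `EmployeeRecord`, `getUuid` and `setUuid` are hypothetical names, and the address field is omitted for brevity.

```java
import java.util.UUID;

// Hypothetical POJO for the workaround. Reflection only sees the declared
// fields, so the "id" column comes from the String field; the UUID
// conversion lives in accessor methods, which ReflectData does not inspect.
class EmployeeRecord {
    private String id;    // canonical UUID text, or null
    private int age;
    private String name;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    // Convert at the boundary so callers can keep working with java.util.UUID.
    public UUID getUuid() { return id == null ? null : UUID.fromString(id); }
    public void setUuid(UUID uuid) { this.id = uuid == null ? null : uuid.toString(); }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```

The trade-off is 36 bytes of text per value instead of 16 raw bytes, in exchange for a column type that parquet-avro 1.10.1 already writes without special UUID support.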