Details
Description
Hi,
While migrating to the latest Pig version we have seen a general issue when using nested Avro records on Tez:
Caused by: java.io.IOException: class org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not implemented yet
at org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68)
at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
...
The setup is
schema
{ "fields": [ { "name": "id", "type": "int" }, { "name": "property", "type": { "fields": [ { "name": "id", "type": "int" } ], "name": "Property", "type": "record" } } ], "name": "Person", "namespace": "com.github.ouyi.avro", "type": "record" }
Pig script group_person.pig
loaded_person = LOAD '$input' USING AvroStorage(); grouped_records = GROUP loaded_person BY (property.id); STORE grouped_records INTO '$output' USING AvroStorage();
sample data
{"id":1,"property":{"id":1}}
Execution on Tez
pig -x tez_local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p output=file:///output group_person.pig ... Caused by: java.io.IOException: class org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not implemented yet at org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68) at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139) ...
Execution on mapred
pig -x local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p output=file:///output7 group_person.pig ... Output(s): Successfully stored 1 records in: "file:///output7" ...
I am going to attach the complete log files of both runs.
I assume that the Pig script should work regardless of Tez or mapreduce? Is there any underlying change when migrating to Tez which makes the schema invalid?
Thanks,
Sebastian