Details
Bug
Status: Patch Available
Major
Resolution: Unresolved
1.7.6
None
None
Description
Consider the following code:
```java
import java.io.ByteArrayOutputStream;
import java.util.*;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;

public class AvroDerivingMaps {
    public static void main(String[] args) throws Exception {
        MapDerivedContainer orig = new MapDerivedContainer();

        // AllowNull makes each field schema a union of "null" and the declared type.
        ReflectData rdata = ReflectData.AllowNull.get();
        Schema schema = rdata.getSchema(MapDerivedContainer.class);
        System.out.println(schema);

        ReflectDatumWriter<MapDerivedContainer> datumWriter =
            new ReflectDatumWriter<MapDerivedContainer>(MapDerivedContainer.class, rdata);
        DataFileWriter<MapDerivedContainer> fileWriter =
            new DataFileWriter<MapDerivedContainer>(datumWriter);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        fileWriter.create(schema, baos);
        fileWriter.append(orig); // fails here with DataFileWriter$AppendWriteException
        fileWriter.close();
    }
}

// A HashMap subclass that also declares fields of its own.
class MapDerived extends HashMap<String, Integer> {
    Integer a = 1;
    String b = "b";
}

class MapDerivedContainer {
    MapDerived2 map = new MapDerived2();
}

class MapDerived2 extends MapDerived {
    String c = "c";
}
```
It prints the following schema and then throws the following exception:
{"type":"record","name":"MapDerivedContainer","namespace":"avro","fields":[{"name":"map","type":["null",{"type":"record","name":"MapDerived2","fields":[{"name":"c","type":["null","string"],"default":null},{"name":"a","type":["null","int"],"default":null},{"name":"b","type":["null","string"],"default":null}]}],"default":null}]}
Exception in thread "main" org.apache.avro.file.DataFileWriter$AppendWriteException:
org.apache.avro.UnresolvedUnionException:
Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"record","name":"MapDerived2","namespace":"avro","fields":[{"name":"c","type":["null","string"],"default":null},{"name":"a","type":["null","int"],"default":null},{"name":"b","type":["null","string"],"default":null}]}]: {}
at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:600)
at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:203)
at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
... 1 more
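It appears that ReflectData#createSchema() only recognizes a map when the declared type is a ParameterizedType (it checks "type instanceof ParameterizedType"). The declared type of the field, MapDerived2, is a plain Class, so the map handling is skipped and the class is turned into a record schema.
GenericData#isMap() does not make the same distinction: it only checks whether the value is a java.util.Map, so at write time the MapDerived2 instance is classified as a map. Because the union ["null", MapDerived2] has no map branch, GenericData#resolveUnion() fails.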
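The mismatch can be reproduced with plain JDK reflection, independent of Avro. The minimal sketch below checks the two conditions described above: whether the declared field type is a ParameterizedType (what the schema side looks at) and whether the runtime value is a java.util.Map (what the write side looks at). The class and method names are hypothetical, chosen only for illustration.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.HashMap;
import java.util.Map;

public class TypeCheckDemo {
    // Same class shapes as in the report; the extra fields are omitted here.
    static class MapDerived extends HashMap<String, Integer> {}

    static class MapDerived2 extends MapDerived {}

    static class MapDerivedContainer {
        MapDerived2 map = new MapDerived2();
    }

    // Schema side: is the declared type of the "map" field a ParameterizedType?
    static boolean fieldTypeIsParameterized() {
        try {
            Type t = MapDerivedContainer.class.getDeclaredField("map").getGenericType();
            return t instanceof ParameterizedType;
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Write side: is the runtime value of the "map" field a java.util.Map?
    static boolean datumIsMap() {
        Object datum = new MapDerivedContainer().map;
        return datum instanceof Map;
    }

    public static void main(String[] args) {
        System.out.println("field type is ParameterizedType: " + fieldTypeIsParameterized());
        System.out.println("datum instanceof Map: " + datumIsMap());
    }
}
```

Running main prints false for the first check and true for the second, which is exactly the combination under which schema generation and union resolution disagree.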
The same may be true for classes that extend ArrayList or otherwise implement Collection, Set, etc.
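Assuming the write-side check for collections is likewise instanceof-based (analogous to the instanceof Map check above), collection subclasses would hit the same mismatch. The pure-JDK sketch below, with hypothetical ListDerived and SetDerived classes, shows that their declared field types are also plain classes while their values are Collections. It does not exercise Avro itself, so whether Avro actually fails for these types still needs to be confirmed.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

public class CollectionCheckDemo {
    // Hypothetical collection subclasses, analogous to MapDerived in the report.
    static class ListDerived extends ArrayList<String> {
        String extra = "x";
    }

    static class SetDerived extends HashSet<Integer> {}

    static class Holder {
        ListDerived list = new ListDerived();
        SetDerived set = new SetDerived();
    }

    // Would a ParameterizedType-only check recognize the declared field type?
    static boolean declaredAsParameterized(String fieldName) {
        try {
            Type t = Holder.class.getDeclaredField(fieldName).getGenericType();
            return t instanceof ParameterizedType;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no field " + fieldName, e);
        }
    }

    // Would an instanceof-based check on the runtime value see a Collection?
    static boolean valueIsCollection(Object value) {
        return value instanceof Collection;
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        System.out.println("list: parameterized=" + declaredAsParameterized("list")
                + ", collection=" + valueIsCollection(h.list));
        System.out.println("set: parameterized=" + declaredAsParameterized("set")
                + ", collection=" + valueIsCollection(h.set));
    }
}
```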
Also note the schema generated for the class extending HashMap:
{
  "type":"record",
  "name":"MapDerived2",
  "fields":[
    { "name":"c", "type":[ "null", "string" ], "default":null },
    { "name":"a", "type":[ "null", "int" ], "default":null },
    { "name":"b", "type":[ "null", "string" ], "default":null }
  ]
}
This schema ignores the inherited map entirely: only the declared fields c, a and b appear.
For such a class, the schema should probably look something like this:
{
  "type":"record",
  "name":"MapDerived2",
  "fields":[
    { "name":"c", "type":[ "null", "string" ], "default":null },
    ....  // Other fields in the class extending the Map
    {
      "name":"BASE_MAP",
      "type":[ "null", "map" ... ],  // Normal map which the class extends (implements?)
      "default":null
    }
  ]
}
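To emit something like the BASE_MAP field in the proposed schema, the schema generator would need the key and value types of the inherited map. For MapDerived2 these are reachable only through the superclass chain: its direct superclass MapDerived is not parameterized, so the lookup has to continue one level up to HashMap<String, Integer>. A minimal pure-JDK sketch of that lookup follows; inheritedMapTypeArgs is a hypothetical helper, not part of Avro's API, and it deliberately leaves type variables unresolved and does not inspect generic interfaces.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class InheritedMapTypes {
    static class MapDerived extends HashMap<String, Integer> {
        Integer a = 1;
        String b = "b";
    }

    static class MapDerived2 extends MapDerived {
        String c = "c";
    }

    /**
     * Hypothetical helper: walks the superclass chain of c and returns the actual
     * type arguments of the first parameterized superclass whose raw type implements
     * Map, or null if there is none. Type variables are returned unresolved, and
     * generic interfaces are not inspected.
     */
    static Type[] inheritedMapTypeArgs(Class<?> c) {
        for (Class<?> cur = c; cur != null && cur != Object.class; cur = cur.getSuperclass()) {
            Type sup = cur.getGenericSuperclass();
            if (sup instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) sup;
                Type raw = p.getRawType();
                if (raw instanceof Class && Map.class.isAssignableFrom((Class<?>) raw)) {
                    return p.getActualTypeArguments();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Prints [class java.lang.String, class java.lang.Integer]
        System.out.println(Arrays.toString(inheritedMapTypeArgs(MapDerived2.class)));
    }
}
```

A real fix would still have to map the value type (Integer here) to an Avro schema and handle cases such as class Foo<V> extends HashMap<String, V>, where the argument is a type variable rather than a concrete class.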