Description
A field declared with a schema fragment like this:
{ "name": "ownerId", "type": [ "null", { "type": "string", "java-class": "java.net.URI" } ], "default": null }
is deserialized without problems into a generated POJO such as
@org.apache.avro.specific.AvroGenerated
public class MyAvroDataObject extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
  ...
  @Deprecated public java.net.URI ownerId;
This works because GenericDatumReader.readString(Object, Schema, Decoder) looks the schema up in the stringClassCache, which holds the entry
{"type":"string","java-class":"java.net.URI"}=class java.net.URI
and then uses the URI class itself to rehydrate the value via newInstanceFromString.
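To make the read path concrete, here is a minimal stand-alone sketch of the idea, not Avro's actual code. It assumes the target class is resolved from the "java-class" name, cached, and instantiated through a public constructor taking a single String. The class name StringClassRehydration, the CLASS_CACHE map, and the rehydrate method are hypothetical and exist only for illustration.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the read path; NOT Avro's actual implementation.
final class StringClassRehydration {
    // Hypothetical stand-in for the stringClassCache: "java-class" name -> resolved class.
    private static final Map<String, Class<?>> CLASS_CACHE = new ConcurrentHashMap<>();

    /** Builds an instance of javaClassName from its string form via a (String) constructor. */
    static Object rehydrate(String javaClassName, String text) {
        try {
            Class<?> target = CLASS_CACHE.get(javaClassName);
            if (target == null) {
                target = Class.forName(javaClassName);
                CLASS_CACHE.put(javaClassName, target);
            }
            // Assumption: the target class exposes a public single-String constructor.
            return target.getConstructor(String.class).newInstance(text);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot build " + javaClassName + " from a string", e);
        }
    }

    public static void main(String[] args) {
        // The decoded text becomes a java.net.URI, matching the generated field type.
        URI ownerId = (URI) rehydrate("java.net.URI", "urn:example:owner-1");
        System.out.println(ownerId.getClass().getName() + " " + ownerId);
    }
}
```

The point of the sketch is that the reader honours the "java-class" property, so the value stored in the record is already a URI by the time any copy is attempted.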
On the other hand, deepCopy only considers the schema type of the field. In org.apache.avro.generic.GenericData.deepCopy(Schema, T), the STRING case turns the URI value into an org.apache.avro.util.Utf8, which then causes a ClassCastException:
java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.net.URI
    at com.example.MyAvroDataObject.put(MyAvroDataObject.java:104)
    at org.apache.avro.generic.GenericData.setField(GenericData.java:660)
    at org.apache.avro.generic.GenericData.setField(GenericData.java:677)
    at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:1082)
    at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:1102)
    at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:1080)
The following dirty hack seems to avoid the issue, but it is not in sync with the stringClassCache, which should be consulted too:
case STRING:
  // Strings are immutable
  if (value instanceof String) {
    return (T)value;
  }
  // Dirty Harry 9 3/4 start
  // URIs are immutable and are probably modeled as an URI itself
  // TODO: Check with stringClassCache & the schema
  else if ((value instanceof URI) && URI.class.getName().equals(schema.getProp("java-class")) ) {
    return (T)value;
  }
  // Dirty Harry 9 3/4 end
  // Some CharSequence subclasses are mutable, so we still need to make
  // a copy
  else if (value instanceof Utf8) {
    // Utf8 copy constructor is more efficient than converting
    // to string and then back to Utf8
    return (T)new Utf8((Utf8)value);
  }
  return (T)new Utf8(value.toString());
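A less URI-specific variant might look roughly like the stand-alone sketch below. It is not a proposed patch and does not use Avro classes: the javaClassName parameter models schema.getProp("java-class"), StringBuilder stands in for Utf8 as a mutable CharSequence, and StringCopySketch and copyStringValue are hypothetical names. It also assumes the declared class has a public single-String constructor and that toString() yields a form that constructor accepts, which holds for java.net.URI.

```java
import java.net.URI;

// Simplified model of a java-class-aware STRING copy; NOT Avro's actual code.
final class StringCopySketch {
    /**
     * Copies a string-typed value. javaClassName models schema.getProp("java-class")
     * and may be null when the schema carries no such property.
     */
    static Object copyStringValue(Object value, String javaClassName) {
        if (value instanceof String) {
            return value; // Strings are immutable
        }
        if (javaClassName != null && value.getClass().getName().equals(javaClassName)) {
            // Rebuild from the string form so the copy keeps the declared Java type.
            try {
                return value.getClass().getConstructor(String.class).newInstance(value.toString());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot copy " + javaClassName, e);
            }
        }
        // Stand-in for the Utf8 copy in the real code path.
        return new StringBuilder(value.toString());
    }

    public static void main(String[] args) {
        URI original = URI.create("urn:example:owner-1");
        Object copy = copyStringValue(original, "java.net.URI");
        // The copy is still a URI, so a cast in a generated put(...) would succeed.
        System.out.println(copy.getClass().getName() + " equals original: " + copy.equals(original));
    }
}
```

Rebuilding through the string form avoids having to know which classes are immutable, at the cost of one extra parse per copy. A real fix would presumably resolve the class through the stringClassCache rather than by comparing class names.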
I also tried Avro 1.10-SNAPSHOT of 2019-06-20 (commit 2d3b1fe7efd865639663ba785877182e7e038c45) because of https://github.com/apache/avro/pull/329, but the issue remains.