Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.7.7, 1.8.1
Fix Version/s: None
Component/s: None
Environment: Ubuntu
Description
I'm loading data from an Avro file and attempting to create custom objects from it. If my schema specifies a class that can't be loaded, the library silently falls back to creating org.apache.avro.generic.GenericData$Record instances instead. The problem is that this behavior causes an unexpected ClassCastException when I try to access the datum field of MyCustomClassAvroKey (which is an AvroKey<MyCustomClass>). All the upstream assignments, such as this.mCurrentRecord = this.mAvroFileReader.next(this.mCurrentRecord) in org.apache.avro.mapreduce.AvroRecordReaderBase, succeed because of type erasure.
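To make the failure mode concrete, here is a minimal sketch of the read path (MyCustomClass, the schema, and records.avro are placeholders standing in for my setup; the point is only where the exception finally surfaces):

    import java.io.File;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.specific.SpecificDatumReader;

    public class Repro {
        public static void main(String[] args) throws Exception {
            // records.avro was written with a schema whose full name refers to a
            // class that is not visible to the class loader SpecificData consults.
            DataFileReader<MyCustomClass> reader = new DataFileReader<>(
                new File("records.avro"), new SpecificDatumReader<MyCustomClass>());

            // SpecificData#getClass() swallowed the ClassNotFoundException and fell
            // back to GenericData$Record, so next() returns a generic record. The
            // erased cast the compiler inserts at this assignment fires only now:
            MyCustomClass first = reader.next(); // ClassCastException at runtime
        }
    }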
So my question is: why doesn't the library just fail when it can't load the requested class? E.g., in org.apache.avro.specific.SpecificData#getClass(Schema schema):

    try {
      c = ClassUtils.forName(getClassLoader(), getClassName(schema));
    } catch (ClassNotFoundException e) {
      c = NO_CLASS; // why not just fail?
    }
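For comparison, a fail-fast variant might look like this (just a sketch; whether to throw at all, and which exception type to use, is precisely the design question I'm asking):

    try {
      c = ClassUtils.forName(getClassLoader(), getClassName(schema));
    } catch (ClassNotFoundException e) {
      // Report the missing class at deserialization setup instead of deferring
      // the failure to an erased cast somewhere in user code:
      throw new AvroRuntimeException(
          "Could not load class for schema " + schema.getFullName(), e);
    }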
Is this a design choice? It silently violates the type-safety guarantee implied by the generic API and causes very confusing, unexpected behavior far from the root cause.
You can find a test app that reproduces this problem in this GitHub repository: https://github.com/homosepian/spark-avro-kryo
I ran into the issue while trying to load custom types into a Spark RDD.
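In case it helps, the read that triggered it was along these lines (a sketch only, with placeholder paths and classes; the class-loader mismatch in the real reproduction comes from the Spark executor/Kryo setup, and the linked repository has the full details):

    import org.apache.avro.mapred.AvroKey;
    import org.apache.avro.mapreduce.AvroKeyInputFormat;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SparkRepro {
        @SuppressWarnings("unchecked")
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("avro-repro").setMaster("local[2]"));

            // Hadoop's generic API needs the erased class tokens:
            Class<AvroKeyInputFormat<MyCustomClass>> fmt =
                (Class<AvroKeyInputFormat<MyCustomClass>>) (Class<?>) AvroKeyInputFormat.class;
            Class<AvroKey<MyCustomClass>> keyCls =
                (Class<AvroKey<MyCustomClass>>) (Class<?>) AvroKey.class;

            // If MyCustomClass is not visible to the class loader used during
            // deserialization, each datum silently becomes a GenericData$Record.
            JavaPairRDD<AvroKey<MyCustomClass>, NullWritable> rdd =
                sc.newAPIHadoopFile("/path/to/records.avro", fmt, keyCls,
                    NullWritable.class, sc.hadoopConfiguration());

            // Fails only here, when the erased cast on the datum finally runs:
            MyCustomClass first = rdd.first()._1().datum(); // ClassCastException
        }
    }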