Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
AvroParquetInputFormat currently extends ParquetInputFormat<IndexedRecord>, which works for regular MapReduce jobs. However, Spark's hadoopRDD and newAPIHadoopRDD methods (correctly) create an RDD whose types are taken from the InputFormat. As a result, the RDD's value type is always IndexedRecord rather than the specific record type the caller expects.
AvroParquetInputFormat should instead be declared as AvroParquetInputFormat<T extends IndexedRecord> extends ParquetInputFormat<T>.
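
The typing difference can be illustrated with a small, self-contained Java sketch. It does not use the real Parquet, Avro, or Spark classes: IndexedRecord, InputFormat, FixedFormat, GenericFormat, User, and load are all hypothetical stand-ins, and load only mimics the way an RDD's element type is inferred from the InputFormat's type parameter.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Avro's IndexedRecord (not the real interface).
interface IndexedRecord {
    Object get(int i);
}

// Hypothetical specific record type.
final class User implements IndexedRecord {
    final String name;
    User(String name) { this.name = name; }
    @Override public Object get(int i) { return name; }
}

// Hypothetical stand-in for an InputFormat parameterized by its value type.
abstract class InputFormat<T> {
    abstract List<T> read();
}

// Current shape: the value type is fixed to IndexedRecord.
class FixedFormat extends InputFormat<IndexedRecord> {
    @Override List<IndexedRecord> read() {
        return Collections.<IndexedRecord>singletonList(new User("alice"));
    }
}

// Proposed shape: the caller chooses the record type T.
class GenericFormat<T extends IndexedRecord> extends InputFormat<T> {
    private final List<T> records;
    GenericFormat(List<T> records) { this.records = records; }
    @Override List<T> read() { return records; }
}

public class InputFormatTypingSketch {
    // Mimics type inference from the InputFormat: the result's element
    // type V is whatever the format declares.
    static <V> List<V> load(InputFormat<V> format) {
        return format.read();
    }

    public static void main(String[] args) {
        // Fixed format: elements are IndexedRecord, so a cast is required.
        List<IndexedRecord> untyped = load(new FixedFormat());
        User first = (User) untyped.get(0);

        // Generic format: elements already have the specific type.
        GenericFormat<User> format =
            new GenericFormat<>(Collections.singletonList(new User("bob")));
        List<User> typed = load(format);
        User second = typed.get(0);

        System.out.println(first.name + " " + second.name); // prints "alice bob"
    }
}
```

With the fixed declaration, every consumer has to cast from IndexedRecord; with the generic declaration, the specific type flows through to the result without a cast.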