Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
1.6.1
None
None
Description
I noticed that ReflectDatumWriter is always used when running a mapred job (see AvroOutputFormat.java). This sometimes leads to bugs such as AVRO-966.
Why not provide a property such as WRITER_IS_REFLECT = "avro.map.writer.is.reflect" to decide which DatumWriter should be used?
I created a small patch that does this:
avro-mapred.patch
Index: lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java
===================================================================
--- lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java (revision 1209417)
+++ lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java (revision )
@@ -53,6 +53,8 @@
   /** The configuration key for reflection-based map output representation. */
   public static final String MAP_OUTPUT_IS_REFLECT = "avro.map.output.is.reflect";
 
+  public static final String WRITER_IS_REFLECT = "avro.map.writer.is.reflect";
+
   /** Configure a job's map input schema. */
   public static void setInputSchema(JobConf job, Schema s) {
     job.set(INPUT_SCHEMA, s.toString());
Index: lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java
===================================================================
--- lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java (revision 1209417)
+++ lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroOutputFormat.java (revision )
@@ -23,6 +23,7 @@
 import java.util.Map;
 import java.net.URLDecoder;
 
+import org.apache.avro.specific.SpecificDatumWriter;
 import org.apache.hadoop.io.NullWritable;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
@@ -102,8 +103,9 @@
       ? AvroJob.getMapOutputSchema(job)
       : AvroJob.getOutputSchema(job);
 
-    final DataFileWriter<T> writer =
-      new DataFileWriter<T>(new ReflectDatumWriter<T>());
+    final DataFileWriter<T> writer = job.getBoolean(AvroJob.WRITER_IS_REFLECT, false) ?
+      new DataFileWriter<T>(new ReflectDatumWriter<T>()) :
+      new DataFileWriter<T>(new SpecificDatumWriter<T>());
 
     configureDataFileWriter(writer, job);
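To make the effect of the flag easier to see without a Hadoop setup, here is a minimal standalone sketch of the selection logic in the patch. It is only an illustration: java.util.Properties stands in for JobConf, the WriterKind enum stands in for the two DatumWriter classes, and the class name WriterSelectionSketch and method chooseWriter are hypothetical names that are not part of Avro. Boolean.parseBoolean only approximates how JobConf.getBoolean parses the value.

```java
import java.util.Properties;

// Standalone sketch only: Properties stands in for Hadoop's JobConf, and
// WriterKind stands in for ReflectDatumWriter / SpecificDatumWriter.
public class WriterSelectionSketch {
    // Same key as the WRITER_IS_REFLECT constant proposed in the patch.
    static final String WRITER_IS_REFLECT = "avro.map.writer.is.reflect";

    enum WriterKind { REFLECT, SPECIFIC }

    // Mirrors job.getBoolean(AvroJob.WRITER_IS_REFLECT, false) from the patch:
    // a missing key falls back to false, which selects the specific writer.
    static WriterKind chooseWriter(Properties conf) {
        boolean reflect =
            Boolean.parseBoolean(conf.getProperty(WRITER_IS_REFLECT, "false"));
        return reflect ? WriterKind.REFLECT : WriterKind.SPECIFIC;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();

        // Property not set: the default (false) picks the specific writer.
        System.out.println(chooseWriter(conf)); // prints SPECIFIC

        // Property set to true: the reflect writer is used, as before the patch.
        conf.setProperty(WRITER_IS_REFLECT, "true");
        System.out.println(chooseWriter(conf)); // prints REFLECT
    }
}
```

Note that with a default of false, a job that does not set the property would switch from ReflectDatumWriter to SpecificDatumWriter, so existing reflect-based jobs would need to set it explicitly.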
Does it make sense?