Apache Avro / AVRO-534

AvroRecordReader (org.apache.avro.mapred) should support a JobConf-given schema


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.0
    • Component/s: java
    • Environment: ArchLinux, Java 1.6, Apache Hadoop 0.20.2, Apache Avro (trunk, 1.4.0-SNAPSHOT), using the Avro Generic API (Java)

    • Flags: Reviewed
    • Labels: Avro, MapReduce, AvroRecordReader

    Description

      Consider an Avro file containing a single record type with about 70 fields, in the order (str, str, str, long, str, double, ...) (let's take only the first six into consideration).
      To pass this into a simple MapReduce job I use AvroInputFormat.addInputPath(...), and it works well with an IdentityMapper.

      Now I'd like to read only three fields, say fields 0, 1, and 3, so I supply a special schema with my three fields as (str (0), str (1), long (2)) using AvroJob.setInputGeneric(..., mySchema). This causes the MapReduce job to fail, because AvroRecordReader reads the file with its entire schema (of 70 fields) and tries to convert my 'long' field to the 'str' that sits at index 2 of the actual schema (meaning it uses the schema embedded in the file, not the one I supplied!).
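      Avro's schema resolution matches reader fields to writer fields by name, not by position; the failure above occurs precisely because the reader (projection) schema is ignored and fields are consumed positionally. A minimal self-contained sketch of the difference (plain Java, with hypothetical field names standing in for the real 70-field schema):

```java
import java.util.Arrays;
import java.util.List;

public class ProjectionSketch {
    // Writer schema field names in file order (hypothetical subset of the 70 fields).
    static final List<String> WRITER_FIELDS =
        Arrays.asList("f0_str", "f1_str", "f2_str", "f3_long", "f4_str", "f5_double");

    // Reader (projection) schema: only the three fields the job wants.
    static final List<String> READER_FIELDS =
        Arrays.asList("f0_str", "f1_str", "f3_long");

    // Name-based resolution: map each reader field to its position in the
    // writer schema, as Avro schema resolution does. Returns, for each reader
    // field, the writer index to read from.
    static int[] resolveByName(List<String> writer, List<String> reader) {
        int[] indices = new int[reader.size()];
        for (int i = 0; i < reader.size(); i++) {
            indices[i] = writer.indexOf(reader.get(i)); // -1 would mean: fall back to a default
        }
        return indices;
    }

    public static void main(String[] args) {
        int[] resolved = resolveByName(WRITER_FIELDS, READER_FIELDS);
        // Reader field 2 ("f3_long") correctly resolves to writer index 3 ...
        System.out.println(Arrays.toString(resolved)); // [0, 1, 3]
        // ... whereas positional reading would hand the reader the string at
        // writer index 2 for its long field, which is the failure in this issue.
    }
}
```

      With positional reading, reader slot 2 (long) collides with writer slot 2 (str); with name-based resolution it lands on writer slot 3 as intended.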

      AvroRecordReader must support reading with the schema specified by the user via AvroJob.setInputGeneric.

      I've written a patch that does this, but I'm not sure it's actually the right solution (should MAP_OUTPUT_SCHEMA be used?).

      Attachments

        Activity

          People

            Assignee: Harsh J (qwertymaniac)
            Reporter: Harsh J (qwertymaniac)
            Votes: 0
            Watchers: 2
