  1. Apache Avro
  2. AVRO-923

Avro-MapRed: Provide a fallback using avro beans instead of schema in job configuration


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.5.4
    • Fix Version/s: None
    • Component/s: java
    • Labels: None
    • Environment: any

    Description

      The current implementation of Avro MapRed is designed to be configured through JobConf. It is possible to use a job.xml file instead, but this is painful because you have to copy and paste the full schemas for input and output into the file. This is error-prone and time-consuming. In addition, any change to a bean requires copying and pasting its schema again, whereas with JobConf a simple recompile would be enough.

      One way to improve this while staying backward compatible would be to introduce new keys in AvroJob that reference the actual Avro bean class in use. The new keys would act as a fallback: they are consulted only when the corresponding schema key is not set.

      New keys would be created, one per existing schema key (existing key > new fallback key):

      • avro.input.schema > avro.input.class
      • avro.map.output.schema > avro.map.output.class
      • avro.output.schema > avro.output.class

      Only 3 methods would be impacted in AvroJob:

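      • getInputSchema(Configuration job), with a fallback along these lines:

          // Use the schema key if it is set; otherwise read SCHEMA$ from the configured bean class.
          String s = job.get(INPUT_SCHEMA);
          if (s != null) return Schema.parse(s);
          // SCHEMA$ on a generated Avro class is a static Schema, not a String, so cast it directly.
          return (Schema) Class.forName(job.get(INPUT_CLASS)).getDeclaredField("SCHEMA$").get(null);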

      • getMapOutputSchema()
      • getOutputSchema()
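
      The fallback lookup shared by these three methods could be sketched as follows. This is only an illustration under stated assumptions, not the AvroJob implementation: a plain Map stands in for the Hadoop Configuration, the schema is handled as text rather than as an Avro Schema object, and the WordCount bean and the resolveSchema helper are hypothetical names invented for the example. Only the key names and the SCHEMA$ field come from the proposal.

      ```java
      import java.util.HashMap;
      import java.util.Map;

      final class SchemaFallbackSketch {
          // Key names as given in the proposal.
          static final String INPUT_SCHEMA = "avro.input.schema";
          static final String INPUT_CLASS = "avro.input.class";

          /** Hypothetical stand-in for a generated Avro bean exposing a static SCHEMA$ field. */
          public static class WordCount {
              // Assumption: held as Object here; in generated Avro classes this is a Schema.
              public static final Object SCHEMA$ =
                  "{\"type\":\"record\",\"name\":\"WordCount\",\"fields\":[]}";
          }

          /** Returns the schema text: the explicit schema key wins, the bean class is the fallback. */
          static String resolveSchema(Map<String, String> conf, String schemaKey, String classKey) {
              String schema = conf.get(schemaKey);
              if (schema != null) {
                  return schema;
              }
              String className = conf.get(classKey);
              if (className == null) {
                  return null; // neither key is set
              }
              try {
                  // Read the static field; the instance argument is null for static fields.
                  Object value = Class.forName(className).getDeclaredField("SCHEMA$").get(null);
                  return String.valueOf(value);
              } catch (ReflectiveOperationException e) {
                  throw new IllegalStateException("Cannot read SCHEMA$ from " + className, e);
              }
          }

          public static void main(String[] args) {
              Map<String, String> conf = new HashMap<>();
              conf.put(INPUT_CLASS, WordCount.class.getName());
              // avro.input.schema is not set, so the schema comes from the bean class.
              System.out.println(resolveSchema(conf, INPUT_SCHEMA, INPUT_CLASS));
          }
      }
      ```

      Because the schema key is checked first, existing jobs that set avro.input.schema would keep their current behavior; the class key only matters when the schema key is absent.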

      For consistency, matching setters could also be added to AvroJob. They are not mandatory, since in this use case the new keys are set directly in the job configuration rather than through AvroJob.
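
      As a rough sketch of what such setters might look like, assuming a plain Map in place of the Hadoop Configuration: the method names setInputClass, setMapOutputClass and setOutputClass are hypothetical and are not part of AvroJob; only the key names come from the proposal.

      ```java
      import java.util.HashMap;
      import java.util.Map;

      final class AvroJobSetterSketch {
          // Key names as given in the proposal.
          static final String INPUT_CLASS = "avro.input.class";
          static final String MAP_OUTPUT_CLASS = "avro.map.output.class";
          static final String OUTPUT_CLASS = "avro.output.class";

          // Hypothetical setters: each stores the bean's fully qualified class name under its key.
          static void setInputClass(Map<String, String> conf, Class<?> bean) {
              conf.put(INPUT_CLASS, bean.getName());
          }

          static void setMapOutputClass(Map<String, String> conf, Class<?> bean) {
              conf.put(MAP_OUTPUT_CLASS, bean.getName());
          }

          static void setOutputClass(Map<String, String> conf, Class<?> bean) {
              conf.put(OUTPUT_CLASS, bean.getName());
          }

          public static void main(String[] args) {
              Map<String, String> conf = new HashMap<>();
              // String.class is only a placeholder for a generated Avro bean class.
              setInputClass(conf, String.class);
              System.out.println(conf.get(INPUT_CLASS));
          }
      }
      ```

      Storing the value from getName() keeps it in the form that Class.forName expects when the class is later looked up.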

          People

            Assignee: Unassigned
            Reporter: Julien Muller (julien.muller)
            Votes: 3
            Watchers: 1

              Time Tracking

                Estimated: 2h
                Remaining: 2h
                Logged: Not Specified