Apache Avro / AVRO-1356

AvroMultipleOutputs map-only jobs do not use NamedOutput schemas

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.4
    • Fix Version/s: 1.7.5
    • Component/s: java
    • Labels: None

    Description

      When running a map-only job, AvroMultipleOutputs sets the MapOutputKeySchema and MapOutputValueSchema instead of the OutputKeySchema and OutputValueSchema, as follows:

      boolean isMaponly = job.getNumReduceTasks() == 0;
      if (keySchema != null) {
        if (isMaponly)
          AvroJob.setMapOutputKeySchema(job, keySchema);
        else
          AvroJob.setOutputKeySchema(job, keySchema);
      }
      if (valSchema != null) {
        if (isMaponly)
          AvroJob.setMapOutputValueSchema(job, valSchema);
        else
          AvroJob.setOutputValueSchema(job, valSchema);
      }

      Unfortunately, AvroKeyOutputFormat and AvroKeyValueOutputFormat never check whether the job is map-only; they use the OutputKeySchema and OutputValueSchema regardless, so the schemas set for a named output in a map-only job are ignored.
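
      The mismatch can be illustrated with a minimal, self-contained sketch. It does not call the Avro or Hadoop APIs: a plain Map stands in for the job configuration, schemas are plain strings, and the property names map.output.key.schema and output.key.schema are hypothetical placeholders rather than Avro's real configuration keys.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in model of the reported mismatch; this is not Avro code.
class SchemaMismatchSketch {
    // Hypothetical property names, chosen for illustration only.
    static final String MAP_OUTPUT_KEY = "map.output.key.schema";
    static final String OUTPUT_KEY = "output.key.schema";

    // Models the AvroMultipleOutputs snippet above: a map-only job
    // records the key schema under the map-output property.
    static void writerSide(Map<String, String> conf, int numReduceTasks, String keySchema) {
        boolean isMapOnly = numReduceTasks == 0;
        if (keySchema != null) {
            conf.put(isMapOnly ? MAP_OUTPUT_KEY : OUTPUT_KEY, keySchema);
        }
    }

    // Models the output formats as described in this report:
    // they always read the output property, map-only or not.
    static String outputFormatSide(Map<String, String> conf) {
        return conf.get(OUTPUT_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        writerSide(conf, 0, "\"string\"");          // map-only job
        System.out.println(outputFormatSide(conf)); // prints null
    }
}
```

      In this model the schema is stored under one property and looked up under another, which is the shape of the problem described above.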

      We can fix this in either of two ways:

      • Change AvroKeyOutputFormat and AvroKeyValueOutputFormat to check whether the job is map-only and use the appropriate schema. (This seems like the right approach.)
      • Change AvroMultipleOutputs to always set the OutputKeySchema and OutputValueSchema.
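
      The first option could look roughly like the following sketch. It is a simplified model, not the attached patch: resolveWriterSchema is a hypothetical helper, schemas are passed as plain strings instead of Avro Schema objects, and falling back to the output schema when no map output schema is set is an assumption made for illustration.

```java
// Simplified model of the first option; this is not Avro code.
class WriterSchemaSketch {
    // Hypothetical helper: picks the schema an output format would write with.
    static String resolveWriterSchema(int numReduceTasks,
                                      String mapOutputSchema,
                                      String outputSchema) {
        boolean isMapOnly = numReduceTasks == 0;
        // In a map-only job, prefer the map output schema when one was set.
        if (isMapOnly && mapOutputSchema != null) {
            return mapOutputSchema;
        }
        // Otherwise (reducers present, or nothing set) use the output schema.
        return outputSchema;
    }

    public static void main(String[] args) {
        // Map-only job: the map output schema is chosen.
        System.out.println(resolveWriterSchema(0, "mapSchema", "outSchema"));
        // Job with reducers: the output schema is chosen.
        System.out.println(resolveWriterSchema(3, "mapSchema", "outSchema"));
    }
}
```

      The second option would leave the output formats untouched and move the change to the writer side, so that the schema is always stored where the output formats already look.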

      Attachments

        1. AVRO-1356.patch
          5 kB
          Alan Paulsen

          People

            Assignee: Alan Paulsen (apaulsen)
            Reporter: Alan Paulsen (apaulsen)
            Votes: 0
            Watchers: 5
