  1. Apache Avro
  2. AVRO-1356

AvroMultipleOutputs map only jobs do not use NamedOutput schemas

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.4
    • Fix Version/s: 1.7.5
    • Component/s: java
    • Labels:
      None

      Description

      When running a map-only job, AvroMultipleOutputs sets the MapOutputKeySchema and MapOutputValueSchema instead of the OutputKeySchema and OutputValueSchema, as follows:

      boolean isMaponly = job.getNumReduceTasks() == 0;
      if (keySchema != null) {
        if (isMaponly)
          AvroJob.setMapOutputKeySchema(job, keySchema);
        else
          AvroJob.setOutputKeySchema(job, keySchema);
      }
      if (valSchema != null) {
        if (isMaponly)
          AvroJob.setMapOutputValueSchema(job, valSchema);
        else
          AvroJob.setOutputValueSchema(job, valSchema);
      }
      

      Unfortunately, AvroKeyOutputFormat and AvroKeyValueOutputFormat never check whether the job is map-only; they use the OutputKeySchema and OutputValueSchema regardless, so the NamedOutput schemas set for a map-only job are ignored.

      We can fix this by either:

      • Changing AvroKeyOutputFormat and AvroKeyValueOutputFormat to check whether the job is map-only and use the appropriate schema. (This seems right.)
      • Changing AvroMultipleOutputs to always use the OutputKeySchema and OutputValueSchema.
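
The first option could look roughly like the sketch below. It is only an illustration of the selection logic, not the code of AvroKeyOutputFormat: the job configuration is modelled as a plain Map, the class name SchemaSelectionSketch and the two property names are hypothetical placeholders, and the fallback to the output schema when no map output schema is set is an assumed design choice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the schema lookup an output format would perform.
final class SchemaSelectionSketch {
    // Placeholder property names (assumptions, not Avro's real configuration keys).
    static final String MAP_OUTPUT_KEY_SCHEMA = "sketch.map.output.key.schema";
    static final String OUTPUT_KEY_SCHEMA = "sketch.output.key.schema";

    // Returns the key schema the writer should use for the given job shape.
    static String selectKeyWriterSchema(Map<String, String> conf, int numReduceTasks) {
        boolean isMapOnly = numReduceTasks == 0;
        if (isMapOnly) {
            // Map-only job: prefer the schema stored under the map output key.
            String mapSchema = conf.get(MAP_OUTPUT_KEY_SCHEMA);
            if (mapSchema != null) {
                return mapSchema;
            }
        }
        // Jobs with reducers, or map-only jobs without a map output schema,
        // use the regular output key schema.
        return conf.get(OUTPUT_KEY_SCHEMA);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MAP_OUTPUT_KEY_SCHEMA, "\"string\"");
        conf.put(OUTPUT_KEY_SCHEMA, "\"long\"");
        // Print the schema chosen for a map-only job, then for a job with one reducer.
        System.out.println(selectKeyWriterSchema(conf, 0));
        System.out.println(selectKeyWriterSchema(conf, 1));
    }
}
```

The same check would apply to the value schema. Keeping the fallback means a map-only job that only sets the OutputKeySchema continues to work as before.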

            People

            • Assignee:
              apaulsen Alan Paulsen
              Reporter:
              apaulsen Alan Paulsen