Hadoop Map/Reduce
MAPREDUCE-1853

MultipleOutputs does not cache TaskAttemptContext

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.21.0, 0.22.0
    • Fix Version/s: 0.21.0, 0.22.0
    • Component/s: task
    • Labels:
      None
    • Environment:

      OSX 10.6
      java6

    • Hadoop Flags:
      Reviewed

      Description

      In MultipleOutputs there is

       private TaskAttemptContext getContext(String nameOutput) throws IOException {
         // The following trick leverages the instantiation of a record writer via
         // the job thus supporting arbitrary output formats.
         Job job = new Job(context.getConfiguration());
         job.setOutputFormatClass(getNamedOutputFormatClass(context, nameOutput));
         job.setOutputKeyClass(getNamedOutputKeyClass(context, nameOutput));
         job.setOutputValueClass(getNamedOutputValueClass(context, nameOutput));
         TaskAttemptContext taskContext =
           new TaskAttemptContextImpl(job.getConfiguration(),
                                      context.getTaskAttemptID());
         return taskContext;
       }
      

      So for every reduce call it creates a new Job instance, which in turn creates a new LocalJobRunner. That does not sound like a good idea.

      You end up with a flood of "jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized"

      This should probably also be added to 0.22.
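      The fix implied by the issue title is to memoize the per-named-output context so the expensive construction (the new Job(...) call above) runs once per output name rather than once per reduce call. A minimal sketch of that caching pattern, with the Hadoop classes stubbed out (FakeContext and the constructions counter are illustrative only, not project code):

      ```java
      import java.util.HashMap;
      import java.util.Map;

      public class ContextCacheSketch {
          static int constructions = 0;

          // Stand-in for TaskAttemptContext; counts how often it is built.
          static class FakeContext {
              final String name;
              FakeContext(String name) { this.name = name; constructions++; }
          }

          // Assumed cache field: named output -> its context.
          private final Map<String, FakeContext> taskContexts =
              new HashMap<String, FakeContext>();

          FakeContext getContext(String nameOutput) {
              FakeContext ctx = taskContexts.get(nameOutput);
              if (ctx == null) {
                  // Expensive path (new Job(...) in the real code) runs only
                  // on the first use of this named output.
                  ctx = new FakeContext(nameOutput);
                  taskContexts.put(nameOutput, ctx);
              }
              return ctx;
          }

          public static void main(String[] args) {
              ContextCacheSketch c = new ContextCacheSketch();
              for (int i = 0; i < 1000; i++) { // simulate many reduce() calls
                  c.getContext("text");
                  c.getContext("seq");
              }
              System.out.println(constructions); // one construction per name
          }
      }
      ```

      With the cache in place, repeated lookups for the same named output return the same context, so the LocalJobRunner (and the JvmMetrics warning flood) is no longer triggered on every call.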


            People

            • Assignee:
              Torsten Curdt
            • Reporter:
              Torsten Curdt
            • Votes:
              0
            • Watchers:
              2
