Hadoop Map/Reduce / MAPREDUCE-1853

MultipleOutputs does not cache TaskAttemptContext


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.21.0, 0.22.0
    • Fix Version/s: 0.21.0, 0.22.0
    • Component/s: task
    • Labels: None
    • Environment: OSX 10.6, java6
    • Hadoop Flags: Reviewed

    Description

      In MultipleOutputs there is

        private TaskAttemptContext getContext(String nameOutput) throws IOException {
          // The following trick leverages the instantiation of a record writer via
          // the job thus supporting arbitrary output formats.
          Job job = new Job(context.getConfiguration());
          job.setOutputFormatClass(getNamedOutputFormatClass(context, nameOutput));
          job.setOutputKeyClass(getNamedOutputKeyClass(context, nameOutput));
          job.setOutputValueClass(getNamedOutputValueClass(context, nameOutput));
          TaskAttemptContext taskContext =
            new TaskAttemptContextImpl(job.getConfiguration(),
                                       context.getTaskAttemptID());
          return taskContext;
        }
      

      So every reduce call creates a new Job instance, which in turn creates a new LocalJobRunner.
      That does not sound like a good idea.

      You end up with a flood of log messages like "jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized".

      The fix should probably also go into 0.22.
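
The caching idea named in the issue title can be sketched as follows. This is a minimal, Hadoop-free illustration and not the attached patch: the class NamedOutputContextCache, its factory parameter, and creationCount() are hypothetical names, and a plain String stands in for TaskAttemptContext. It assumes the cache is only touched from a single task thread, so it uses an unsynchronized HashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: builds one context per named output and reuses it,
 * instead of constructing a new one on every call.
 */
public class NamedOutputContextCache<C> {
    // One cached context per named output.
    private final Map<String, C> contexts = new HashMap<>();
    // Stand-in for the expensive construction (in the issue: new Job(...) etc.).
    private final Function<String, C> factory;
    // Counts how often the expensive construction actually ran.
    private int creations = 0;

    public NamedOutputContextCache(Function<String, C> factory) {
        this.factory = factory;
    }

    /** Returns the cached context for nameOutput, creating it on first use. */
    public C getContext(String nameOutput) {
        C ctx = contexts.get(nameOutput);
        if (ctx == null) {
            ctx = factory.apply(nameOutput);
            creations++;
            contexts.put(nameOutput, ctx);
        }
        return ctx;
    }

    public int creationCount() {
        return creations;
    }

    public static void main(String[] args) {
        NamedOutputContextCache<String> cache =
            new NamedOutputContextCache<>(name -> "context-for-" + name);
        // Simulate many reduce calls writing to two named outputs.
        for (int i = 0; i < 1000; i++) {
            cache.getContext("text");
            cache.getContext("seq");
        }
        System.out.println(cache.creationCount()); // prints 2
    }
}
```

With a cache like this, the expensive construction runs once per named output rather than once per call, which is the kind of change that should also stop the repeated "already initialized" messages.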

      Attachments

        1. cache-task-attempts.diff
          2 kB
          Torsten Curdt


          People

            Assignee: Torsten Curdt (tcurdt)
            Reporter: Torsten Curdt (tcurdt)
            Votes: 0
            Watchers: 2
