Hadoop Map/Reduce / MAPREDUCE-1853

MultipleOutputs does not cache TaskAttemptContext


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.21.0, 0.22.0
    • Fix Version/s: 0.21.0, 0.22.0
    • Component/s: task
    • Labels: None
    • Environment: OSX 10.6, java6
    • Hadoop Flags: Reviewed

    Description

      In MultipleOutputs there is:

       private TaskAttemptContext getContext(String nameOutput) throws IOException {
          // The following trick leverages the instantiation of a record writer via
          // the job thus supporting arbitrary output formats.
          Job job = new Job(context.getConfiguration());
          job.setOutputFormatClass(getNamedOutputFormatClass(context, nameOutput));
          job.setOutputKeyClass(getNamedOutputKeyClass(context, nameOutput));
          job.setOutputValueClass(getNamedOutputValueClass(context, nameOutput));
          TaskAttemptContext taskContext = 
            new TaskAttemptContextImpl(job.getConfiguration(), 
                                       context.getTaskAttemptID());
          return taskContext;
        }
      

      So for every reduce() call it creates a new Job instance, which in turn creates a new LocalJobRunner. That does not sound like a good idea.

      You end up with a flood of "jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized" messages.
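      The natural fix is to build the TaskAttemptContext once per named output and reuse it on subsequent calls. The following is a minimal, self-contained sketch of that memoization pattern (not the attached patch itself): ContextCache and ExpensiveContext are hypothetical stand-ins for MultipleOutputs and TaskAttemptContextImpl, whose construction is what drags in a new Job and LocalJobRunner each time.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextCache {
    static int constructions = 0;

    // Stand-in for TaskAttemptContextImpl; constructing it represents
    // the costly path (new Job, new LocalJobRunner) in getContext().
    static final class ExpensiveContext {
        final String nameOutput;
        ExpensiveContext(String nameOutput) {
            this.nameOutput = nameOutput;
            constructions++; // count how often the costly path runs
        }
    }

    private final Map<String, ExpensiveContext> taskContexts = new HashMap<>();

    // Analogue of MultipleOutputs.getContext(String): build the context
    // once per named output, then return the cached instance.
    ExpensiveContext getContext(String nameOutput) {
        return taskContexts.computeIfAbsent(nameOutput, ExpensiveContext::new);
    }

    public static void main(String[] args) {
        ContextCache cache = new ContextCache();
        for (int i = 0; i < 1000; i++) {   // simulate 1000 reduce() calls
            cache.getContext("text");
            cache.getContext("seq");
        }
        // Only two contexts are ever constructed, not 2000.
        System.out.println(constructions); // → 2
    }
}
```

      With the cache in place, the JvmMetrics re-initialization warning can only be triggered once per named output rather than once per record.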

      This should probably also be added to 0.22.

      Attachments

        1. cache-task-attempts.diff
          2 kB
          Torsten Curdt


            People

              Assignee: tcurdt Torsten Curdt
              Reporter: tcurdt Torsten Curdt
              Votes: 0
              Watchers: 2
