[MAPREDUCE-150] speculative reduce should touch output files only through OutputFormat - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

~~HADOOP-1127~~ introduced speculative reduce. It was implemented by having the MapReduce kernel directly manipulate a job's output files, which is inconsistent with the architecture of the InputFormat and OutputFormat interfaces. The kernel should never operate directly on job input or output; it should always defer to these interfaces.

To correct this, we will need to add some new methods to OutputFormat, something like:

/** Rename output generated by getRecordWriter(job, tempName) to finalName. */
void completeOutput(JobConf job, String tempName, String finalName);

/** Clean up output generated by getRecordWriter(job, tempName). Called for unused outputs. */
void cleanupOutput(JobConf job, String tempName);

These should be implemented in OutputFormatBase, which should be renamed FileOutputFormat.
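
As a rough illustration of the proposal above, the following standalone sketch shows how a file-based format might implement the two methods, so that the framework promotes or discards a speculative attempt's output without touching files itself. It is not Hadoop code: `SketchOutputFormat`, `SketchFileOutputFormat`, the `Path outputDir` parameter (standing in for `JobConf job`), and the file names are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for the proposed OutputFormat additions (not Hadoop's interface). */
interface SketchOutputFormat {
    /** Rename output written under tempName to finalName. */
    void completeOutput(Path outputDir, String tempName, String finalName) throws IOException;

    /** Remove output written under tempName; called for unused outputs. */
    void cleanupOutput(Path outputDir, String tempName) throws IOException;
}

/** File-based implementation, loosely playing the role the issue gives FileOutputFormat. */
class SketchFileOutputFormat implements SketchOutputFormat {
    @Override
    public void completeOutput(Path outputDir, String tempName, String finalName)
            throws IOException {
        // Fails if finalName already exists, so a second attempt cannot overwrite the winner.
        Files.move(outputDir.resolve(tempName), outputDir.resolve(finalName));
    }

    @Override
    public void cleanupOutput(Path outputDir, String tempName) throws IOException {
        // A missing temp file is not an error: the attempt may have written nothing.
        Files.deleteIfExists(outputDir.resolve(tempName));
    }
}

class SpeculativeReduceDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-output");
        SketchOutputFormat format = new SketchFileOutputFormat();

        // Two speculative attempts of the same reduce write to distinct temp names
        // (the names here are hypothetical).
        Files.write(dir.resolve("_tmp_attempt_0"), "a\t1\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("_tmp_attempt_1"), "a\t1\n".getBytes(StandardCharsets.UTF_8));

        // The caller only ever goes through the format: promote the winner, discard the loser.
        format.completeOutput(dir, "_tmp_attempt_0", "part-00000");
        format.cleanupOutput(dir, "_tmp_attempt_1");

        System.out.println("final output exists: " + Files.exists(dir.resolve("part-00000")));
    }
}
```

A non-file format (for example, one backed by a table store) could implement the same two methods with its own commit and discard logic, which is the point of routing these operations through the interface.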

To prevent this from happening again, we should also move JobConf#getInputPath(), #setInputPath(), #getOutputPath(), and #setOutputPath() to static methods on FileInputFormat and FileOutputFormat, since these methods are specific to jobs with file inputs and outputs (and do not apply to, e.g., HBase tables).
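
A minimal sketch of what that move could look like, under stated assumptions: the configuration class keeps only generic key/value access, and the file-specific accessors live as static methods on the file-based formats. `SketchJobConf`, `PathAwareFileInputFormat`, `PathAwareFileOutputFormat`, and the property keys are hypothetical names for this example, not the actual Hadoop classes or keys.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical generic job configuration: no file-specific methods at all. */
class SketchJobConf {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
}

/** File-specific input settings live with the file-based input format. */
final class PathAwareFileInputFormat {
    private static final String INPUT_PATH_KEY = "sketch.input.path"; // hypothetical key

    private PathAwareFileInputFormat() {}

    static void setInputPath(SketchJobConf job, String path) { job.set(INPUT_PATH_KEY, path); }

    static String getInputPath(SketchJobConf job) { return job.get(INPUT_PATH_KEY); }
}

/** File-specific output settings live with the file-based output format. */
final class PathAwareFileOutputFormat {
    private static final String OUTPUT_PATH_KEY = "sketch.output.path"; // hypothetical key

    private PathAwareFileOutputFormat() {}

    static void setOutputPath(SketchJobConf job, String path) { job.set(OUTPUT_PATH_KEY, path); }

    static String getOutputPath(SketchJobConf job) { return job.get(OUTPUT_PATH_KEY); }
}

class FormatPathsDemo {
    public static void main(String[] args) {
        SketchJobConf job = new SketchJobConf();
        // Only jobs that use file formats ever mention paths.
        PathAwareFileInputFormat.setInputPath(job, "/data/in");
        PathAwareFileOutputFormat.setOutputPath(job, "/data/out");
        System.out.println(PathAwareFileInputFormat.getInputPath(job));
        System.out.println(PathAwareFileOutputFormat.getOutputPath(job));
    }
}
```

With this layout, code that reads or writes something other than files has no path accessor to reach for on the configuration object, so it cannot quietly assume a file-based job.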

People

Assignee: Unassigned

Reporter: Doug Cutting

Votes: 0

Watchers: 3

Dates

Created: 22/May/07 18:47

Updated: 20/Jun/09 07:50