[PIG-196] Pig should use '-reducer NONE' for map-only jobs - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.2.0
Component/s: None
Labels: None

Description

Currently, for map-only jobs, Pig writes map-outputs directly to HDFS and then sends zero data to reducers. The problem with this is twofold:

- Reduce slots are unnecessarily wasted on the cluster
- Reduces write empty files to HDFS, putting pressure on the Namenode

Both of these can be very easily avoided by just calling:

job.setNumReduceTasks(0);

and letting Hadoop Map-Reduce take care of writing map-outputs directly to HDFS.
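
To make the two costs above concrete, here is a rough back-of-the-envelope sketch. It is a toy model only, not Hadoop or Pig code: the class `MapOnlyJobCost`, its methods, and the sample task counts are all hypothetical, and it assumes one output file per map task and one (empty) output file per reduce task.

```java
// Toy model only: hypothetical names, not part of the Hadoop or Pig APIs.
public final class MapOnlyJobCost {
    private MapOnlyJobCost() {}

    /** Reduce slots occupied by a job configured with numReduces reduce tasks. */
    static int reduceSlotsUsed(int numReduces) {
        return Math.max(numReduces, 0);
    }

    /**
     * Files created on HDFS for a map-only job, assuming each map task writes
     * one output file and each reduce task, having received no data, still
     * writes one empty file.
     */
    static int filesCreated(int numMaps, int numReduces) {
        return Math.max(numMaps, 0) + Math.max(numReduces, 0);
    }

    public static void main(String[] args) {
        int numMaps = 100;    // assumed map task count, for illustration
        int numReduces = 10;  // assumed reducer count before the change

        System.out.println("With reducers:    slots=" + reduceSlotsUsed(numReduces)
                + ", files=" + filesCreated(numMaps, numReduces));
        System.out.println("With 0 reducers:  slots=" + reduceSlotsUsed(0)
                + ", files=" + filesCreated(numMaps, 0));
    }
}
```

Under these assumptions, setting the number of reduces to zero removes every wasted reduce slot and every empty reduce output file, leaving only the files the maps write.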

Attachments

- PIG-196_0_20080412.patch, 2 kB, 12/Apr/08 23:47, Arun Murthy
- PIG-196_1_20080422.patch, 3 kB, 22/Apr/08 21:40, Arun Murthy

Issue Links

is part of: PIG-157 Add types and rework execution pipeline (Closed)

People

Assignee: Arun Murthy

Reporter: Arun Murthy

Dates

Created: 08/Apr/08 16:53

Updated: 25/Mar/10 00:12

Resolved: 30/Sep/08 23:07