Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
Currently, for map-only jobs, Pig writes map outputs directly to HDFS and then sends zero data to the reducers. The problem with this is twofold:
- Reduce slots are unnecessarily wasted on the cluster
- Reduces write empty files to HDFS, putting pressure on the NameNode
Both of these can be very easily avoided by just calling:
job.setNumReduceTasks(0);
and letting Hadoop Map-Reduce take care of writing map-outputs directly to HDFS.
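As a rough illustration of the proposed change, the sketch below picks the reducer count based on whether the compiled job has a reduce phase. Hadoop's `JobConf` exposes this setting as `setNumReduceTasks(int)`; to keep the example runnable without Hadoop on the classpath, it uses a hypothetical stand-in class (`StubJobConf`) and a hypothetical helper (`configureReducers`), neither of which is taken from the Pig code base.

```java
public class MapOnlyJobSketch {

    /** Hypothetical stand-in for Hadoop's JobConf; only the reducer count is modeled. */
    static class StubJobConf {
        private int numReduceTasks = 1; // Hadoop defaults to a single reduce task

        void setNumReduceTasks(int n) {
            this.numReduceTasks = n;
        }

        int getNumReduceTasks() {
            return numReduceTasks;
        }
    }

    /**
     * Hypothetical helper: chooses the reducer count for a compiled job.
     * A job without a reduce phase gets zero reducers, so the framework
     * writes map outputs straight to the output location and no empty
     * reduce output files are created.
     */
    static void configureReducers(StubJobConf conf, boolean hasReducePhase, int requestedReducers) {
        if (hasReducePhase) {
            conf.setNumReduceTasks(requestedReducers);
        } else {
            conf.setNumReduceTasks(0); // map-only job: no reduce slots needed
        }
    }

    public static void main(String[] args) {
        StubJobConf mapOnly = new StubJobConf();
        configureReducers(mapOnly, false, 4);
        System.out.println("map-only reducers: " + mapOnly.getNumReduceTasks());

        StubJobConf withReduce = new StubJobConf();
        configureReducers(withReduce, true, 4);
        System.out.println("map-reduce reducers: " + withReduce.getNumReduceTasks());
    }
}
```

In a real job the same branch would call `setNumReduceTasks(0)` on the actual Hadoop job configuration; where exactly Pig decides that a plan is map-only is not stated in this issue and is assumed here to be available as a boolean.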
Attachments
Issue Links
- is part of: PIG-157 Add types and rework execution pipeline (Closed)