  1. Pig
  2. PIG-196

Pig should use '-reducer NONE' for map-only jobs

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • 0.2.0
    • None
    • None

    Description

      Currently, for map-only jobs, Pig writes map outputs directly to HDFS and then sends no data to the reducers, which still run. The problem with this is twofold:

      • Reduce slots are unnecessarily wasted on the cluster
      • Reduce tasks write empty files to HDFS, putting pressure on the NameNode

      Both of these can be avoided very easily by just calling:

      job.setNumReduceTasks(0);
      

      and letting Hadoop Map-Reduce take care of writing map-outputs directly to HDFS.
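The proposal can be illustrated with a small, self-contained sketch. It does not use Hadoop itself: `JobSettings` is a hypothetical stand-in for Hadoop's `JobConf` (which exposes the reducer count through `setNumReduceTasks(int)`), and `configureReducers` and `idleReducers` are illustrative helpers, not Pig or Hadoop APIs. Under these assumptions, the sketch shows that a map-only plan should end up with zero reducers, so that no reduce slot is held and no empty reduce output file is written.

```java
/**
 * Toy model of the reducer-count decision for a map-only job.
 * Nothing here is real Pig or Hadoop code.
 */
class MapOnlyJobSketch {

    /** Hypothetical stand-in for Hadoop's JobConf; it only tracks the reducer count. */
    static final class JobSettings {
        // Hadoop's default is a single reduce task.
        private int numReduceTasks = 1;

        void setNumReduceTasks(int n) {
            if (n < 0) {
                throw new IllegalArgumentException("negative reducer count: " + n);
            }
            numReduceTasks = n;
        }

        int getNumReduceTasks() {
            return numReduceTasks;
        }
    }

    /** Hypothetical helper: a plan without a reduce phase gets zero reducers. */
    static void configureReducers(JobSettings job, boolean hasReducePhase, int requestedReducers) {
        job.setNumReduceTasks(hasReducePhase ? requestedReducers : 0);
    }

    /**
     * Number of reducers that receive no data in this model. Each one holds
     * a reduce slot and writes one empty output file.
     */
    static int idleReducers(JobSettings job, boolean hasReducePhase) {
        return hasReducePhase ? 0 : job.getNumReduceTasks();
    }

    public static void main(String[] args) {
        // Behaviour described in the issue: reducers stay configured for a map-only plan.
        JobSettings before = new JobSettings();
        before.setNumReduceTasks(10);

        // Proposed behaviour: the map-only plan is configured with zero reducers.
        JobSettings after = new JobSettings();
        configureReducers(after, false, 10);

        System.out.println("idle reducers before: " + idleReducers(before, false));
        System.out.println("idle reducers after:  " + idleReducers(after, false));
    }
}
```

In a real job the same decision would be a single `setNumReduceTasks(0)` call on the job configuration when the plan has no reduce phase; the rest of the sketch exists only to make the effect visible.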

      Attachments

        1. PIG-196_0_20080412.patch
          2 kB
          Arun Murthy
        2. PIG-196_1_20080422.patch
          3 kB
          Arun Murthy

            People

              Assignee: Arun Murthy (acmurthy)
              Reporter: Arun Murthy (acmurthy)