Hadoop Common / HADOOP-5539

o.a.h.mapred.Merger not maintaining map out compression on intermediate files

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.19.1
    • Fix Version/s: 0.20.1
    • Component/s: None
    • Labels:
      None
    • Environment:

      0.19.2-dev, r753365

    • Hadoop Flags:
      Reviewed

      Description

      hadoop-site.xml :
      mapred.compress.map.output = true

      Map output files are compressed, but when the in-memory merger closes on the reduce side,
      the on-disk merger runs to reduce the number of input files to <= io.sort.factor, if needed.

      When this happens, it writes files named intermediate.x. These files do not maintain the
      compression setting. The call that creates the writer (o.a.h.mapred.Merger.class line 432)
      passes the codec, but I added some logging and it is always null, whether map output compression is set to true or false.

      This causes tasks to fail if they cannot hold the uncompressed size of the reduce data they are holding.
      I think this is just an oversight: the codec is not being set correctly for the on-disk merges.
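
The effect described above can be sketched without Hadoop. This is only an illustration, not the Merger code or the attached patch: the class IntermediateMergeSketch, the Codec interface, and the helper names are hypothetical, and the JDK's deflate stream stands in for whatever codec the job configures. It shows that a writer handed a null codec silently writes uncompressed data, while the same writer handed the configured codec keeps the intermediate file compressed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.zip.DeflaterOutputStream;

public class IntermediateMergeSketch {

    /** Hypothetical stand-in for a compression codec; not Hadoop's CompressionCodec. */
    interface Codec {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    /** JDK deflate, used here only as an example codec. */
    static final Codec DEFLATE = DeflaterOutputStream::new;

    /** Resolve the codec from the job setting; null means "write uncompressed". */
    static Codec codecFor(boolean compressMapOutput) {
        return compressMapOutput ? DEFLATE : null;
    }

    /** Write one intermediate file into memory, applying the codec if one was passed. */
    static byte[] writeIntermediate(List<String> records, Codec codec) {
        ByteArrayOutputStream disk = new ByteArrayOutputStream();
        try (OutputStream out = (codec == null) ? disk : codec.wrap(disk)) {
            for (String record : records) {
                out.write(record.getBytes(StandardCharsets.UTF_8));
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return disk.toByteArray();
    }

    public static void main(String[] args) {
        List<String> records = Collections.nCopies(1000, "key\tvalue-value-value");

        // Behaviour reported in this issue: the merge pass ends up with a null codec.
        byte[] reported = writeIntermediate(records, null);

        // Intended behaviour: the merge pass receives the codec chosen by the job setting.
        byte[] intended = writeIntermediate(records, codecFor(true));

        System.out.println("null codec:       " + reported.length + " bytes");
        System.out.println("configured codec: " + intended.length + " bytes");
    }
}
```

The point of the sketch is that the writer cannot tell a deliberately uncompressed job from a codec that was lost on the way, so the codec has to be resolved once from the job setting and handed to every merge pass.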

      2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 3000
      2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: intermediate.1 used codec: null
      

      I added

      // added by me
      if (codec != null) {
        LOG.info("intermediate." + passNo + " used codec: " + codec.toString());
      } else {
        LOG.info("intermediate." + passNo + " used codec: null");
      }
      // end added by me
      

      just before the creation of the writer (o.a.h.mapred.Merger.class line 432),
      and it outputs the second log line above.

      I have confirmed this with the logging, and I have looked at the files on the tasktracker's disk. I can read the data in
      the intermediate files clearly, which tells me they are not compressed, but I cannot read the map.out files taken directly from the map output,
      which tells me compression is working on the map side but not in the on-disk merge that produces the intermediate files.
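
The manual check described above (can the file be read as text or not) can be approximated in code. The following is a rough heuristic sketch, not a Hadoop utility: the class ReadableCheck, its method names, and the 0.95 threshold are all assumptions made for illustration. It also assumes the records are text, so it would not help with binary keys or values.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

public class ReadableCheck {

    /** Fraction of bytes that are printable ASCII, tab, CR, or LF. */
    static double printableFraction(byte[] data) {
        if (data.length == 0) {
            return 1.0;
        }
        int printable = 0;
        for (byte b : data) {
            int c = b & 0xFF;
            if ((c >= 0x20 && c < 0x7F) || c == '\t' || c == '\n' || c == '\r') {
                printable++;
            }
        }
        return (double) printable / data.length;
    }

    /** Heuristic: mostly printable bytes suggest uncompressed text (threshold is arbitrary). */
    static boolean looksLikePlainText(byte[] data) {
        return printableFraction(data) > 0.95;
    }

    /** Compress with JDK deflate so the demo has compressed bytes to inspect. */
    static byte[] deflate(byte[] data) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buf)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("key-").append(i).append('\t').append("value-").append(i * 31).append('\n');
        }
        byte[] plain = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(plain);

        System.out.println("plain looks readable:  " + looksLikePlainText(plain));
        System.out.println("packed looks readable: " + looksLikePlainText(packed));
    }
}
```

In practice the same test would be run on the bytes of a map.out file and of an intermediate.x file read from the tasktracker's local directory.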

      I can see no benefit in these files not maintaining the compression setting, and it looks as if they were intended to maintain it.

      1. hadoop-5539-branch20.patch
        4 kB
        Jothi Padmanabhan
      2. hadoop-5539-v1.patch
        5 kB
        Jothi Padmanabhan
      3. hadoop-5539.patch
        6 kB
        Jothi Padmanabhan
      4. 5539.patch
        5 kB
        Billy Pearson

        Activity

        Billy Pearson created issue -
        Billy Pearson made changes -
        Field Original Value New Value
        Status Open [ 1 ] Patch Available [ 10002 ]
        Fix Version/s 0.20.0 [ 12313438 ]
        Billy Pearson made changes -
        Attachment 5539.patch [ 12403378 ]
        Billy Pearson made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Billy Pearson made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Chris Douglas made changes -
        Assignee Billy Pearson [ viper799 ]
        Fix Version/s 0.19.2 [ 12313650 ]
        Priority Major [ 3 ] Blocker [ 1 ]
        Billy Pearson made changes -
        Fix Version/s 0.19.2 [ 12313650 ]
        Fix Version/s 0.20.0 [ 12313438 ]
        Billy Pearson made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Chris Douglas made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Fix Version/s 0.20.0 [ 12313438 ]
        Fix Version/s 0.19.2 [ 12313650 ]
        Chris Douglas made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Fix Version/s 0.19.2 [ 12313650 ]
        Billy Pearson made changes -
        Assignee Billy Pearson [ viper799 ]
        Jothi Padmanabhan made changes -
        Attachment hadoop-5539.patch [ 12408232 ]
        Jothi Padmanabhan made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Jothi Padmanabhan made changes -
        Attachment hadoop-5539-v1.patch [ 12408652 ]
        Jothi Padmanabhan made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Jothi Padmanabhan made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Jothi Padmanabhan made changes -
        Attachment hadoop-5539-branch20.patch [ 12409146 ]
        Devaraj Das made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Assignee Jothi Padmanabhan [ jothipn ]
        Fix Version/s 0.20.1 [ 12313866 ]
        Fix Version/s 0.19.2 [ 12313650 ]
        Resolution Fixed [ 1 ]
        Owen O'Malley made changes -
        Component/s mapred [ 12310690 ]

          People

          • Assignee:
            Jothi Padmanabhan
            Reporter:
            Billy Pearson
          • Votes:
            1
            Watchers:
            7

            Dates

            • Created:
              Updated:
              Resolved:
