
Key: HADOOP-1609
Type: Improvement
Status: Closed
Resolution: Duplicate
Priority: Major
Assignee: Unassigned
Reporter: Espen Amble Kolstad
Votes: 0
Watchers: 1
Hadoop Common

Optimize MapTask.MapOutputBuffer.spill() by using appendRaw instead of deserializing/serializing keys/values

Created: 13/Jul/07 11:41 AM   Updated: 08/Jul/09 04:52 PM
Component/s: None
Affects Version/s: 0.14.0
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
spill.patch (text file, 6 kB), 2007-07-13 01:05 PM, Espen Amble Kolstad, licensed for inclusion in ASF works
spill.patch (text file, 6 kB), 2007-07-13 11:41 AM, Espen Amble Kolstad, licensed for inclusion in ASF works
Issue Links:
Incorporates
 

Resolution Date: 20/Jun/08 01:10 AM


Description
In MapTask.MapOutputBuffer.spill(), every key and value is deserialized from the buffer and then serialized again when it is written to the file with append(key, value):
DataInputBuffer keyIn = new DataInputBuffer();
DataInputBuffer valIn = new DataInputBuffer();
DataOutputBuffer valOut = new DataOutputBuffer();
while (resultIter.next()) {
  keyIn.reset(resultIter.getKey().getData(),
              resultIter.getKey().getLength());
  key.readFields(keyIn);
  valOut.reset();
  (resultIter.getValue()).writeUncompressedBytes(valOut);
  valIn.reset(valOut.getData(), valOut.getLength());
  value.readFields(valIn);
  writer.append(key, value);
  reporter.progress();
}

With complex objects, such as Nutch's ParseData or Inlinks, this round trip takes time and creates a lot of garbage.

I've created a patch. It seems to work, but I have only tested it on 0.13.0.
It's a bit clumsy, since ValueBytes has to be cast to UncompressedBytes or CompressedBytes in SequenceFile.Writer.
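
To make the idea concrete, here is a minimal, self-contained sketch of the two spill strategies. It is not the patch and does not use Hadoop classes: SketchWriter, spillViaObjects and spillRaw are hypothetical stand-ins, the record layout is invented for the example, and writeUTF/readUTF stand in for a Writable's write()/readFields(). In Hadoop itself the raw path would go through SequenceFile.Writer.appendRaw(byte[], int, int, ValueBytes).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class SpillSketch {

    /** Hypothetical writer; each record is keyLen, key bytes, valLen, value bytes. */
    public static final class SketchWriter {
        private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        private final DataOutputStream out = new DataOutputStream(bytes);

        /** Object path: serialize key and value first, then write them. */
        public void append(String key, String value) {
            byte[] k = toBytes(key);
            byte[] v = toBytes(value);
            appendRaw(k, 0, k.length, v, 0, v.length);
        }

        /** Raw path: write bytes that are already in serialized form. */
        public void appendRaw(byte[] key, int keyOff, int keyLen,
                              byte[] val, int valOff, int valLen) {
            try {
                out.writeInt(keyLen);
                out.write(key, keyOff, keyLen);
                out.writeInt(valLen);
                out.write(val, valOff, valLen);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public byte[] toByteArray() {
            return bytes.toByteArray();
        }
    }

    /** Serialize a string; stands in for a Writable's write(). */
    public static byte[] toBytes(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            new DataOutputStream(bytes).writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserialize a string; stands in for key.readFields(keyIn). */
    public static String fromBytes(byte[] data) {
        try {
            return new DataInputStream(new ByteArrayInputStream(data)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Current behaviour: deserialize each pair, then let append() re-serialize it. */
    public static byte[] spillViaObjects(byte[][][] sorted) {
        SketchWriter writer = new SketchWriter();
        for (byte[][] kv : sorted) {
            writer.append(fromBytes(kv[0]), fromBytes(kv[1]));
        }
        return writer.toByteArray();
    }

    /** Proposed behaviour: copy the serialized bytes straight through. */
    public static byte[] spillRaw(byte[][][] sorted) {
        SketchWriter writer = new SketchWriter();
        for (byte[][] kv : sorted) {
            writer.appendRaw(kv[0], 0, kv[0].length, kv[1], 0, kv[1].length);
        }
        return writer.toByteArray();
    }

    public static void main(String[] args) {
        // Already-serialized (key, value) pairs, as a sort buffer would hold them.
        byte[][][] sorted = {
            { toBytes("apple"), toBytes("1") },
            { toBytes("banana"), toBytes("2") },
        };
        boolean same = Arrays.equals(spillViaObjects(sorted), spillRaw(sorted));
        System.out.println("identical output: " + same);
    }
}
```

Both methods produce the same bytes, but spillRaw never builds a key or value object, which is where the time and garbage savings would come from for large, complex records.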

Thoughts?


