
Key: HADOOP-2919
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Chris Douglas
Reporter: Chris Douglas
Votes: 0
Watchers: 0
Hadoop Common

Create fewer copies of buffer data during sort/spill

Created: 01/Mar/08 02:02 AM   Updated: 08/Jul/09 04:52 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.17.0

Time Tracking:
Not Specified

File Attachments:
File          Uploaded             Uploader       Size   Notes
2919-0.patch  2008-03-01 02:03 AM  Chris Douglas  53 kB  Text file, licensed for inclusion in ASF works
2919-1.patch  2008-03-07 01:51 AM  Chris Douglas  57 kB  Text file, licensed for inclusion in ASF works
2919-2.patch  2008-03-10 07:01 AM  Chris Douglas  60 kB  Text file, licensed for inclusion in ASF works
2919-3.patch  2008-03-10 05:58 PM  Chris Douglas  60 kB  Text file, licensed for inclusion in ASF works
2919-4.patch  2008-03-13 10:28 PM  Chris Douglas  60 kB  Text file, licensed for inclusion in ASF works
2919-5.patch  2008-03-19 03:00 AM  Chris Douglas  61 kB  Text file, licensed for inclusion in ASF works
2919-6.patch  2008-03-20 01:23 AM  Chris Douglas  62 kB  Text file, licensed for inclusion in ASF works
2919-7.patch  2008-03-26 10:27 PM  Chris Douglas  62 kB  Text file, licensed for inclusion in ASF works
Issue Links:
Blocker: This issue is blocked by HADOOP-2943
Incorporates: HADOOP-2054, HADOOP-872, HADOOP-287, HADOOP-1609

Resolution Date: 31/Mar/08 10:51 PM


 Description
Currently, the sort/spill works as follows:

Let r be the number of partitions
For each call to collect(K,V) from map:

  • If buffers do not exist, allocate a new DataOutputBuffer to collect K,V bytes and r buffers for collecting K,V offsets
  • Write K,V into buffer, noting offsets
  • Register offsets with the associated partition buffer, allocating/copying accounting buffers if necessary
  • Calculate the total mem usage for buffer and all partition collectors by iterating over the collectors
  • If total mem usage is greater than half of io.sort.mb, then start a new thread to spill, blocking if another spill is in progress
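
The collect steps above can be sketched roughly as follows. This is a simplified model, not the MapTask code: the class name CollectSketch, the hash-based partitioning, the use of ByteArrayOutputStream in place of DataOutputBuffer, and the 12-bytes-per-record accounting estimate are all assumptions made for illustration. The sort budget is passed in bytes, whereas io.sort.mb is a megabyte setting.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the collect path described above; not Hadoop's code. */
class CollectSketch {
    private final int partitions;            // r in the description
    private final long spillThresholdBytes;  // half of the sort budget
    private ByteArrayOutputStream kvBytes;   // stands in for DataOutputBuffer
    private List<List<int[]>> offsets;       // per partition: {keyStart, valStart, end}

    CollectSketch(int partitions, long sortBudgetBytes) {
        this.partitions = partitions;
        this.spillThresholdBytes = sortBudgetBytes / 2;
    }

    /** Buffers one record; returns true when the caller should start a spill. */
    boolean collect(byte[] key, byte[] value) {
        if (kvBytes == null) { // buffers are created lazily on first use
            kvBytes = new ByteArrayOutputStream();
            offsets = new ArrayList<>();
            for (int i = 0; i < partitions; i++) {
                offsets.add(new ArrayList<>());
            }
        }
        int keyStart = kvBytes.size();
        kvBytes.write(key, 0, key.length);
        int valStart = kvBytes.size();
        kvBytes.write(value, 0, value.length);
        int end = kvBytes.size();

        // Assumed partitioner: non-negative hash of the key bytes modulo r.
        int p = (Arrays.hashCode(key) & Integer.MAX_VALUE) % partitions;
        offsets.get(p).add(new int[] {keyStart, valStart, end});

        return memoryUsed() > spillThresholdBytes;
    }

    /** Recomputes usage by iterating over every partition collector. */
    long memoryUsed() {
        if (kvBytes == null) {
            return 0;
        }
        long total = kvBytes.size();
        for (List<int[]> part : offsets) {
            total += 12L * part.size(); // three ints of accounting per record
        }
        return total;
    }
}
```

Note that memoryUsed() walks all r collectors on every call to collect, which is one of the per-record costs the description points at.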

For each spill (assuming no combiner):

  • Save references to our K,V byte buffer and accounting data, setting the former to null (will be recreated on the next call to collect(K,V))
  • Open a SequenceFile.Writer for this partition
  • Sort each partition separately (the current version of sort reuses the indices, but still requires wrapping them in IntWritable objects)
  • Build a RawKeyValueIterator of sorted data for the partition
  • Deserialize each key and value and call SequenceFile::append(K,V) on the writer for this partition
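
As a rough illustration of the per-partition sort step, the sketch below orders record indices by comparing key bytes in a shared buffer. The class SpillSortSketch and its method names are hypothetical, the unsigned lexicographic byte comparison is an assumed ordering, and the boxed Integer indices only stand in for the IntWritable wrapping mentioned above.

```java
import java.util.Arrays;

/** Hypothetical sketch of sorting one partition by raw key bytes; not Hadoop's code. */
class SpillSortSketch {
    /**
     * Returns record indices ordered by their keys. keyStart[i] and keyLen[i]
     * locate the key of record i inside buf.
     */
    static Integer[] sortPartition(byte[] buf, int[] keyStart, int[] keyLen) {
        // Boxed indices: each int is wrapped in an object before sorting.
        Integer[] order = new Integer[keyStart.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) ->
            compareBytes(buf, keyStart[a], keyLen[a], keyStart[b], keyLen[b]));
        return order;
    }

    /** Unsigned lexicographic comparison of two byte ranges in the same buffer. */
    static int compareBytes(byte[] buf, int s1, int l1, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = buf[s1 + i] & 0xff;
            int y = buf[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2; // a shorter key sorts first when it is a prefix
    }
}
```

Sorting indices rather than records leaves the key and value bytes in place; the sketch does not model the later deserialize-and-append step.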

There are a number of opportunities for reducing the number of copies, creations, and operations we perform in this stage, particularly since growing many of the buffers involved requires that we copy the existing data to the newly sized allocation.
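
To make the cost of growing a buffer concrete, the following sketch counts how many bytes get re-copied when a byte array doubles on demand. GrowthCostSketch is a hypothetical class written for this illustration, and the doubling policy is an assumption; the buffers discussed in this issue may grow differently.

```java
/** Hypothetical byte buffer that counts the bytes it re-copies while growing. */
class GrowthCostSketch {
    private byte[] data;
    private int size;
    long bytesCopied; // total bytes moved into newly sized allocations

    GrowthCostSketch(int initialCapacity) {
        data = new byte[initialCapacity];
    }

    void write(byte b) {
        if (size == data.length) {
            // Growing means allocating a larger array and copying what exists.
            byte[] bigger = new byte[Math.max(1, data.length * 2)];
            System.arraycopy(data, 0, bigger, 0, size);
            bytesCopied += size;
            data = bigger;
        }
        data[size++] = b;
    }

    /** Writes the given number of bytes and reports how many were re-copied. */
    static long copiedAfter(int initialCapacity, int writes) {
        GrowthCostSketch buf = new GrowthCostSketch(initialCapacity);
        for (int i = 0; i < writes; i++) {
            buf.write((byte) i);
        }
        return buf.bytesCopied;
    }
}
```

With these assumptions, writing 16 bytes into a buffer that starts at 4 bytes re-copies 12 bytes (4 at the first growth, 8 at the second), while a buffer allocated at 16 bytes up front copies none.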



 Change History
Chris Douglas made changes - 01/Mar/08 02:03 AM
Field Original Value New Value
Attachment 2919-0.patch [ 12376888 ]
Chris Douglas made changes - 01/Mar/08 03:04 AM
Assignee Chris Douglas [ chris.douglas ]
Chris Douglas made changes - 04/Mar/08 09:45 PM
Link This issue incorporates HADOOP-2054 [ HADOOP-2054 ]
Chris Douglas made changes - 04/Mar/08 09:45 PM
Link This issue incorporates HADOOP-872 [ HADOOP-872 ]
Chris Douglas made changes - 04/Mar/08 09:45 PM
Link This issue incorporates HADOOP-287 [ HADOOP-287 ]
Chris Douglas made changes - 06/Mar/08 10:11 PM
Link This issue is blocked by HADOOP-2943 [ HADOOP-2943 ]
Chris Douglas made changes - 07/Mar/08 01:51 AM
Attachment 2919-1.patch [ 12377308 ]
Chris Douglas made changes - 10/Mar/08 07:01 AM
Attachment 2919-2.patch [ 12377508 ]
Chris Douglas made changes - 10/Mar/08 07:02 AM
Status Open [ 1 ] Patch Available [ 10002 ]
Fix Version/s 0.17.0 [ 12312913 ]
Chris Douglas made changes - 10/Mar/08 05:58 PM
Attachment 2919-3.patch [ 12377540 ]
Chris Douglas made changes - 10/Mar/08 05:59 PM
Status Patch Available [ 10002 ] Open [ 1 ]
Chris Douglas made changes - 10/Mar/08 05:59 PM
Status Open [ 1 ] Patch Available [ 10002 ]
Devaraj Das made changes - 13/Mar/08 09:44 AM
Status Patch Available [ 10002 ] Open [ 1 ]
Devaraj Das made changes - 13/Mar/08 09:45 AM
Status Open [ 1 ] Patch Available [ 10002 ]
Chris Douglas made changes - 13/Mar/08 10:28 PM
Attachment 2919-4.patch [ 12377841 ]
Chris Douglas made changes - 13/Mar/08 10:28 PM
Status Patch Available [ 10002 ] Open [ 1 ]
Chris Douglas made changes - 13/Mar/08 10:28 PM
Status Open [ 1 ] Patch Available [ 10002 ]
Chris Douglas made changes - 19/Mar/08 03:00 AM
Attachment 2919-5.patch [ 12378195 ]
Chris Douglas made changes - 19/Mar/08 07:06 PM
Status Patch Available [ 10002 ] Open [ 1 ]
Chris Douglas made changes - 19/Mar/08 07:06 PM
Status Open [ 1 ] Patch Available [ 10002 ]
Chris Douglas made changes - 20/Mar/08 01:21 AM
Attachment 2919-6.patch [ 12378285 ]
Chris Douglas made changes - 20/Mar/08 01:21 AM
Status Patch Available [ 10002 ] Open [ 1 ]
Chris Douglas made changes - 20/Mar/08 01:21 AM
Status Open [ 1 ] Patch Available [ 10002 ]
Chris Douglas made changes - 20/Mar/08 01:23 AM
Attachment 2919-6.patch [ 12378286 ]
Chris Douglas made changes - 20/Mar/08 01:23 AM
Attachment 2919-6.patch [ 12378285 ]
Chris Douglas made changes - 20/Mar/08 01:23 AM
Status Patch Available [ 10002 ] Open [ 1 ]
Chris Douglas made changes - 20/Mar/08 01:23 AM
Status Open [ 1 ] Patch Available [ 10002 ]
Nigel Daley made changes - 24/Mar/08 11:04 PM
Priority Major [ 3 ] Blocker [ 1 ]
Chris Douglas made changes - 26/Mar/08 10:27 PM
Attachment 2919-7.patch [ 12378667 ]
Chris Douglas made changes - 27/Mar/08 04:52 AM
Status Patch Available [ 10002 ] Open [ 1 ]
Chris Douglas made changes - 27/Mar/08 04:52 AM
Status Open [ 1 ] Patch Available [ 10002 ]
Owen O'Malley made changes - 31/Mar/08 10:51 PM
Resolution Fixed [ 1 ]
Status Patch Available [ 10002 ] Resolved [ 5 ]
Nigel Daley made changes - 21/May/08 08:05 PM
Status Resolved [ 5 ] Closed [ 6 ]
Chris Douglas made changes - 20/Jun/08 01:10 AM
Link This issue incorporates HADOOP-1609 [ HADOOP-1609 ]
Owen O'Malley made changes - 08/Jul/09 04:52 PM
Component/s mapred [ 12310690 ]