Hadoop Common / HADOOP-2919

Create fewer copies of buffer data during sort/spill


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels: None

    Description

      Currently, the sort/spill works as follows:

      Let r be the number of partitions.
      For each call to collect(K,V) from map:

      • If the buffers do not exist, allocate a new DataOutputBuffer to collect K,V bytes and r buffers to collect K,V offsets.
      • Write K,V into the buffer, noting the offsets.
      • Register the offsets with the associated partition buffer, allocating/copying accounting buffers if necessary.
      • Calculate the total memory usage of the buffer and all partition collectors by iterating over the collectors.
      • If the total memory usage is greater than half of io.sort.mb, start a new thread to spill, blocking if another spill is in progress.
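      The collect() path above can be modeled in a few lines of Java. This is a minimal sketch, not the MapTask code: it assumes byte-array keys and values and a caller-supplied partition number, and CollectPathSketch, its fields, and the initial accounting size of 12 ints are hypothetical. ByteArrayOutputStream stands in for DataOutputBuffer, and the spill thread is reduced to dropping the buffers so they are recreated on the next call.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical model of the collect(K,V) path described above (not Hadoop's code).
class CollectPathSketch {
    private final int numPartitions;        // r
    private final long spillThreshold;      // half of io.sort.mb, in bytes
    private ByteArrayOutputStream kvBytes;  // stands in for DataOutputBuffer
    private int[][] offsets;                // per-partition (keyStart, valStart, end) triples
    private int[] used;                     // ints in use in each partition's offset array
    int spills = 0;

    CollectPathSketch(int numPartitions, int ioSortMb) {
        this.numPartitions = numPartitions;
        this.spillThreshold = (long) ioSortMb * 1024 * 1024 / 2;
    }

    void collect(int partition, byte[] key, byte[] value) {
        if (kvBytes == null) {
            // Buffers are allocated lazily, including after every spill.
            kvBytes = new ByteArrayOutputStream();
            offsets = new int[numPartitions][];
            for (int i = 0; i < numPartitions; i++) offsets[i] = new int[12];
            used = new int[numPartitions];
        }
        // Write K,V into the shared buffer, noting offsets.
        int keyStart = kvBytes.size();
        kvBytes.write(key, 0, key.length);
        int valStart = kvBytes.size();
        kvBytes.write(value, 0, value.length);
        int end = kvBytes.size();

        // Register offsets; growing an accounting array copies all existing entries.
        if (used[partition] + 3 > offsets[partition].length) {
            offsets[partition] = Arrays.copyOf(offsets[partition], offsets[partition].length * 2);
        }
        int[] acct = offsets[partition];
        acct[used[partition]++] = keyStart;
        acct[used[partition]++] = valStart;
        acct[used[partition]++] = end;

        // Recompute total memory usage by iterating over every collector.
        long total = kvBytes.size();
        for (int[] a : offsets) total += 4L * a.length;
        if (total > spillThreshold) spill();
    }

    private void spill() {
        // The real code hands the buffers to a spill thread; this sketch just drops them.
        spills++;
        kvBytes = null;
        offsets = null;
        used = null;
    }
}
```

      With r = 2, io.sort.mb = 1 (a 524,288-byte threshold) and 1,000-byte records all sent to partition 0, the first spill happens on record 513: that record forces partition 0's offset array to double from 1,536 to 3,072 ints, which pushes the running total from 518,192 to 525,336 bytes.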

      For each spill (assuming no combiner):

      • Save references to the K,V byte buffer and accounting data, setting the former to null (it will be recreated on the next call to collect(K,V)).
      • Open a SequenceFile.Writer for each partition.
      • Sort each partition separately (the current sort reuses IntWritable objects, but still requires wrapping each index in one).
      • Build a RawKeyValueIterator of sorted data for the partition.
      • Deserialize each key and value and call SequenceFile.Writer.append(K,V) on the writer for this partition.
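      The per-partition sort and append steps above can be sketched in Java. This assumes the partition's records sit in one byte buffer described by (keyStart, valStart, end) offset triples and that keys compare as raw unsigned bytes. Boxing each index as an Integer mimics the IntWritable wrapping mentioned above, and returning "key=value" strings is a hypothetical stand-in for SequenceFile.Writer.append(K,V); SpillPartitionSketch is not Hadoop code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of spilling one partition (not Hadoop's code).
class SpillPartitionSketch {
    // Unsigned lexicographic comparison of two byte ranges in the same buffer.
    static int compareBytes(byte[] b, int s1, int e1, int s2, int e2) {
        int n1 = e1 - s1, n2 = e2 - s2;
        for (int i = 0; i < Math.min(n1, n2); i++) {
            int x = b[s1 + i] & 0xff, y = b[s2 + i] & 0xff;
            if (x != y) return x - y;
        }
        return n1 - n2;
    }

    // offsets holds count triples: keyStart, valStart, end.
    static List<String> spill(byte[] buf, int[] offsets, int count) {
        // Each record index is boxed, analogous to wrapping it in an IntWritable.
        Integer[] order = new Integer[count];
        for (int i = 0; i < count; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> compareBytes(buf,
                offsets[3 * a], offsets[3 * a + 1],
                offsets[3 * b], offsets[3 * b + 1]));

        // Walk the sorted order, "deserialize" each key and value, and append it.
        List<String> out = new ArrayList<>();
        for (int idx : order) {
            int ks = offsets[3 * idx], vs = offsets[3 * idx + 1], end = offsets[3 * idx + 2];
            String key = new String(buf, ks, vs - ks, StandardCharsets.US_ASCII);
            String value = new String(buf, vs, end - vs, StandardCharsets.US_ASCII);
            out.add(key + "=" + value);
        }
        return out;
    }
}
```

      For example, a buffer holding "b2a1c3" with offsets {0, 1, 2, 2, 3, 4, 4, 5, 6} spills as [a=1, b=2, c=3].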

      There are several opportunities to reduce the number of copies, object creations, and operations performed in this stage, particularly since growing many of the buffers involved requires copying the existing data into the newly sized allocation.
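      To illustrate the cost of growth, the sketch below counts the bytes moved each time a byte buffer is reallocated. It assumes a doubling growth policy and a caller-chosen initial capacity; GrowableBuffer is a hypothetical class, not DataOutputBuffer itself.

```java
import java.util.Arrays;

// Hypothetical growable byte buffer that records the cost of each reallocation.
class GrowableBuffer {
    private byte[] data;
    private int size;
    long bytesCopied;   // live bytes moved into new allocations
    int reallocations;

    GrowableBuffer(int initialCapacity) {
        data = new byte[initialCapacity];
    }

    void write(byte[] b) {
        if (size + b.length > data.length) {
            int newCapacity = data.length;
            while (newCapacity < size + b.length) newCapacity *= 2;  // doubling policy (assumed)
            data = Arrays.copyOf(data, newCapacity);                 // existing data is copied over
            bytesCopied += size;
            reallocations++;
        }
        System.arraycopy(b, 0, data, size, b.length);
        size += b.length;
    }

    int size() {
        return size;
    }
}
```

      Writing 1,024 records of 1,024 bytes into a buffer that starts at 1,024 bytes causes 10 reallocations that copy 1,047,552 bytes, almost as much again as the 1,048,576 bytes the buffer finally holds.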

      Attachments

        1. 2919-0.patch
          53 kB
          Christopher Douglas
        2. 2919-1.patch
          57 kB
          Christopher Douglas
        3. 2919-2.patch
          60 kB
          Christopher Douglas
        4. 2919-3.patch
          60 kB
          Christopher Douglas
        5. 2919-4.patch
          60 kB
          Christopher Douglas
        6. 2919-5.patch
          61 kB
          Christopher Douglas
        7. 2919-6.patch
          62 kB
          Christopher Douglas
        8. 2919-7.patch
          62 kB
          Christopher Douglas


            People

              Assignee: Christopher Douglas (cdouglas)
              Reporter: Christopher Douglas (cdouglas)
              Votes: 0
              Watchers: 0
