  Project: Hadoop Map/Reduce
  Issue: MAPREDUCE-1059

distcp can generate uneven map task assignments

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • None
    • Component/s: distcp
    • None
    • Reviewed

    Description

      distcp writes a SequenceFile that lists the source files to transfer along with their sizes. Map tasks are created over spans of this file; each span represents the files that one mapper should transfer. In practice, some transfer loads yield many empty map tasks, while a few tasks perform the bulk of the work.
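
      The span idea described above can be illustrated with a minimal sketch. This is not the code from the attached patches or from distcp itself; it only assumes that the file list is a sequence of (path, size) pairs and that the goal is for each span to carry a similar number of bytes. The names `FileEntry` and `split_by_size` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical record type: (source path, size in bytes).
FileEntry = Tuple[str, int]


def split_by_size(entries: List[FileEntry], num_maps: int) -> List[List[FileEntry]]:
    """Group consecutive entries into at most num_maps non-empty spans
    whose byte totals are roughly equal."""
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")

    total = sum(size for _, size in entries)
    # Target bytes per span, rounded up (ceiling division); at least 1 so
    # that a list of zero-byte files still ends up in a single span.
    target = max(1, -(-total // num_maps))

    spans: List[List[FileEntry]] = []
    current: List[FileEntry] = []
    acc = 0
    for entry in entries:
        current.append(entry)
        acc += entry[1]
        if acc >= target:
            # This span has reached its byte budget; close it.
            spans.append(current)
            current, acc = [], 0

    if current:
        if len(spans) < num_maps:
            spans.append(current)
        else:
            # Only trailing zero-byte files can land here; fold them into
            # the last span so the span count never exceeds num_maps.
            spans[-1].extend(current)
    return spans


if __name__ == "__main__":
    listing = [("a", 100), ("b", 1), ("c", 1), ("d", 98), ("e", 100)]
    for span in split_by_size(listing, 3):
        print(sum(size for _, size in span), [path for path, _ in span])
```

      Because a span is only closed after it has received at least one entry, this sketch never produces an empty span. A single very large file can still dominate one span, since files are not subdivided here.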

      Attachments

        1. MAPREDUCE-1059.patch
          16 kB
          Aaron Kimball
        2. MAPREDUCE-1059.2.patch
          18 kB
          Aaron Kimball
        3. MAPREDUCE-1059.3.patch
          1 kB
          Aaron Kimball

          People

            Assignee: Aaron Kimball (kimballa)
            Reporter: Aaron Kimball (kimballa)
            Votes: 0
            Watchers: 4
