Hadoop Map/Reduce / MAPREDUCE-6166

Reducers do not validate checksum of map outputs when fetching directly to disk

      Description

      In very large map/reduce jobs (50000 maps, 2500 reducers), the intermediate map partition output can become corrupted on disk on the map side. If the corrupted map output is too large to shuffle in memory, the reducer streams it to disk without validating its checksum. In jobs this large, it can take hours before the reducer finally tries to read the corrupted file and fails, and since retries of the failed reduce attempt also take hours, the delay in discovering the failure is multiplied greatly.
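
      The general remedy is to validate the per-output checksum while the bytes are being fetched to disk, rather than deferring validation until the reducer reads the file back. The sketch below is illustrative only (it is not the attached patch; the class name, method, and expectedChecksum parameter are hypothetical): it computes a CRC32 as the fetched stream is copied to disk and fails the fetch immediately on a mismatch.

      import java.io.BufferedOutputStream;
      import java.io.File;
      import java.io.FileOutputStream;
      import java.io.IOException;
      import java.io.InputStream;
      import java.io.OutputStream;
      import java.util.zip.CRC32;
      import java.util.zip.CheckedInputStream;

      // Illustrative only: copy a fetched map output to disk while computing a
      // CRC32, then compare it to the checksum the map side reported. The class,
      // method, and expectedChecksum parameter are hypothetical and are not part
      // of the Hadoop shuffle code or of the attached patch.
      public class ChecksummedDiskFetch {

        public static void fetchToDisk(InputStream mapOutput, File target,
            long expectedChecksum) throws IOException {
          CheckedInputStream in = new CheckedInputStream(mapOutput, new CRC32());
          try (OutputStream out =
              new BufferedOutputStream(new FileOutputStream(target))) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
              out.write(buf, 0, n); // checksum updates as bytes stream through
            }
          }
          long actual = in.getChecksum().getValue();
          if (actual != expectedChecksum) {
            // Fail the fetch now so the copy can be retried (or the map re-run)
            // immediately, instead of failing hours later at reduce time.
            throw new IOException("Corrupt map output " + target + ": expected "
                + expectedChecksum + " but got " + actual);
          }
        }
      }

      Validating during the fetch lets the shuffle retry the copy right away instead of surfacing the corruption hours later when the reducer reads the on-disk segment.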

        Attachments

        1. MAPREDUCE-6166.v1.201411221941.txt
          6 kB
          Eric Payne
        2. MAPREDUCE-6166.v2.201411251627.txt
          6 kB
          Eric Payne
        3. MAPREDUCE-6166.v3.txt
          6 kB
          Eric Payne
        4. MAPREDUCE-6166.v4.txt
          7 kB
          Eric Payne
        5. MAPREDUCE-6166.v5.txt
          7 kB
          Eric Payne
