Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-6689

MapReduce job can infinitely increase number of reducer resource requests

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None

      Description

      We have seen this issue from one of our clusters: when running terasort map-reduce job, some mappers failed after reducer started, and then MR AM tries to preempt reducers to schedule these failed mappers.

      After that, MR AM enters an infinite loop, for every RMContainerAllocator#heartbeat run, it:

      • In preemptReducesIfNeeded, it cancels all scheduled reducer requests. (total scheduled reducers = 1024)
      • Then, in scheduleReduces, it ramps up all reducers (total = 1024).

      As a result, we can see total #requested-containers increased 1024 for every MRAM-RM heartbeat (1 sec per heartbeat). The AM is hanging for 18+ hours, so we get 18 * 3600 * 1024 ~ 66M+ requested containers in RM side.

      And this bug also triggered YARN-4844, which makes RM stop scheduling anything.

      Thanks to Sidharta Seethana for helping with analysis.

        Attachments

        Issue Links

          Activity

          $i18n.getText('security.level.explanation', $currentSelection) Viewable by All Users
          Cancel

            People

              Dates

              • Created:
                Updated:
                Resolved:

                Issue deployment