Hadoop Map/Reduce: MAPREDUCE-6485

MR job hangs forever because all resources are taken up by reducers and the last map attempt never gets resources to run


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.4.1, 2.6.0, 2.7.1, 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: applicationmaster
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The scenario is as follows:
      With mapreduce.job.reduce.slowstart.completedmaps=0.8, reducers acquire resources and start running before all maps have finished.
      It can then happen that all resources are taken up by running reducers while one map is still unfinished.
      In this situation, the last map has two task attempts.
      The first attempt was killed due to a timeout (mapreduce.task.timeout), and its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP and then to FAILED. The failed attempt is not restarted, because a speculative attempt of the same map is still in progress.
      The second attempt, which was started because map task speculation is enabled, stays pending in the UNASSIGNED state because no resources are available. Its request has a lower priority than the reducers, so preemption does not happen.
      As a result, none of the reducers can finish because one map is left, and the last map hangs because no resources are available, so the job never finishes.
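
The condition described above can be sketched as a small, self-contained Java model. This is only an illustration of the reported scenario, not Hadoop code: the class ReducerStarvationSketch, the JobSnapshot type, its fields, and the container counts are all hypothetical, and the preemption rule is reduced to a single boolean that mirrors the statement that the pending map request ranks below the reducers.

```java
public class ReducerStarvationSketch {

    /** Hypothetical, simplified snapshot of a job; not a Hadoop class. */
    static final class JobSnapshot {
        final int totalContainers;                // containers the job can obtain (assumed)
        final int runningReducers;                // reducers currently holding a container
        final int pendingMapAttempts;             // map attempts waiting in UNASSIGNED
        final boolean pendingMapOutranksReducers; // would the pending request trigger preemption?

        JobSnapshot(int totalContainers, int runningReducers,
                    int pendingMapAttempts, boolean pendingMapOutranksReducers) {
            this.totalContainers = totalContainers;
            this.runningReducers = runningReducers;
            this.pendingMapAttempts = pendingMapAttempts;
            this.pendingMapOutranksReducers = pendingMapOutranksReducers;
        }
    }

    /**
     * Returns true when the job cannot make progress: a map attempt is
     * pending, no container is free, and the pending request does not
     * outrank the reducers, so no reducer is preempted. Reducers cannot
     * finish until every map has finished, so they never free a container.
     */
    static boolean isStuck(JobSnapshot s) {
        int freeContainers = s.totalContainers - s.runningReducers;
        boolean mapIsStarved = s.pendingMapAttempts > 0 && freeContainers <= 0;
        return mapIsStarved && !s.pendingMapOutranksReducers;
    }

    public static void main(String[] args) {
        // The reported case: reducers hold every container, one speculative
        // map attempt is pending, and its request ranks below the reducers.
        JobSnapshot reported = new JobSnapshot(10, 10, 1, false);
        System.out.println("stuck = " + isStuck(reported));
    }
}
```

In this model the hang disappears if either a container is free or the pending map request is allowed to preempt a reducer, which are the two ways out that the description implies.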

      Attachments

        1. MAPREDUCE-6485.001.patch
          3 kB
          Xianyin Xin
        2. MAPREDUCE-6485.004.patch
          10 kB
          Xianyin Xin
        3. MAPREDUCE-6485.005.patch
          10 kB
          Xianyin Xin
        4. MAPREDUCE-6485.006.patch
          10 kB
          Xianyin Xin
        5. MAPREDUCE-6845.002.patch
          11 kB
          Xianyin Xin
        6. MAPREDUCE-6845.003.patch
          11 kB
          Xianyin Xin


          People

            Assignee: Xianyin Xin (xinxianyin)
            Reporter: Bob.zhao (Jobo)
            Votes: 0
            Watchers: 10
