  1. Hadoop Map/Reduce
  2. MAPREDUCE-6513

MR job hangs forever when one NM is unstable for some time

Details

    • Reviewed

    Description

      While a job with many tasks was in progress, one node became unstable due to an OS issue. After the node became unstable, the status of the maps running on that node changed to KILLED.

      Currently, the maps that were running on the unstable node have been rescheduled; all of them are in the scheduled state, waiting for the RM to assign containers. Ask requests for maps were seen until the node was good (all of those failed); there are no ask requests after that. But the AM keeps preempting the reducers (it keeps recycling them).

      As a result, the reducers are waiting for the mappers to complete, and the mappers never get containers.

      My question is:
      ============
      Why did the AM not send map requests again once the node had recovered?
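
      The hang described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual AM allocator code: JobState, step and hangs are hypothetical names, each loop iteration stands for one heartbeat, a granted map is assumed to finish within the same heartbeat, and the RM is assumed to grant containers only against outstanding asks.

```python
from dataclasses import dataclass


@dataclass
class JobState:
    # All fields are hypothetical, simplified counters for illustration.
    scheduled_maps: int          # maps waiting for a container
    outstanding_map_asks: int    # map container requests the RM knows about
    running_reducers: int        # reducers currently holding containers
    free_containers: int = 0     # containers available for assignment


def step(state: JobState) -> JobState:
    """Advance the toy model by one heartbeat."""
    # Maps are starved and nothing is free: preempt one reducer.
    if (state.scheduled_maps > 0 and state.running_reducers > 0
            and state.free_containers == 0):
        state.running_reducers -= 1
        state.free_containers += 1

    # Containers are granted only against outstanding asks.
    granted = min(state.free_containers,
                  state.outstanding_map_asks,
                  state.scheduled_maps)
    state.outstanding_map_asks -= granted
    # Assumption: a granted map completes at once and returns its container,
    # so free_containers is unchanged by a grant.
    state.scheduled_maps -= granted

    # Nothing was granted: the freed container goes back to a reducer.
    if granted == 0 and state.free_containers > 0:
        state.running_reducers += state.free_containers
        state.free_containers = 0
    return state


def hangs(state: JobState, max_steps: int = 100) -> bool:
    """Return True if maps are still waiting after max_steps heartbeats."""
    for _ in range(max_steps):
        if state.scheduled_maps == 0:
            return False
        step(state)
    return state.scheduled_maps > 0


# No outstanding asks: reducers are preempted and recycled, maps never run.
stuck = hangs(JobState(scheduled_maps=3, outstanding_map_asks=0,
                       running_reducers=2))
# Asks present: the preempted container is handed to the maps.
fine = hangs(JobState(scheduled_maps=3, outstanding_map_asks=3,
                      running_reducers=2))
print(stuck, fine)
```

      In this model the only difference between the two runs is whether map asks are outstanding, which matches the reported symptom: reducers are preempted repeatedly while the scheduled maps never receive containers.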

      Attachments

        1. MAPREDUCE-6513.01.patch
          37 kB
          Varun Saxena
        2. MAPREDUCE-6513.02.patch
          34 kB
          Varun Saxena
        3. MAPREDUCE-6513.03.patch
          34 kB
          Varun Saxena
        4. MAPREDUCE-6513.3_1.branch-2.7.patch
          38 kB
          Wangda Tan
        5. MAPREDUCE-6513.3_1.branch-2.8.patch
          34 kB
          Wangda Tan
        6. MAPREDUCE-6513.3.branch-2.8.patch
          34 kB
          Wangda Tan

            People

              varun_saxena Varun Saxena
              Jobo Bob.zhao
              Votes: 0
              Watchers: 30

              Dates

                Created:
                Updated:
                Resolved: