  1. Hadoop Map/Reduce
  2. MAPREDUCE-3228

MR AM hangs when one node goes bad


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: applicationmaster, mrv2
    • Labels: None

    Description

      Found this again on one of the gridmix runs. One of the nodes went bad while the job had three containers running on it. Eventually, the AM marked the tasks as timed out and initiated cleanup of the failed containers via stopContainer(). The stopContainer() calls got stuck at the faulty node, so the tasks stay in the FAIL_CONTAINER_CLEANUP stage and the job waits there forever.
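
      To illustrate the failure mode, here is a minimal Java sketch of one generic way to keep a blocking cleanup call from hanging its caller: run it on a separate thread and give up after a timeout. This is not the attached patch and does not use any Hadoop API. The names BoundedCleanup and runWithTimeout are hypothetical, and the Runnable stands in for a call such as stopContainer() against a node that may never answer.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, for illustration only; not part of Hadoop.
public class BoundedCleanup {

    /**
     * Runs a potentially blocking cleanup call and stops waiting after timeoutMs.
     * Returns true if the call finished normally in time, false if it timed out,
     * threw an exception, or the waiting thread was interrupted.
     */
    public static boolean runWithTimeout(Runnable cleanupCall, long timeoutMs) {
        // Daemon thread so a call that ignores interruption cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "cleanup-call");
            t.setDaemon(true);
            return t;
        });
        Future<?> future = executor.submit(cleanupCall);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck call and move on
            return false;
        } catch (ExecutionException e) {
            return false; // the cleanup call itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            future.cancel(true);
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A healthy node: the cleanup call returns immediately.
        boolean healthy = runWithTimeout(() -> { }, 1000);

        // A faulty node, simulated by a call that blocks far longer than the timeout.
        boolean faulty = runWithTimeout(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // cancelled by the timeout handler
            }
        }, 200);

        System.out.println("healthy=" + healthy + ", faulty=" + faulty);
    }
}
```

      With a bound like this, the caller can treat a timed-out cleanup as failed and let the task leave the cleanup stage instead of waiting indefinitely. Whether the actual fix takes this approach is not stated in this report.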

      Thanks to Karams for helping with this.

      Attachments

        1. MAPREDUCE-3228-20111020.txt
          12 kB
          Vinod Kumar Vavilapalli
        2. MAPREDUCE-3228-20111027.txt
          18 kB
          Vinod Kumar Vavilapalli

          People

            Assignee: vinodkv Vinod Kumar Vavilapalli
            Reporter: vinodkv Vinod Kumar Vavilapalli
            Votes: 0
            Watchers: 3
