Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: applicationmaster, mrv2
    • Labels:
      None

      Description

      Found this on one of the gridmix runs, again. One of the nodes went bad while the job had three containers running on it. Eventually, the AM marked the tasks as timed out and initiated cleanup of the failed containers via stopContainer(). The stopContainer() calls got stuck at the faulty node, so the tasks are stuck in the FAIL_CONTAINER_CLEANUP stage and the job sits there waiting forever.
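
The hang described above is the general problem of a blocking remote call with no upper bound on its duration. The following is a minimal sketch of one common way to bound such a call, using only java.util.concurrent. It is not the patch attached to this issue and does not use any Hadoop API: the class BoundedCleanup, the interface StopCall, and the method stopWithTimeout are hypothetical names introduced purely for illustration, and the timeout values are arbitrary.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCleanup {

    /** Hypothetical stand-in for a remote stopContainer() call that may block. */
    interface StopCall {
        void stop() throws Exception;
    }

    /**
     * Runs the stop call on a helper thread and waits at most timeoutMs for it.
     * Returns true if the call completed normally, false if it timed out or
     * threw, so the caller can move on instead of waiting forever.
     */
    static boolean stopWithTimeout(StopCall call, long timeoutMs)
            throws InterruptedException {
        // Daemon thread: a call that ignores interruption cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "container-cleanup");
            t.setDaemon(true);
            return t;
        });
        Future<?> future = executor.submit(() -> {
            call.stop();
            return null;
        });
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The node did not answer in time: interrupt the helper thread and give up.
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The call itself failed; treat it the same as an unsuccessful cleanup.
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A healthy node answers immediately.
        boolean healthy = stopWithTimeout(() -> { }, 1000);
        // A faulty node is simulated by a call that blocks far longer than the timeout.
        boolean faulty = stopWithTimeout(() -> Thread.sleep(60_000), 200);
        System.out.println("healthy node stopped cleanly: " + healthy);
        System.out.println("faulty node stopped cleanly: " + faulty);
    }
}
```

With a bound like this, a caller could treat a false return as a failed cleanup and let the task leave the cleanup stage rather than wait indefinitely. Whether the actual fix takes this approach is not stated in the description.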

      Thanks to Karam Singh for helping with this.

        Attachments

        1. MAPREDUCE-3228-20111027.txt
          18 kB
          Vinod Kumar Vavilapalli
        2. MAPREDUCE-3228-20111020.txt
          12 kB
          Vinod Kumar Vavilapalli

            People

            • Assignee:
              vinodkv Vinod Kumar Vavilapalli
              Reporter:
              vinodkv Vinod Kumar Vavilapalli
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved: