Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: applicationmaster, mrv2
    • Labels:
      None

      Description

      Found this on one of the gridmix runs, again. One of the nodes went really bad while the job had three containers running on it. Eventually, the AM marked the tasks as timed out and initiated cleanup of the failed containers via stopContainer(). The latter got stuck on the faulty node, so the tasks are stuck in the FAIL_CONTAINER_CLEANUP state and the job sits there waiting forever.
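      The report does not describe what the attached patches change. Purely as a hypothetical illustration of the general idea of not letting container cleanup wait forever on an unresponsive node, the sketch below bounds a blocking stop call with a timeout. `stop_container_with_timeout`, `hung_stop` and the status strings are made-up names, not Hadoop APIs, and Python is used only for brevity.

```python
import threading
import time


def stop_container_with_timeout(stop_fn, container_id, timeout_s):
    """Hypothetical helper: run stop_fn(container_id), but stop waiting after timeout_s.

    Returns "STOPPED" if stop_fn returned in time, "FAILED" if it raised,
    and "TIMED_OUT" if it was still running when the timeout expired.
    """
    done = threading.Event()
    errors = []

    def worker():
        try:
            stop_fn(container_id)
        except Exception as exc:  # record the failure instead of losing it
            errors.append(exc)
        finally:
            done.set()

    # Daemon thread: an abandoned, still-hanging call does not block process exit.
    threading.Thread(target=worker, daemon=True).start()
    if not done.wait(timeout_s):
        return "TIMED_OUT"
    return "FAILED" if errors else "STOPPED"


def healthy_stop(container_id):
    pass  # simulates a node that answers promptly


def hung_stop(container_id):
    time.sleep(3600)  # simulates a faulty node that never answers


if __name__ == "__main__":
    print(stop_container_with_timeout(healthy_stop, "container_1", 1.0))
    print(stop_container_with_timeout(hung_stop, "container_2", 0.2))
```

      Note that the timed-out call is abandoned, not cancelled; the point is only that the caller regains control and can move the task out of its cleanup state instead of waiting on the bad node indefinitely.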

      Thanks to Karam Singh for helping with this.

      1. MAPREDUCE-3228-20111027.txt
        18 kB
        Vinod Kumar Vavilapalli
      2. MAPREDUCE-3228-20111020.txt
        12 kB
        Vinod Kumar Vavilapalli

        Activity

        Arun C Murthy made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Arun C Murthy made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Vinod Kumar Vavilapalli made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Vinod Kumar Vavilapalli made changes -
        Attachment MAPREDUCE-3228-20111027.txt [ 12501089 ]
        Vinod Kumar Vavilapalli made changes -
        Attachment MAPREDUCE-3228-20111020.txt [ 12499852 ]
        Vinod Kumar Vavilapalli made changes -
        Field Original Value New Value
        Assignee Vinod Kumar Vavilapalli [ vinodkv ]
        Vinod Kumar Vavilapalli created issue -

          People

          • Assignee:
            Vinod Kumar Vavilapalli
            Reporter:
            Vinod Kumar Vavilapalli
          • Votes:
            0
            Watchers:
            3
