Hadoop Map/Reduce / MAPREDUCE-4999

AM attempt ended up in ERROR state and generated history after node decommissioned


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.6
    • Fix Version/s: None
    • Component/s: mr-am
    • Labels: None

    Description

      Observed a case where a job recorded history for an app attempt that ended up in the ERROR state after the node the AM was running on was decommissioned. When the node was decommissioned, the RM marked all the containers on the node as killed and subsequently invalidated the application attempt. The AM attempt heartbeated in before the NM did (and therefore before the NM could kill the AM), discovered that it was no longer a valid app attempt, and exited in the ERROR state. However, it also incorrectly concluded that it was the last attempt and generated the history for the job.

      Decommissioning a node should not cause an app attempt to end up in the ERROR state with history, as the subsequent app attempt should be the one to generate the definitive history for the job.
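      The expected behavior described above can be sketched as a small decision rule. This is only an illustration of the intent, not the actual MR AM code: the class `AttemptHistoryPolicy`, the enum `ExitReason`, and the method `shouldWriteFinalHistory` are hypothetical names invented for this sketch, and the assumption that an invalidated attempt can be distinguished from an ordinary failure is ours.

```java
// Hypothetical sketch; none of these names come from the Hadoop code base.
final class AttemptHistoryPolicy {

    /** Why this AM attempt is exiting (illustrative categories only). */
    enum ExitReason {
        JOB_SUCCEEDED,      // the job itself reached a terminal success state
        ATTEMPT_FAILED,     // this attempt failed on its own
        INVALIDATED_BY_RM   // the RM no longer recognizes this attempt
    }

    /**
     * Decide whether the exiting attempt should write the definitive
     * job history.
     *
     * @param attemptNumber 1-based number of the exiting attempt
     * @param maxAttempts   configured maximum number of AM attempts
     * @param reason        why the attempt is exiting
     */
    static boolean shouldWriteFinalHistory(int attemptNumber,
                                           int maxAttempts,
                                           ExitReason reason) {
        // An attempt invalidated by the RM (for example after its node was
        // decommissioned) must never assume it is the last one: a later
        // attempt is expected to produce the definitive history.
        if (reason == ExitReason.INVALIDATED_BY_RM) {
            return false;
        }
        // A job that finished successfully always records its history.
        if (reason == ExitReason.JOB_SUCCEEDED) {
            return true;
        }
        // An ordinary failure is final only when no retries remain.
        return attemptNumber >= maxAttempts;
    }

    public static void main(String[] args) {
        // Scenario from this report: first of two attempts, invalidated
        // by the RM after the node was decommissioned.
        System.out.println(
            shouldWriteFinalHistory(1, 2, ExitReason.INVALIDATED_BY_RM));
        // Ordinary failure of the last allowed attempt.
        System.out.println(
            shouldWriteFinalHistory(2, 2, ExitReason.ATTEMPT_FAILED));
    }
}
```

      In this sketch the invalidation check comes first, so the attempt number is never consulted for an attempt the RM has already discarded. That ordering is the design point: the bug described here arises when the "last attempt" conclusion is reached without taking the invalidation into account.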

          People

            Assignee: Unassigned
            Reporter: Jason Darrell Lowe (jlowe)
            Votes: 0
            Watchers: 3
