Hadoop Map/Reduce / MAPREDUCE-2356

A task succeeded even though there were errors on all attempts.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.204.0
    • Component/s: None
    • Labels:
      None

      Description

      From Luke Lu:

      Here is a summary of why the failed map task was considered "successful" (Thanks to Mahadev, Arun and Devaraj
      for insightful discussions).

      1. The map task was hanging BEFORE being initialized (probably in localization, but it doesn't matter in this case).
      Its state is UNASSIGNED.

      2. The JobTracker (JT) decided to kill it due to a timeout and scheduled a cleanup task on the same node.

      3. The cleanup task has the same attempt id (by design) but runs in a different JVM. Its initial state is
      FAILED_UNCLEAN.

      4. While the JVM of the original attempt was being killed, it proceeded to setupWorkDir and threw an
      IllegalStateException in FileSystem.getLocal. This caused taskFinal.taskCleanup to be called in Child, which
      triggered an NPE because the task was not yet initialized (committer is null). Before the NPE, however, it sent a
      statusUpdate to the TT, and tip.reportProgress changed the task state (at that point FAILED_UNCLEAN) to UNASSIGNED.

      5. The cleanup attempt succeeded and reported done to the TT. In tip.reportDone, the isCleanup() check returned
      false because of the UNASSIGNED state, so the task state was set to SUCCEEDED.
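
      The sequence in steps 2 to 5 can be sketched as a small state model. This is a simplified illustration, not
      the actual TaskTracker code: the class TaskStateRaceSketch, its nested TaskInProgress stand-in, and the
      guardCleanupState flag are hypothetical names, and the guard shown is only one possible way to ignore the
      stale update, not necessarily the committed fix.

```java
// Simplified model of the race described in the ticket; NOT the real TaskTracker code.
// All class, method, and field names are hypothetical and only mirror the description.
class TaskStateRaceSketch {

    enum State { UNASSIGNED, FAILED_UNCLEAN, SUCCEEDED, FAILED }

    /** Stand-in for the TaskTracker's per-attempt bookkeeping ("tip"). */
    static class TaskInProgress {
        State state = State.UNASSIGNED;      // step 1: attempt hangs before initialization
        final boolean guardCleanupState;     // hypothetical switch for the guarded variant

        TaskInProgress(boolean guardCleanupState) {
            this.guardCleanupState = guardCleanupState;
        }

        // Steps 2-3: a cleanup attempt with the same attempt id is started.
        void startCleanup() {
            state = State.FAILED_UNCLEAN;
        }

        boolean isCleanup() {
            return state == State.FAILED_UNCLEAN;
        }

        // Step 4: a status update arrives from a child JVM.
        void reportProgress(State reported) {
            if (guardCleanupState && isCleanup()) {
                return;                      // ignore stale updates once cleanup has begun
            }
            state = reported;                // unguarded: blindly overwrites FAILED_UNCLEAN
        }

        // Step 5: the cleanup attempt reports that it is done.
        void reportDone() {
            state = isCleanup() ? State.FAILED : State.SUCCEEDED;
        }
    }

    /** Replays steps 2-5 and returns the final task state. */
    static State simulate(boolean guard) {
        TaskInProgress tip = new TaskInProgress(guard);
        tip.startCleanup();                      // JT schedules cleanup: FAILED_UNCLEAN
        tip.reportProgress(State.UNASSIGNED);    // dying original JVM sends a stale update
        tip.reportDone();                        // cleanup attempt finishes
        return tip.state;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + simulate(false)); // prints "unguarded: SUCCEEDED"
        System.out.println("guarded:   " + simulate(true));  // prints "guarded:   FAILED"
    }
}
```

      In the unguarded run the stale UNASSIGNED update hides the cleanup state, so the failed attempt ends up
      SUCCEEDED, which matches the behavior reported here.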

        Activity

        Owen O'Malley created issue -
        Steve Loughran made changes -
        Field Original Value New Value
        Fix Version/s 0.20.204.0 [ 12316318 ]
        Fix Version/s 0.20.203.0 [ 12316151 ]
        Owen O'Malley made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Owen O'Malley added a comment -

        Hadoop 0.20.204.0 was just released.

        Owen O'Malley made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Luke Lu
            Reporter:
            Owen O'Malley
          • Votes:
            0
            Watchers:
            9
