Hadoop Map/Reduce / MAPREDUCE-5000

TaskImpl.getCounters() can return the counters for the wrong task attempt when task is speculating

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.6
    • Fix Version/s: 0.23.7, 2.1.0-beta
    • Component/s: mr-am
    • Labels: None

    Description

      When a task is speculating and one attempt completes, the counters for the wrong attempt are sometimes aggregated into the job's total counters. The scenario looks like this:

      1. Two task attempts are racing, _0 and _1
      2. _1 finishes first, causing the task to issue a TA_KILL to attempt _0
      3. _0 receives TA_KILL, sets progress to 1.0f and waits for container cleanup
      4. If TaskImpl.getCounters() is called at this point, TaskImpl.selectBestAttempt() can return _0: it is not yet in the KILLED state, its progress is already maxed out, and no other attempt has more progress.
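
      The race in the steps above can be illustrated with a small, self-contained sketch. It is a simplified model, not Hadoop code: the class SpeculationCountersSketch, the State enum, the Attempt type, and both selection methods are hypothetical names invented for this example. It assumes that the selection skips only attempts already in the KILLED state and replaces the current pick only on strictly greater progress. The second method shows one possible guard (preferring the attempt known to have succeeded); it is not necessarily what the attached patch does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the race; none of these types are Hadoop classes.
class SpeculationCountersSketch {

    // Assumed states; KILL_PENDING_CLEANUP models "received TA_KILL, not yet KILLED".
    enum State { RUNNING, KILL_PENDING_CLEANUP, SUCCEEDED, KILLED }

    static final class Attempt {
        final String id;
        final State state;
        final float progress;
        final long records; // stands in for the attempt's counters

        Attempt(String id, State state, float progress, long records) {
            this.id = id;
            this.state = state;
            this.progress = progress;
            this.records = records;
        }
    }

    // Picks the attempt with the most progress, skipping only KILLED attempts.
    // On a tie the earlier attempt wins, because the comparison is strict.
    static Attempt selectBestNaive(List<Attempt> attempts) {
        Attempt best = null;
        for (Attempt a : attempts) {
            if (a.state == State.KILLED) {
                continue;
            }
            if (best == null || a.progress > best.progress) {
                best = a;
            }
        }
        return best;
    }

    // One possible guard (an assumption, not the actual fix): if the task already
    // knows which attempt succeeded, use it and fall back to progress otherwise.
    static Attempt selectPreferringSuccessful(List<Attempt> attempts, String successfulId) {
        if (successfulId != null) {
            for (Attempt a : attempts) {
                if (a.id.equals(successfulId)) {
                    return a;
                }
            }
        }
        return selectBestNaive(attempts);
    }

    public static void main(String[] args) {
        // _1 has succeeded; _0 got TA_KILL, reports progress 1.0f, awaits cleanup.
        List<Attempt> attempts = Arrays.asList(
            new Attempt("_0", State.KILL_PENDING_CLEANUP, 1.0f, 40L),
            new Attempt("_1", State.SUCCEEDED, 1.0f, 100L));

        System.out.println("naive pick:   " + selectBestNaive(attempts).id);
        System.out.println("guarded pick: " + selectPreferringSuccessful(attempts, "_1").id);
    }
}
```

      In this model the naive selection returns _0, because both attempts report progress 1.0f and _0 comes first, so the counters of the attempt that is being killed would be reported for the task.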

      Attachments

        1. MAPREDUCE-5000.patch
          5 kB
          Jason Darrell Lowe
        2. MAPREDUCE-5000-branch-0.23.patch
          5 kB
          Jason Darrell Lowe

          People

            Assignee: Jason Darrell Lowe (jlowe)
            Reporter: Jason Darrell Lowe (jlowe)
            Votes: 0
            Watchers: 6