Hadoop Map/Reduce / MAPREDUCE-5000

TaskImpl.getCounters() can return the counters for the wrong task attempt when task is speculating

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.6
    • Fix Version/s: 0.23.7, 2.1.0-beta
    • Component/s: mr-am
    • Labels:
      None

      Description

      When a task is speculating and one attempt completes, the counters for the wrong attempt are sometimes aggregated into the job's total counters. The scenario looks like this:

      1. Two task attempts are racing, _0 and _1
      2. _1 finishes first, causing the task to issue a TA_KILL to attempt _0
      3. _0 receives TA_KILL, sets progress to 1.0f and waits for container cleanup
      4. If TaskImpl.getCounters() is called now, TaskImpl.selectBestAttempt() can return _0: it has not yet reached the KILLED state, its progress is maxed out, and no other attempt has more progress.
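
      The race in the steps above can be reproduced with a small, self-contained model. This is only a sketch under stated assumptions: the class, the State enum, the KILL_PENDING state name, the counter field and both selection methods are hypothetical stand-ins, not the actual TaskImpl code, and the "prefer successful" guard is one possible remedy rather than a description of what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;

public class SpeculationCounterSketch {

    // Hypothetical, simplified attempt states; the real MR AM state machine has more.
    enum State { RUNNING, KILL_PENDING, SUCCEEDED, KILLED }

    static final class Attempt {
        final String id;
        final State state;
        final float progress;
        final long records; // stand-in for the attempt's Counters

        Attempt(String id, State state, float progress, long records) {
            this.id = id;
            this.state = state;
            this.progress = progress;
            this.records = records;
        }
    }

    // Naive selection (assumed behaviour): highest progress among attempts
    // that are not yet KILLED; on a tie the earlier attempt is kept.
    static Attempt selectByProgress(List<Attempt> attempts) {
        Attempt best = null;
        for (Attempt a : attempts) {
            if (a.state == State.KILLED) {
                continue;
            }
            if (best == null || a.progress > best.progress) {
                best = a;
            }
        }
        return best;
    }

    // One possible guard: if any attempt has succeeded, report that one,
    // and fall back to progress-based selection otherwise.
    static Attempt selectPreferSuccessful(List<Attempt> attempts) {
        for (Attempt a : attempts) {
            if (a.state == State.SUCCEEDED) {
                return a;
            }
        }
        return selectByProgress(attempts);
    }

    // Step 3 of the scenario: _1 has succeeded, while _0 has received the
    // kill, reports progress 1.0f, and is still waiting for container cleanup.
    static List<Attempt> raceScenario() {
        List<Attempt> attempts = new ArrayList<>();
        attempts.add(new Attempt("_0", State.KILL_PENDING, 1.0f, 40L));
        attempts.add(new Attempt("_1", State.SUCCEEDED, 1.0f, 100L));
        return attempts;
    }

    public static void main(String[] args) {
        List<Attempt> attempts = raceScenario();
        System.out.println("naive:   " + selectByProgress(attempts).id);
        System.out.println("guarded: " + selectPreferSuccessful(attempts).id);
    }
}
```

      In this model the naive selection picks _0 because _1 does not have strictly more progress, so the job total would pick up the counters of the attempt that is about to be killed. The guarded selection picks _1.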

      Attachments

      1. MAPREDUCE-5000-branch-0.23.patch (5 kB, Jason Lowe)
      2. MAPREDUCE-5000.patch (5 kB, Jason Lowe)

        Activity

        Jason Lowe created issue

        Jason Lowe made changes
        Field | Original Value | New Value
        Assignee | | Jason Lowe [ jlowe ]

        Jason Lowe made changes
        Attachment | | MAPREDUCE-5000.patch [ 12568880 ]

        Jason Lowe made changes
        Status | Open [ 1 ] | Patch Available [ 10002 ]
        Target Version/s | | 0.23.7, 2.0.4-beta [ 12323954, 12324032 ]

        Jason Lowe made changes
        Attachment | | MAPREDUCE-5000-branch-0.23.patch [ 12569115 ]

        Siddharth Seth made changes
        Status | Patch Available [ 10002 ] | Resolved [ 5 ]
        Hadoop Flags | | Reviewed [ 10343 ]
        Fix Version/s | | 0.23.7 [ 12323954 ]
        Fix Version/s | | 2.0.4-beta [ 12324032 ]
        Resolution | | Fixed [ 1 ]

        Arun C Murthy made changes
        Status | Resolved [ 5 ] | Closed [ 6 ]

          People

          • Assignee:
            Jason Lowe
            Reporter:
            Jason Lowe
          • Votes:
            0
            Watchers:
            6

            Dates

            • Created:
              Updated:
              Resolved:
