Hadoop Map/Reduce / MAPREDUCE-7028

Concurrent task progress updates causing NPE in Application Master

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
    • Fix Version/s: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
    • Component/s: mr-am
    • Labels:
      None

      Description

      Concurrent task progress updates can cause a NullPointerException in the Application Master (the line numbers in the stack trace refer to the code on current trunk):

      2017-12-20 06:49:42,369 INFO [IPC Server handler 9 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_000002_0 is : 0.02677883
      2017-12-20 06:49:42,369 INFO [IPC Server handler 13 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_000002_0 is : 0.02677883
      2017-12-20 06:49:42,383 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
      java.lang.NullPointerException
      at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:2450)
      at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:2433)
      at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
      at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
      at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
      at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
      at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1362)
      at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:154)
      at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1543)
      at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1535)
      at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
      at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
      at java.lang.Thread.run(Thread.java:748)
      2017-12-20 06:49:42,385 INFO [IPC Server handler 13 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_000002_0 is : 0.02677883
      2017-12-20 06:49:42,386 INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..

      This occurred without any intervention in several large wordcount runs, and I could reproduce it reliably by artificially increasing the frequency of task progress updates.
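
The description above does not spell out the root cause, so the following is only a hypothetical sketch of one way that duplicate progress reports can lead to a NullPointerException on a dispatcher thread. It assumes a shared status slot that the consumer drains with getAndSet(null) while every report queues its own event. The names StatusRaceSketch, Status, report, consumeUnguarded and consumeGuarded are invented for illustration and are not taken from the Hadoop code base or from the attached patches.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration only; this is not the Hadoop implementation. */
class StatusRaceSketch {

    /** Stand-in for a task attempt status report. */
    static final class Status {
        final float progress;
        Status(float progress) { this.progress = progress; }
    }

    // Slot shared by the reporting threads and the single dispatcher thread.
    private final AtomicReference<Status> latest = new AtomicReference<>();

    /** Reporter side: store the newest status. Assume each call also queues one event. */
    void report(Status s) {
        latest.set(s);
    }

    /** Dispatcher side, unguarded: throws NullPointerException if the slot was already drained. */
    float consumeUnguarded() {
        return latest.getAndSet(null).progress;
    }

    /** Dispatcher side, guarded: an empty slot means an earlier event already handled the update. */
    float consumeGuarded(float lastKnownProgress) {
        Status s = latest.getAndSet(null);
        return s == null ? lastKnownProgress : s.progress;
    }

    public static void main(String[] args) {
        StatusRaceSketch attempt = new StatusRaceSketch();

        // Two handlers report in quick succession, so two events are pending.
        attempt.report(new Status(0.02677883f));
        attempt.report(new Status(0.02677883f));
        float progress = attempt.consumeGuarded(0f);   // first event drains the slot
        progress = attempt.consumeGuarded(progress);   // second event finds it empty
        System.out.println("guarded consumer kept progress " + progress);

        attempt.report(new Status(0.02677883f));
        attempt.report(new Status(0.02677883f));
        attempt.consumeUnguarded();                    // first event drains the slot
        try {
            attempt.consumeUnguarded();                // second event dereferences null
        } catch (NullPointerException e) {
            System.out.println("unguarded consumer threw NullPointerException");
        }
    }
}
```

In this sketch the guarded consumer treats an empty slot as "already processed" instead of dereferencing it. An alternative with the same effect would be to queue an event only when the slot was previously empty.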

        Attachments

        1. MAPREDUCE-7028.004.patch (3 kB, Gergo Repas)
        2. MAPREDUCE-7028.003.patch (3 kB, Gergo Repas)
        3. MAPREDUCE-7028.002.patch (3 kB, Gergo Repas)
        4. MAPREDUCE-7028.001.patch (3 kB, Gergo Repas)
        5. MAPREDUCE-7028.000.patch (4 kB, Gergo Repas)

              People

              • Assignee: Gergo Repas (grepas)
              • Reporter: Gergo Repas (grepas)
              • Votes: 0
              • Watchers: 8