Hadoop YARN / YARN-4227

Ignore expired containers from removed nodes in FairScheduler


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.5.0, 2.7.1
    • Fix Version/s: 3.1.0, 2.10.0
    • Component/s: fairscheduler
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Under some circumstances, the node is removed before an expired-container event is processed, causing the ResourceManager (RM) to exit:

      2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1436927988321_1307950_01_000012 Timed out after 600 secs
      2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1436927988321_1307950_01_000012 Container Transitioned from ACQUIRED to EXPIRED
      2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: Completed container: container_1436927988321_1307950_01_000012 in state: EXPIRED event:EXPIRE
      2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=system_op	OPERATION=AM Released Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1436927988321_1307950	CONTAINERID=container_1436927988321_1307950_01_000012
      2015-10-04 21:14:01,063 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_EXPIRED to the scheduler
      java.lang.NullPointerException
      	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:849)
      	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1273)
      	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
      	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:585)
      	at java.lang.Thread.run(Thread.java:745)
      2015-10-04 21:14:01,063 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
      

      The stack trace is from 2.3.0, but different customers have observed the same issue in 2.5.0 and 2.6.0.
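      The failure mode and the guard named in the issue title can be sketched as a small, self-contained Java model. This is a hypothetical illustration, not the FairScheduler source or the committed patch. It assumes that the scheduler keeps live nodes in a map, that removing a node deletes its entry, and that completing a container looks up the node that hosted it. The class and method names (`ExpiredContainerSketch`, `Node`, `completedContainer(String, String)`) and the node ID `node1:8041` are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of YARN-4227; NOT the real FairScheduler code.
public class ExpiredContainerSketch {

    // Hypothetical stand-in for a scheduler node that tracks allocated containers.
    static final class Node {
        final String nodeId;
        int allocatedContainers;

        Node(String nodeId, int allocatedContainers) {
            this.nodeId = nodeId;
            this.allocatedContainers = allocatedContainers;
        }

        void releaseContainer() {
            allocatedContainers--;
        }
    }

    // Nodes currently known to the scheduler, keyed by node ID.
    private final Map<String, Node> nodes = new HashMap<>();

    void addNode(String nodeId, int allocatedContainers) {
        nodes.put(nodeId, new Node(nodeId, allocatedContainers));
    }

    void removeNode(String nodeId) {
        nodes.remove(nodeId);
    }

    int allocatedOn(String nodeId) {
        Node node = nodes.get(nodeId);
        return node == null ? -1 : node.allocatedContainers;
    }

    // Returns true if the completion was applied, false if it was ignored.
    boolean completedContainer(String containerId, String nodeId) {
        Node node = nodes.get(nodeId);
        if (node == null) {
            // Without this check, node.releaseContainer() would throw a
            // NullPointerException, which in the RM is fatal for the
            // scheduler event dispatcher (see the stack trace above).
            System.out.println("Ignoring completed container " + containerId
                    + " on removed node " + nodeId);
            return false;
        }
        node.releaseContainer();
        return true;
    }

    public static void main(String[] args) {
        ExpiredContainerSketch scheduler = new ExpiredContainerSketch();
        scheduler.addNode("node1:8041", 1);
        // The node is removed before the CONTAINER_EXPIRED event is handled.
        scheduler.removeNode("node1:8041");
        boolean applied = scheduler.completedContainer(
                "container_1436927988321_1307950_01_000012", "node1:8041");
        System.out.println("applied=" + applied);
    }
}
```

      In this sketch, the late event is logged and dropped rather than crashing the process. Once the node is gone from the map, the model has no per-node state left to update.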

        Attachments

        1. YARN-4227.006.patch
          7 kB
          Wilfred Spiegelenburg
        2. YARN-4227.2.patch
          6 kB
          Wilfred Spiegelenburg
        3. YARN-4227.3.patch
          6 kB
          Wilfred Spiegelenburg
        4. YARN-4227.4.patch
          6 kB
          Wilfred Spiegelenburg
        5. YARN-4227.5.patch
          6 kB
          Wilfred Spiegelenburg
        6. YARN-4227.patch
          1 kB
          Wilfred Spiegelenburg


            People

            • Assignee:
              Wilfred Spiegelenburg (wilfreds)
            • Reporter:
              Wilfred Spiegelenburg (wilfreds)
            • Votes:
              0
            • Watchers:
              10
