  1. Hadoop YARN
  2. YARN-7054 Yarn Service Phase 2
  3. YARN-7486

Race condition in service AM that can cause NPE

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0
    • Component/s: None
    • Labels: None

    Description

      1. container1 completes for instance1.
      2. instance1 is added to the pending list, and an event is sent asynchronously to instance1 to run ContainerStoppedTransition.
      3. container2 is allocated and assigned to instance1, which records container2 inside instance1.
      4. In the meantime, instance1's ContainerStoppedTransition is called and sets the container back to null.
      As a result, the container2 recorded in step 3 is lost, which later surfaces as the following NullPointerException:

      		java.lang.NullPointerException
      			at org.apache.hadoop.yarn.service.provider.ProviderUtils.initCompTokensForSubstitute(ProviderUtils.java:402)
      			at org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:70)
      			at org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:89)
      			at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      			at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      			at java.lang.Thread.run(Thread.java:745)
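
The interleaving in steps 1-4 can be replayed deterministically in a small, self-contained sketch. Everything here is hypothetical and for illustration only: `InstanceSketch`, `StoppedTransitionRaceSketch`, `stopUnguarded`, `stopGuarded`, and the in-memory queue standing in for the asynchronous dispatcher are not YARN classes or APIs. The guard shown (clear the container only if it is still the one that stopped) is one possible way to avoid the lost update, and is not necessarily what the attached patches do.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for a component instance; NOT the real YARN class.
final class InstanceSketch {
    private String containerId; // null means "no container recorded"

    InstanceSketch(String containerId) {
        this.containerId = containerId;
    }

    // Step 3: record a newly allocated container on the instance.
    synchronized void assign(String newContainerId) {
        containerId = newContainerId;
    }

    // Mirrors the reported behaviour: clears whatever is currently recorded.
    synchronized void stopUnguarded() {
        containerId = null;
    }

    // Assumed guard: clear only if the recorded container is the one that stopped.
    synchronized void stopGuarded(String stoppedContainerId) {
        if (stoppedContainerId.equals(containerId)) {
            containerId = null;
        }
    }

    synchronized String container() {
        return containerId;
    }
}

final class StoppedTransitionRaceSketch {
    // Replays steps 1-4 in the problematic order and returns what instance1 ends up holding.
    static String replay(boolean guarded) {
        InstanceSketch instance1 = new InstanceSketch("container1");
        Queue<Runnable> dispatcher = new ArrayDeque<>(); // stands in for the async event queue

        // Steps 1-2: container1 completed; the stop transition is queued, not yet run.
        Runnable stopEvent = guarded
                ? () -> instance1.stopGuarded("container1")
                : instance1::stopUnguarded;
        dispatcher.add(stopEvent);

        // Step 3: container2 is assigned before the queued event is handled.
        instance1.assign("container2");

        // Step 4: the queued stop transition finally runs.
        while (!dispatcher.isEmpty()) {
            dispatcher.poll().run();
        }
        return instance1.container();
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + replay(false));
        System.out.println("guarded:   " + replay(true));
    }
}
```

In the unguarded replay the instance ends up with no container (null), matching the lost container described above; with the assumed guard it still holds container2. Note that `synchronized` alone does not help here: each call is atomic, but the stale stop event is still applied after the new assignment.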
      

      Attachments

        1. YARN-7486.01.patch (47 kB, Jian He)
        2. YARN-7486.02.patch (44 kB, Billie Rinaldi)

          People

            Assignee: Jian He (jianhe)
            Reporter: Jian He (jianhe)