Hadoop YARN / YARN-9595

FPGA plugin: NullPointerException in FpgaNodeResourceUpdateHandler.updateConfiguredResource()

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: nodemanager
    • Labels: None

    Description

      YARN-9264 accidentally introduced a bug in FpgaDiscoverer: in some cases currentFpgaInfo is never set, so the NodeManager fails during initialization with a NullPointerException:

      2019-06-03 05:14:50,157 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED; cause: java.lang.NullPointerException
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaNodeResourceUpdateHandler.updateConfiguredResource(FpgaNodeResourceUpdateHandler.java:54)
              at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.updateConfiguredResourcesViaPlugins(NodeStatusUpdaterImpl.java:358)
              at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceInit(NodeStatusUpdaterImpl.java:190)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
              at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
              at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:459)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
              at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:869)
              at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:942)
      

      The problem is that in FpgaDiscoverer, we don't set currentFpgaInfo if the following condition is true:

      if (allowed == null || allowed.equalsIgnoreCase(
          YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES)) {
        // currentFpgaInfo is never assigned on this path
        return list;
      } else if (allowed.matches("(\\d,)*\\d")) {
        ...
      

      The solution is simple: initialize currentFpgaInfo on both code paths.

      Unit tests should be enhanced to verify that it's set properly.
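
      The fix described above can be sketched as follows. This is a minimal, self-contained illustration and not the actual patch: the class FpgaDiscovererSketch, the AUTO constant, the getter, and the use of device minor numbers as strings are all assumptions made for the example. The point is only that the field is assigned on the "auto" path as well as on the filtered path, and that a test can check this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for FpgaDiscoverer; names and types are illustrative only.
class FpgaDiscovererSketch {
  // Assumed literal for the "discover automatically" setting.
  static final String AUTO = "auto";

  // Stays null until discover() assigns it; reading it while null caused the NPE.
  private List<String> currentFpgaInfo;

  // 'detected' holds device minor numbers as strings (an assumption of this sketch).
  List<String> discover(List<String> detected, String allowed) {
    List<String> list = new ArrayList<>(detected);
    if (allowed == null || allowed.equalsIgnoreCase(AUTO)) {
      currentFpgaInfo = list; // the assignment that was missing on this path
      return list;
    } else if (allowed.matches("(\\d,)*\\d")) {
      // Keep only the devices whose minor number is explicitly allowed.
      List<String> minors = Arrays.asList(allowed.split(","));
      list.removeIf(minor -> !minors.contains(minor));
      currentFpgaInfo = list;
      return list;
    }
    throw new IllegalArgumentException("Invalid allowed devices: " + allowed);
  }

  List<String> getCurrentFpgaInfo() {
    return currentFpgaInfo;
  }

  public static void main(String[] args) {
    FpgaDiscovererSketch d = new FpgaDiscovererSketch();
    d.discover(Arrays.asList("0", "1", "2"), "auto");
    System.out.println(d.getCurrentFpgaInfo()); // [0, 1, 2]
    d.discover(Arrays.asList("0", "1", "2"), "0,2");
    System.out.println(d.getCurrentFpgaInfo()); // [0, 2]
  }
}
```

      A unit test along these lines would call discover() with a null or "auto" setting and assert that getCurrentFpgaInfo() is non-null afterwards, which is the case the original code missed.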

      Attachments

        1. YARN-9595-001.patch
          4 kB
          Peter Bacsko

          People

            Assignee: Peter Bacsko (pbacsko)
            Reporter: Peter Bacsko (pbacsko)
            Votes: 0
            Watchers: 3