  1. Hadoop YARN
  2. YARN-9595

FPGA plugin: NullPointerException in FpgaNodeResourceUpdateHandler.updateConfiguredResource()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: nodemanager
    • Labels: None

      Description

      YARN-9264 accidentally introduced a bug in FpgaDiscoverer. In some configurations currentFpgaInfo is never set, which causes an NPE during NodeManager initialization:

      2019-06-03 05:14:50,157 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED; cause: java.lang.NullPointerException
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaNodeResourceUpdateHandler.updateConfiguredResource(FpgaNodeResourceUpdateHandler.java:54)
              at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.updateConfiguredResourcesViaPlugins(NodeStatusUpdaterImpl.java:358)
              at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceInit(NodeStatusUpdaterImpl.java:190)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
              at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
              at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:459)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
              at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:869)
              at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:942)
      

      The problem is that FpgaDiscoverer does not set currentFpgaInfo when the following condition is true:

      if (allowed == null || allowed.equalsIgnoreCase(
          YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES)) {
        // Returns early without assigning currentFpgaInfo.
        return list;
      } else if (allowed.matches("(\\d,)*\\d")) {
        ...
      

      The solution is simple: set currentFpgaInfo in both code paths.

      The unit tests should be extended to verify that currentFpgaInfo is set on each path.
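
The proposed fix could look roughly like the sketch below. It is a simplified stand-in and not the actual Hadoop code or the attached patch: the class name FpgaDiscovererSketch, the AUTO constant, the modeling of devices as integer minor numbers, and the use of IllegalArgumentException for invalid values are all assumptions made for illustration. Only the branch condition and the currentFpgaInfo field name come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified stand-in for FpgaDiscoverer. */
class FpgaDiscovererSketch {
  // Assumed stand-in for YarnConfiguration.AUTOMATICALLY_DISCOVER_GPU_DEVICES.
  static final String AUTO = "auto";

  // Devices are modeled as minor numbers only (an assumption of this sketch).
  private List<Integer> currentFpgaInfo;

  List<Integer> getCurrentFpgaInfo() {
    return currentFpgaInfo;
  }

  /**
   * @param discoveredMinors minor numbers of the devices found on the node
   * @param allowed value of the allowed-devices setting; may be null
   */
  List<Integer> discover(List<Integer> discoveredMinors, String allowed) {
    List<Integer> list = new ArrayList<>(discoveredMinors);
    if (allowed == null || allowed.equalsIgnoreCase(AUTO)) {
      // The assignment that was missing: record the result before returning.
      currentFpgaInfo = list;
      return list;
    } else if (allowed.matches("(\\d,)*\\d")) {
      // Keep only the devices whose minor number is explicitly allowed.
      Set<String> minors = new HashSet<>(Arrays.asList(allowed.split(",")));
      list.removeIf(minor -> !minors.contains(String.valueOf(minor)));
      currentFpgaInfo = list;
      return list;
    }
    throw new IllegalArgumentException("Invalid allowed-devices value: " + allowed);
  }

  public static void main(String[] args) {
    FpgaDiscovererSketch discoverer = new FpgaDiscovererSketch();
    discoverer.discover(Arrays.asList(0, 1, 2), "auto");
    System.out.println(discoverer.getCurrentFpgaInfo()); // prints [0, 1, 2]
    discoverer.discover(Arrays.asList(0, 1, 2), "0,2");
    System.out.println(discoverer.getCurrentFpgaInfo()); // prints [0, 2]
  }
}
```

A unit test along these lines would call discover() once per branch and assert that getCurrentFpgaInfo() is non-null and matches the returned list, which is the check that would have caught the regression.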

        Attachments

        1. YARN-9595-001.patch
          4 kB
          Peter Bacsko

            People

            • Assignee: Peter Bacsko (pbacsko)
            • Reporter: Peter Bacsko (pbacsko)
            • Votes: 0
            • Watchers: 3
