Hadoop YARN / YARN-9265 (sub-task of YARN-9264: [Umbrella] Follow-up on IntelOpenCL FPGA plugin)

FPGA plugin fails to recognize Intel Processing Accelerator Card


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.3.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The plugin cannot autodetect Intel FPGA PAC (Processing Accelerator Card).

      There are two major issues.

      Problem #1

      The output of aocl diagnose:

      --------------------------------------------------------------------
      Device Name:
      acl0
       
      Package Pat:
      /home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp
       
      Vendor: Intel Corp
       
      Physical Dev Name   Status            Information
       
      pac_a10_f200000     Passed            PAC Arria 10 Platform (pac_a10_f200000)
                                            PCIe 08:00.0
                                            FPGA temperature = 79 degrees C.
       
      DIAGNOSTIC_PASSED
      --------------------------------------------------------------------
       
      Call "aocl diagnose <device-names>" to run diagnose for specified devices
      Call "aocl diagnose all" to run diagnose for all devices
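A parser that tolerates this layout could read the device table row by row instead of searching for fixed "key = value" lines. The sketch below is hypothetical: PacDiagnoseParser and PacDevice are illustrative names, not part of IntelFpgaOpenclPlugin. Based only on the single sample above, it assumes that each device row starts with the physical device name followed by a status of Passed or Failed, and that the PCIe address appears on a continuation line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical parser for the device table printed by "aocl diagnose"
 * on an Intel PAC board. Not part of IntelFpgaOpenclPlugin.
 */
public final class PacDiagnoseParser {

  /** One row of the "Physical Dev Name / Status / Information" table. */
  public static final class PacDevice {
    public final String physicalName;
    public final String status;
    public String pcieAddress; // set from a continuation line; stays null if absent

    PacDevice(String physicalName, String status) {
      this.physicalName = physicalName;
      this.status = status;
    }
  }

  // Device row: name, then a status (assumed to be Passed or Failed),
  // then optional free-form information text.
  private static final Pattern ROW =
      Pattern.compile("^\\s*(\\S+)\\s+(Passed|Failed)(?:\\s+.*)?$");

  // Continuation line such as "PCIe 08:00.0" (bus:device.function).
  private static final Pattern PCIE =
      Pattern.compile("^\\s*PCIe\\s+([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\\.[0-7])\\s*$");

  private PacDiagnoseParser() {}

  public static List<PacDevice> parse(String output) {
    List<PacDevice> devices = new ArrayList<>();
    PacDevice current = null;
    for (String line : output.split("\\R")) {
      Matcher row = ROW.matcher(line);
      if (row.matches()) {
        // A new device row starts; following PCIe lines belong to it.
        current = new PacDevice(row.group(1), row.group(2));
        devices.add(current);
        continue;
      }
      Matcher pcie = PCIE.matcher(line);
      if (current != null && pcie.matches()) {
        current.pcieAddress = pcie.group(1);
      }
    }
    return devices;
  }
}
```

Anchoring on the status column keeps the header line and the trailing DIAGNOSTIC_PASSED line from being mistaken for devices.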
      

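      The plugin does not recognize the PAC output format: it looks for "bus:slot.func = ..." and "Total Card Power Usage = ..." lines, which this output does not contain, and resource bootstrapping then fails with the following messages: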

      2019-01-25 06:46:02,834 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaResourcePlugin: Using FPGA vendor plugin: org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin
      2019-01-25 06:46:02,943 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDiscoverer: Trying to diagnose FPGA information ...
      2019-01-25 06:46:03,085 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule: Using traffic control bandwidth handler
      2019-01-25 06:46:03,108 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl: Initializing mounted controller cpu at /sys/fs/cgroup/cpu,cpuacct/yarn
      2019-01-25 06:46:03,139 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceHandlerImpl: FPGA Plugin bootstrap success.
      2019-01-25 06:46:03,247 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: Couldn't find (?i)bus:slot.func\s=\s.*, pattern
      2019-01-25 06:46:03,248 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: Couldn't find (?i)Total\sCard\sPower\sUsage\s=\s.* pattern
      2019-01-25 06:46:03,251 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: Failed to get major-minor number from reading /dev/pac_a10_f300000
      2019-01-25 06:46:03,252 ERROR org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to bootstrap configured resource subsystems!
      org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: No FPGA devices detected!
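Given the WARN lines above, one option is to keep the existing pattern and fall back to the PAC-style "PCIe bb:dd.f" line when it does not match. This is a minimal sketch, not the fix from the attached patches: PcieAddressExtractor is a hypothetical name, the first pattern is the one from the log with a capture group added, and the PAC pattern is inferred only from the "PCIe 08:00.0" line in the diagnose output.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extract a PCIe address from either output format. */
public final class PcieAddressExtractor {

  // Pattern reported in the WARN line, with a capture group around the value.
  private static final Pattern LEGACY =
      Pattern.compile("(?i)bus:slot.func\\s=\\s(\\S+)");

  // Assumed PAC format, e.g. "PCIe 08:00.0".
  private static final Pattern PAC =
      Pattern.compile("PCIe\\s+([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\\.[0-7])");

  private PcieAddressExtractor() {}

  /** Tries the legacy pattern first, then the PAC pattern; empty if neither matches. */
  public static Optional<String> extract(String diagnoseOutput) {
    for (Pattern p : new Pattern[] {LEGACY, PAC}) {
      Matcher m = p.matcher(diagnoseOutput);
      if (m.find()) {
        return Optional.of(m.group(1));
      }
    }
    return Optional.empty();
  }
}
```

Returning Optional.empty() instead of logging and continuing lets the caller decide whether a missing address is fatal.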
      

      Problem #2

      The plugin assumes that the device file name under /dev can be derived from the "Physical Dev Name", but this assumption is wrong. For example, it expects the device file to be /dev/pac_a10_f300000, whereas the actual file is /dev/intel-fpga-port.0.
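Rather than deriving the path from the physical device name, the device files could be discovered by name pattern, and their major and minor numbers decoded from the raw device number. A minimal sketch under stated assumptions: FpgaDeviceFiles is a hypothetical class, the glob intel-fpga-port.* is inferred from the single example /dev/intel-fpga-port.0, and the dev_t value is assumed to come from the st_rdev field of stat(2). The bit layout mirrors glibc's major() and minor() macros.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: locate FPGA port device files by name pattern
 * instead of deriving them from the "Physical Dev Name".
 */
public final class FpgaDeviceFiles {

  private FpgaDeviceFiles() {}

  /** Lists entries of devDir whose names match the glob, sorted by path. */
  public static List<Path> find(Path devDir, String glob) throws IOException {
    List<Path> result = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(devDir, glob)) {
      for (Path p : stream) {
        result.add(p);
      }
    }
    Collections.sort(result);
    return result;
  }

  /** Major number of a Linux dev_t (bits 8-19 and 44-63), as in glibc's major(). */
  public static long major(long dev) {
    return ((dev & 0xfff00L) >>> 8) | ((dev & 0xfffff00000000000L) >>> 32);
  }

  /** Minor number of a Linux dev_t (bits 0-7 and 20-43), as in glibc's minor(). */
  public static long minor(long dev) {
    return (dev & 0xffL) | ((dev & 0x00000ffffff00000L) >>> 12);
  }
}
```

On a NodeManager host this would be called as find(Paths.get("/dev"), "intel-fpga-port.*"); how a port file is then mapped back to a physical device such as pac_a10_f200000 is not covered by this sketch.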

        Attachments

        1. YARN-9265-001.patch
          23 kB
          Peter Bacsko
        2. YARN-9265-002.patch
          25 kB
          Peter Bacsko
        3. YARN-9265-003.patch
          26 kB
          Peter Bacsko
        4. YARN-9265-004.patch
          27 kB
          Peter Bacsko
        5. YARN-9265-005.patch
          27 kB
          Peter Bacsko
        6. YARN-9265-006.patch
          27 kB
          Peter Bacsko
        7. YARN-9265-007.patch
          28 kB
          Peter Bacsko
        8. YARN-9265-008.patch
          43 kB
          Peter Bacsko
        9. YARN-9265-009.patch
          46 kB
          Peter Bacsko


            People

            • Assignee: Peter Bacsko (pbacsko)
            • Reporter: Peter Bacsko (pbacsko)
            • Votes: 0
            • Watchers: 8

              Dates

              • Created:
                Updated:
                Resolved: