Details
Sub-task
Status: Resolved
Critical
Resolution: Fixed
3.1.0
None
None
Reviewed
Description
The plugin cannot autodetect Intel FPGA PAC (Processing Accelerator Card).
There are two major issues.
Problem #1
The output of aocl diagnose:
--------------------------------------------------------------------
Device Name: acl0
Package Pat: /home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp
Vendor: Intel Corp

Physical Dev Name   Status   Information
pac_a10_f200000     Passed   PAC Arria 10 Platform (pac_a10_f200000)
                             PCIe 08:00.0
                             FPGA temperature = 79 degrees C.
DIAGNOSTIC_PASSED
--------------------------------------------------------------------
Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices
The plugin does not recognize this output, and the bootstrap fails with the following log messages:
2019-01-25 06:46:02,834 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaResourcePlugin: Using FPGA vendor plugin: org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin
2019-01-25 06:46:02,943 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDiscoverer: Trying to diagnose FPGA information ...
2019-01-25 06:46:03,085 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule: Using traffic control bandwidth handler
2019-01-25 06:46:03,108 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl: Initializing mounted controller cpu at /sys/fs/cgroup/cpu,cpuacct/yarn
2019-01-25 06:46:03,139 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceHandlerImpl: FPGA Plugin bootstrap success.
2019-01-25 06:46:03,247 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: Couldn't find (?i)bus:slot.func\s=\s.*, pattern
2019-01-25 06:46:03,248 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: Couldn't find (?i)Total\sCard\sPower\sUsage\s=\s.* pattern
2019-01-25 06:46:03,251 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: Failed to get major-minor number from reading /dev/pac_a10_f300000
2019-01-25 06:46:03,252 ERROR org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to bootstrap configured resource subsystems!
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: No FPGA devices detected!
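To illustrate why the first WARN appears, here is a minimal Java sketch that applies the pattern quoted in the log to the PAC lines from the aocl diagnose output above. The class name PacDiagnoseSketch, the method findPcieAddress and the pattern PAC_PCIE_PATTERN are hypothetical and are not part of the plugin; the second pattern is only one possible way to pick up the "PCIe 08:00.0" line, not the actual fix.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of IntelFpgaOpenclPlugin.
class PacDiagnoseSketch {
  // Pattern as quoted in the WARN message of the log above.
  static final Pattern OLD_BUS_PATTERN =
      Pattern.compile("(?i)bus:slot.func\\s=\\s.*,");

  // Assumed pattern for the PAC-style line "PCIe 08:00.0" (bus:slot.func).
  static final Pattern PAC_PCIE_PATTERN =
      Pattern.compile("(?i)PCIe\\s+(\\p{XDigit}{2}:\\p{XDigit}{2}\\.\\p{XDigit})");

  // Returns the PCIe address found in the diagnose output, if any.
  static Optional<String> findPcieAddress(String diagnoseOutput) {
    Matcher m = PAC_PCIE_PATTERN.matcher(diagnoseOutput);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }

  public static void main(String[] args) {
    String sample =
        "pac_a10_f200000 Passed PAC Arria 10 Platform (pac_a10_f200000)\n"
        + "PCIe 08:00.0\n"
        + "FPGA temperature = 79 degrees C.\n";

    // The PAC output has no "bus:slot.func = ..." line, so this prints false.
    System.out.println(OLD_BUS_PATTERN.matcher(sample).find());

    // The assumed PAC pattern finds the address and prints 08:00.0.
    System.out.println(findPcieAddress(sample).orElse("not found"));
  }
}
```

The PAC output also has no "Total Card Power Usage" line, which would explain the second WARN in the same way.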
Problem #2
The plugin assumes that the device file name under /dev can be derived from the "Physical Dev Name", but this assumption is wrong. For example, it expects the device file to be /dev/pac_a10_f300000, whereas the actual file is /dev/intel-fpga-port.0.
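One way to avoid deriving the file name is to look at what actually exists under /dev. The following Java sketch lists entries whose names start with "intel-fpga-port.", which is an assumption based solely on the single file name reported above. The class FpgaDeviceFileSketch and the method listPortDeviceFiles are hypothetical, and this is not necessarily how the issue was resolved.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of the plugin.
class FpgaDeviceFileSketch {
  // Assumed naming scheme, taken from the one example in this report.
  static final String PORT_GLOB = "intel-fpga-port.*";

  // Returns the sorted names of matching device files in the given directory.
  static List<String> listPortDeviceFiles(Path devDir) throws IOException {
    List<String> names = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(devDir, PORT_GLOB)) {
      for (Path p : stream) {
        names.add(p.getFileName().toString());
      }
    }
    Collections.sort(names);
    return names;
  }

  public static void main(String[] args) throws IOException {
    // Defaults to /dev; another directory can be passed for testing.
    Path devDir = Paths.get(args.length > 0 ? args[0] : "/dev");
    System.out.println(listPortDeviceFiles(devDir));
  }
}
```

Matching a device file back to a specific "Physical Dev Name" would still need extra information, which this sketch does not attempt to provide.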