Description
As more big data workloads run on YARN, CPU performance will eventually stop scaling and heterogeneous systems will become more important. ML/DL has been a rising star in recent years, and applications in these areas have to use GPUs or FPGAs to boost performance. Hardware vendors such as Intel are also investing in such hardware. FPGAs will most likely become as popular as CPUs in data centers in the near future.
So it would be great for YARN, as a resource management and scheduling system, to evolve to support this. This JIRA proposes making FPGA a first-class citizen. The changes roughly include:
1. FPGA resource detection and heartbeat
2. Scheduler changes (YARN-3926 involved)
3. FPGA-related preparation and isolation before launching the container
We know that YARN-3926 is trying to extend the current resource model, but we can still leave some FPGA-related discussion here.
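To make the three proposed changes more concrete for discussion, here is a minimal, self-contained Java sketch of node-side FPGA bookkeeping. Everything in it is an assumption for illustration only: the class `FpgaNodeSketch`, its methods, and the use of device minor numbers as identifiers are hypothetical and are not YARN APIs or the design of this JIRA. Detection is simulated by passing in a device list, the heartbeat is reduced to two counters, and isolation is reduced to handing each container an exclusive set of devices.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical node-side FPGA bookkeeping. Not part of YARN. */
class FpgaNodeSketch {
    // Device minor numbers that are not assigned to any container.
    private final Set<Integer> free = new LinkedHashSet<>();
    private final int total;

    // Change 1 (detection): the device list is passed in here;
    // a real node would have to probe the hardware instead.
    FpgaNodeSketch(List<Integer> detectedMinorNumbers) {
        free.addAll(detectedMinorNumbers);
        total = free.size();
    }

    // Change 1 (heartbeat): the counts a node could report upstream.
    int totalFpgas() { return total; }
    int availableFpgas() { return free.size(); }

    // Change 2 (scheduling): a request fits only if enough devices are free.
    boolean canFit(int requested) {
        return requested >= 0 && requested <= free.size();
    }

    // Change 3 (preparation and isolation): reserve devices for one
    // container before launch, so no other container can receive them.
    List<Integer> assign(int count) {
        if (!canFit(count)) {
            throw new IllegalArgumentException("cannot assign " + count + " FPGA(s)");
        }
        List<Integer> assigned = new ArrayList<>();
        Iterator<Integer> it = free.iterator();
        while (assigned.size() < count) {
            assigned.add(it.next());
            it.remove();
        }
        return assigned;
    }

    // Return devices to the free pool when the container finishes.
    void release(List<Integer> devices) {
        free.addAll(devices);
    }
}
```

In this sketch a container that asks for two devices on a three-device node gets an exclusive pair, and a second request for two devices is rejected until the first container releases them. How a real implementation would enforce that exclusivity at the OS level is left open here.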
Attachments
Issue Links
1. Support updating FPGA related constraint node label after FPGA device re-configuration | Open | Unassigned
2. Add support for FPGA information shown in webUI | Open | Unassigned
3. Support RESTful API in NM for query FPGA allocation | Open | Zhankun Tang