As more big data workloads run on YARN, CPU performance will eventually stop scaling, and heterogeneous systems will become more important. ML/DL has been a rising star in recent years, and applications in these areas have to use GPUs or FPGAs to boost performance. Hardware vendors such as Intel are also investing in such hardware. It is likely that FPGAs will become as common in data centers as CPUs in the near future.
So it would be great for YARN, as a resource management and scheduling system, to evolve to support this. This JIRA proposes making FPGA a first-class citizen. The changes roughly include:
1. FPGA resource detection and heartbeat
2. Scheduler changes (
3. FPGA-related preparation and isolation before launching the container
We know that YARN-3926 is trying to extend the current resource model, but we can still keep some FPGA-related discussion here.