Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None
Description
The current GPU/FPGA isolation behavior may have an issue when container-executor.cfg (c-e.cfg) doesn't agree with yarn-site.xml. Take GPUs as an example:
One host has six GPUs: 1, 2, 3, 4, 5, 6. c-e.cfg configures "GPU.allowed = 1,2,3", but yarn-site.xml sets yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices to "auto", which allows all of 1, 2, 3, 4, 5, 6.
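An application then requests 4 GPUs and the scheduler allocates 1, 2, 4, 5, so the GPUs passed via --excluded-gpus are the remaining ones, 3 and 6. c-e checks each excluded GPU against its own allowed list (1, 2, 3); only 3 is in that list, so 3 is the only GPU denied in cgroups.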
In this case, c-e's allowed list (1, 2, 3) is not enforced: the application can now access GPUs 4, 5, and 6, none of which c-e.cfg allows.
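The mismatch can be reproduced with plain set arithmetic. This is a minimal sketch, not YARN or container-executor code: the function names are hypothetical, and it assumes (a) the NodeManager excludes every GPU it manages that was not allocated, and (b) c-e only denies excluded GPUs that also appear in its own allowed list, as described in this report.

```python
"""Set-arithmetic model of the c-e.cfg / yarn-site.xml GPU mismatch (hypothetical names)."""


def excluded_gpus(nm_allowed, allocated):
    # Assumption: the NodeManager excludes every GPU it manages that was not allocated.
    return sorted(set(nm_allowed) - set(allocated))


def denied_by_container_executor(excluded, ce_allowed):
    # Assumption (per this report): c-e only denies excluded GPUs that are
    # also in its own allowed list; everything else is left untouched.
    return sorted(set(excluded) & set(ce_allowed))


def accessible(all_gpus, denied):
    # Any GPU not denied in cgroups stays visible to the container.
    return sorted(set(all_gpus) - set(denied))


ALL_GPUS = [1, 2, 3, 4, 5, 6]
NM_ALLOWED = ALL_GPUS      # yarn-site.xml: "auto" means every GPU on the host
CE_ALLOWED = [1, 2, 3]     # c-e.cfg: GPU.allowed = 1,2,3
ALLOCATED = [1, 2, 4, 5]   # scheduler's choice for a 4-GPU request

excluded = excluded_gpus(NM_ALLOWED, ALLOCATED)
denied = denied_by_container_executor(excluded, CE_ALLOWED)
visible = accessible(ALL_GPUS, denied)
leaked = sorted(set(visible) - set(CE_ALLOWED))

print("excluded:", excluded)                 # [3, 6]
print("denied:", denied)                     # [3]
print("visible:", visible)                   # [1, 2, 4, 5, 6]
print("outside c-e allowed list:", leaked)   # [4, 5, 6]
```

GPU 3 ends up as the only denied device, so the container sees 4, 5, and 6 even though c-e.cfg only allows 1, 2, 3.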
Issue Links
- is related to
  - YARN-10248 when config allowed-gpu-devices , excluded GPUs still be visible to containers (Patch Available)