Details
- Type: Bug
- Status: Patch Available
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.2.1
- Fix Version/s: None
- Flags: Patch
Description
I have a server with two GPUs, and I want to use only one of them within the YARN cluster.
According to the Hadoop documentation, I set these configs:
<property>
  <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
  <value>0:1</value>
</property>
<property>
  <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
  <value>/etc/alternatives/x86_64-linux-gnu_nvidia_smi</value>
</property>
Then I ran the following command to test:
yarn jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar \
-jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar -shell_command ' nvidia-smi & sleep 3 ' \
-container_resources memory-mb=3072,vcores=1,yarn.io/gpu=1 \
-num_containers 1 -queue yufei -node_label_expression slaves
I expected that the GPU with minor number 0 would not be visible to the container (allowed-gpu-devices=0:1 only whitelists the device with minor number 1), but in the launched container, nvidia-smi printed information for both GPUs.
I checked the related source code and found that this is a bug.
The problem is:
When you specify allowed-gpu-devices, GpuDiscoverer populates the usable GPUs from that list. Later, when some of those GPUs are assigned to a container, the denied-GPU list for the container is built only from the usable GPUs minus the assigned ones; the host GPUs that were excluded by allowed-gpu-devices are never considered, so they stay visible inside the container.
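To make the flaw concrete, here is a minimal, self-contained Java sketch of the deny-list computation described above. It is only an illustration of my reading of the code: the class and method names are invented for the example; only GpuDiscoverer and the allowed-gpu-devices key come from the actual source.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Illustrative sketch (not the real Hadoop classes) of the deny-list logic.
 * GPUs are represented by their minor numbers.
 */
public class GpuDenyListSketch {
    /** GPUs in 'candidates' that were not assigned to the container get denied. */
    static List<Integer> denyList(List<Integer> candidates, Set<Integer> assigned) {
        List<Integer> denied = new ArrayList<>();
        for (Integer minor : candidates) {
            if (!assigned.contains(minor)) {
                denied.add(minor);
            }
        }
        return denied;
    }

    public static void main(String[] args) {
        List<Integer> allHostGpus = List.of(0, 1);   // both devices present on the host
        List<Integer> allowedGpus = List.of(1);      // usable GPUs from allowed-gpu-devices=0:1
        Set<Integer> assigned = Set.of(1);           // GPU assigned to the container

        // Buggy behaviour: the deny list is built only from the discovered
        // (allowed) GPUs, so minor number 0 is never denied.
        System.out.println(denyList(allowedGpus, assigned));  // prints []

        // Expected behaviour: build the deny list from every host GPU, so the
        // device excluded by allowed-gpu-devices is denied as well.
        System.out.println(denyList(allHostGpus, assigned));  // prints [0]
    }
}

Running the sketch prints an empty deny list for the buggy path and [0] for the expected path, which matches the behaviour observed with nvidia-smi above.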
Attachments
Issue Links
- relates to: YARN-9073 GPU/FPGA whitelist configuration in container-executor.cfg won't work when yarn-site.xml's allowed devices doesn't align with it (Open)