Details
- Type: Bug
- Status: Patch Available
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.2.1
- Fix Version/s: None
- Flags: Patch
Description
I have a server with two GPUs, and I want to use only one of them within the YARN cluster.
According to the Hadoop documentation, I set these configs:
<property>
  <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
  <value>0:1</value>
</property>
<property>
  <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
  <value>/etc/alternatives/x86_64-linux-gnu_nvidia_smi</value>
</property>
Then I ran the following command to test:
yarn jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar \
-jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar -shell_command ' nvidia-smi & sleep 3 ' \
-container_resources memory-mb=3072,vcores=1,yarn.io/gpu=1 \
-num_containers 1 -queue yufei -node_label_expression slaves
I expected that the GPU with minor number 0 would not be visible to the container (allowed-gpu-devices=0:1 only whitelists the device with minor number 1), but in the launched container, nvidia-smi printed information for both GPUs.
I checked the related source code and found that this is a bug.
The problem is:
When you specify allowed-gpu-devices, GpuDiscoverer populates the usable GPUs from that list. Later, when some of those GPUs are assigned to a container, the denied-GPU list for the container is built only from the usable GPUs minus the assigned ones; the host GPUs that were excluded by allowed-gpu-devices are never considered, so they stay visible inside the container.
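To make the flaw concrete, here is a minimal, self-contained Java sketch of the deny-list computation described above. It is only an illustration of my reading of the code: the class and method names are invented for the example; only GpuDiscoverer and the allowed-gpu-devices key come from the actual source.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Illustrative sketch (not the real Hadoop classes) of the deny-list logic.
 * GPUs are represented by their minor numbers.
 */
public class GpuDenyListSketch {
    /** GPUs in 'candidates' that were not assigned to the container get denied. */
    static List<Integer> denyList(List<Integer> candidates, Set<Integer> assigned) {
        List<Integer> denied = new ArrayList<>();
        for (Integer minor : candidates) {
            if (!assigned.contains(minor)) {
                denied.add(minor);
            }
        }
        return denied;
    }

    public static void main(String[] args) {
        List<Integer> allHostGpus = List.of(0, 1);   // both devices present on the host
        List<Integer> allowedGpus = List.of(1);      // usable GPUs from allowed-gpu-devices=0:1
        Set<Integer> assigned = Set.of(1);           // GPU assigned to the container

        // Buggy behaviour: the deny list is built only from the discovered
        // (allowed) GPUs, so minor number 0 is never denied.
        System.out.println(denyList(allowedGpus, assigned));  // prints []

        // Expected behaviour: build the deny list from every host GPU, so the
        // device excluded by allowed-gpu-devices is denied as well.
        System.out.println(denyList(allHostGpus, assigned));  // prints [0]
    }
}

Running the sketch prints an empty deny list for the buggy path and [0] for the expected path, which matches the behaviour observed with nvidia-smi above.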
Attachments
Issue Links
- relates to: YARN-9073 GPU/FPGA whitelist configuration in container-executor.cfg won't work when yarn-site.xml's allowed devices doesn't align with it (Open)