Hadoop YARN / YARN-10248

When allowed-gpu-devices is configured, excluded GPUs are still visible to containers


    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.2.1
    • Fix Version/s: None
    • Component/s: nodemanager
    • Target Version/s:
    • Flags:
      Patch

      Description

      I have a server with two GPUs, and I want to use only one of them in the YARN cluster.
      Following the Hadoop documentation, I set these configs:

      <property>
        <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
        <value>0:1</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
        <value>/etc/alternatives/x86_64-linux-gnu_nvidia_smi</value>
      </property>
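
According to the Hadoop GPU documentation, each allowed-gpu-devices entry has the form index:minor_number, with multiple entries separated by commas, so the value 0:1 above hands only the GPU with index 0 and minor number 1 to YARN. Below is a minimal sketch of parsing that format. It is not the actual GpuDiscoverer code: the AllowedGpuParser and GpuDevice names are hypothetical, it does not handle the default value auto (automatic discovery), and it assumes Java 16+ for the record type.

```java
import java.util.ArrayList;
import java.util.List;

public class AllowedGpuParser {
    /** Hypothetical value type: one GPU identified by its index and minor number. */
    record GpuDevice(int index, int minorNumber) {}

    /** Parses "index:minor_number[,index:minor_number...]" into a list of devices. */
    static List<GpuDevice> parse(String value) {
        List<GpuDevice> devices = new ArrayList<>();
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            String[] parts = trimmed.split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                    "Expected index:minor_number but got '" + trimmed + "'");
            }
            // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) on bad numbers.
            devices.add(new GpuDevice(
                Integer.parseInt(parts[0].trim()),
                Integer.parseInt(parts[1].trim())));
        }
        return devices;
    }

    public static void main(String[] args) {
        // The value from the configuration in this report: index 0, minor number 1.
        List<GpuDevice> allowed = parse("0:1");
        System.out.println(allowed); // prints [GpuDevice[index=0, minorNumber=1]]
    }
}
```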
      

      Then I ran the following command to test it:

      yarn jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar \
        -jar ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar \
        -shell_command 'nvidia-smi & sleep 3' \
        -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=1 \
        -num_containers 1 -queue yufei -node_label_expression slaves
      

      I expected the GPU with minor number 0 not to be visible to the container, but in the launched container nvidia-smi printed information for both GPUs.

      I checked the related source code and found that this is a bug.
      The problem is that when allowed-gpu-devices is specified, GpuDiscoverer populates the usable GPUs from it.
      Then, when some of those GPUs are assigned to a container, the denied GPUs for that container are set,
      but the GPUs on the host that were excluded by allowed-gpu-devices are never considered, so they are never denied.
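
The mismatch described above can be modelled as a set difference. In this sketch, GPUs are identified by minor number, and the DeniedGpuSketch class and deniedGpus method are hypothetical names, not the YARN source or the attached patch. Computing the deny list from the usable GPUs denies nothing when the container receives the only allowed GPU, whereas computing it from every GPU on the host (assuming that full set is known, for example from nvidia-smi) also denies the excluded GPU 0.

```java
import java.util.Set;
import java.util.TreeSet;

public class DeniedGpuSketch {
    /**
     * Hypothetical helper: every candidate GPU that was not assigned to the
     * container is denied. Which set is passed as "candidates" is the whole point.
     */
    static Set<Integer> deniedGpus(Set<Integer> candidates, Set<Integer> assigned) {
        Set<Integer> denied = new TreeSet<>(candidates); // copy; inputs stay untouched
        denied.removeAll(assigned);
        return denied;
    }

    public static void main(String[] args) {
        // Minor numbers from the report: the host has GPUs 0 and 1,
        // only 1 is allowed, and the container is assigned GPU 1.
        Set<Integer> hostGpus = Set.of(0, 1);
        Set<Integer> usableGpus = Set.of(1);
        Set<Integer> assigned = Set.of(1);

        // Behaviour described in the report: GPU 0 is never denied.
        System.out.println("usable-based: " + deniedGpus(usableGpus, assigned)); // prints usable-based: []
        // Assumed alternative: start from all host GPUs, so GPU 0 is denied too.
        System.out.println("host-based:   " + deniedGpus(hostGpus, assigned));   // prints host-based:   [0]
    }
}
```

Starting from the host's full GPU set is only one possible correction; the attached patch may take a different approach.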

        Attachments

        1. YARN-10248-branch-3.2.001.path
          17 kB
          zhao yufei


              People

              • Assignee: jasstionzyf zhao yufei
              • Reporter: jasstionzyf zhao yufei
              • Votes: 0
              • Watchers: 6

                Dates

                • Created:
                  Updated: