SPARK-32120

Single GPU is allocated multiple times

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Scheduler
    • Labels: None

    Description

      I am running Spark in local-cluster[2,1,1024] mode (two workers, each with one core and 1024 MB of memory) with one GPU per worker, executor and task. A GPU discovery script provides two GPUs, yet the same GPU is allocated to both executors.

      Discovery script output:

      {"name": "gpu", "addresses": ["0", "1"]}
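
For illustration, a minimal sketch of a discovery script that would print the output above. The function name `discover_gpus` is hypothetical, and the addresses are hard-coded rather than probed from real hardware (an assumption; the actual script used in this report is not shown).

```shell
#!/usr/bin/env bash
# Hypothetical discovery script (sketch): prints a single JSON object
# naming the resource and listing its addresses.
# Assumption: the addresses are hard-coded; a real script would query the hardware.
discover_gpus() {
  printf '%s\n' '{"name": "gpu", "addresses": ["0", "1"]}'
}

# Emit the JSON on stdout when the script is run.
discover_gpus
```

Saved as an executable file, a script like this could be passed via spark.worker.resource.gpu.discoveryScript. Because the addresses are fixed, every invocation reports the same two GPUs.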
      

      Spark local cluster setup through spark-shell:

      ./spark-3.0.0-bin-hadoop2.7/bin/spark-shell \
        --master "local-cluster[2,1,1024]" \
        --conf spark.worker.resource.gpu.discoveryScript=/tmp/gpu.json \
        --conf spark.worker.resource.gpu.amount=1 \
        --conf spark.task.resource.gpu.amount=1 \
        --conf spark.executor.resource.gpu.amount=1
      

      Executor page of this cluster (see the attached screenshots):

      Both executors have the same GPU allocated: [1]

      Code run in the Spark shell:

      scala> import org.apache.spark.TaskContext
      import org.apache.spark.TaskContext
      
      scala> def fn(it: Iterator[java.lang.Long]): Iterator[(String, (String, Array[String]))] = { TaskContext.get().resources().mapValues(v => (v.name, v.addresses)).iterator }
      fn: (it: Iterator[Long])Iterator[(String, (String, Array[String]))]
      
      scala> spark.range(0,2,1,2).mapPartitions(fn).collect
      res0: Array[(String, (String, Array[String]))] = Array((gpu,(gpu,Array(1))), (gpu,(gpu,Array(1))))
      

      The result shows that both tasks got GPU 1, even though the executor page shows that the tasks ran on different executors (see the screenshot above).

      The expected behaviour is that GPU 0 is assigned to one executor and GPU 1 to the other, so that each partition / task sees a different GPU.

      With Spark 3.0.0-preview2 the allocation was as expected (identical code and Spark shell setup):

      res0: Array[(String, (String, Array[String]))] = Array((gpu,(gpu,Array(0))), (gpu,(gpu,Array(1))))
      

      Happy to contribute a patch if this is an accepted bug.

      Attachments

        1. screenshot-2.png
          38 kB
          Enrico Minack
        2. screenshot-3.png
          39 kB
          Enrico Minack

          People

            Assignee: Unassigned
            Reporter: Enrico Minack (enricomi)