Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-45527

Task fraction resource request is not expected

    XMLWordPrintableJSON

Details

    Description

       

      test("SPARK-XXX") {
        import org.apache.spark.resource.{ResourceProfileBuilder, TaskResourceRequests}
      
        withTempDir { dir =>
          val scriptPath = createTempScriptWithExpectedOutput(dir, "gpuDiscoveryScript",
            """{"name": "gpu","addresses":["0"]}""")
      
          val conf = new SparkConf()
            .setAppName("test")
            .setMaster("local-cluster[1, 12, 1024]")
            .set("spark.executor.cores", "12")
          conf.set(TASK_GPU_ID.amountConf, "0.08")
          conf.set(WORKER_GPU_ID.amountConf, "1")
          conf.set(WORKER_GPU_ID.discoveryScriptConf, scriptPath)
          conf.set(EXECUTOR_GPU_ID.amountConf, "1")
          sc = new SparkContext(conf)
          val rdd = sc.range(0, 100, 1, 4)
          var rdd1 = rdd.repartition(3)
          val treqs = new TaskResourceRequests().cpus(1).resource("gpu", 1.0)
          val rp = new ResourceProfileBuilder().require(treqs).build
          rdd1 = rdd1.withResources(rp)
          assert(rdd1.collect().size === 100)
        }
      } 

      In the above test, the 3 tasks generated by rdd1 are expected to be executed in sequence as we expect "new TaskResourceRequests().cpus(1).resource("gpu", 1.0)" should override "conf.set(TASK_GPU_ID.amountConf, "0.08")". However, those 3 tasks are run in parallel in fact.

      The root cause is that ExecutorData#ExecutorResourceInfo#numParts is static. In this case, the "gpu.numParts" is initialized with 12 (1/0.08) and won't change even if there's a new task resource request (e.g., resource("gpu", 1.0) in this case). Thus, those 3 tasks are able to be executed in parallel.
       

      Attachments

        Issue Links

          Activity

            People

              wbo4958 Bobby Wang
              Ngone51 wuyi
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: