Description
Since the total/max number of executor is constant throughout the application - in dynamic or static allocation - there is loose control over how much GPUs will be requested from the resource manager.
For example, if an application needs 500 executors for the ETL part (with N cores each), but it needs - or allowed - only 50 GPUs for the DL part, in practice it will request at least 500 GPUs from the RM, since `spark.executor.instances` is set to 500. This leads to resource management challenges in multi tenant environments.
A quick workaround is to repartition the RDD to 50 partitions just before switching resources, but it has obvious downsides.
It would be very helpful if the total/max number of executors could also be configured in the Resource Profile.