In some cases it may be desirable to run multiple instances of the Spark Shuffle Service that use different versions of Spark. This can be helpful, for example, when running a YARN cluster with a mixed workload of applications running multiple Spark versions, since a given version of the shuffle service is not always compatible with other versions of Spark (see SPARK-27780 for more detail on this).
YARN versions since 2.9.0 support running shuffle services within an isolated classloader (see YARN-4577), meaning multiple Spark versions can coexist within a single NodeManager.
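For reference, the YARN side of such a setup might look roughly like the sketch below. The service names (spark_shuffle, spark_shuffle_3) and jar paths are illustrative only; the per-service classpath property is the isolated-classloader mechanism introduced by YARN-4577:

```xml
<!-- yarn-site.xml: two shuffle service instances, each loaded from its own
     set of jars in an isolated classloader (YARN-4577, Hadoop 2.9.0+).
     Service names and jar locations below are examples, not prescriptions. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle,spark_shuffle_3</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
  <value>/opt/spark-2.x-shuffle/*</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle_3.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle_3.classpath</name>
  <value>/opt/spark-3.x-shuffle/*</value>
</property>
```

With a classpath configured per service, the NodeManager loads each instance in its own classloader rather than from the NodeManager's shared classpath, which is what allows two different Spark versions to coexist.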
To support this from the Spark side, we need to make two enhancements:
- Make the name of the shuffle service configurable. Currently it is hard-coded to spark_shuffle on both the client and server side. The server-side name is not actually used anywhere; the value within yarn.nodemanager.aux-services is what the NodeManager considers the definitive name. However, if you change that name in the configs, the hard-coded name within the client will no longer match. So, the client-side name needs to be configurable.
- Add a way to separately configure the two shuffle service instances. Since configurations such as the port number are taken from the NodeManager config, both instances will try to use the same port, which obviously won't work. So, we need to provide a way to configure each shuffle service instance selectively. I will go into details on my proposal for how to achieve this within the PR.
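To make the first enhancement concrete, a client-side sketch might look like the following. The spark.shuffle.service.name property is hypothetical here, pending the proposal in the PR; spark.shuffle.service.enabled and spark.shuffle.service.port are existing Spark configs:

```properties
# spark-defaults.conf for an application targeting the second
# shuffle service instance (named spark_shuffle_3 in yarn-site.xml).
spark.shuffle.service.enabled=true
# Hypothetical property from this proposal: overrides the client-side
# name, which today is hard-coded to spark_shuffle.
spark.shuffle.service.name=spark_shuffle_3
# Must match whatever port the second instance is configured to bind;
# the port value here is an arbitrary example.
spark.shuffle.service.port=7338
```

The client-side name must match the key under yarn.nodemanager.aux-services on the server, since that key is what YARN uses to route shuffle-service metadata to applications.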