Details
- Bug
- Status: Resolved
- Minor
- Resolution: Fixed
- 3.5.0
Description
Starting a Spark job in standalone mode with a custom `ShuffleManager` provided in a jar via `--jars` does not work. The same failure occurs in local-cluster mode.
The approach that works consistently is to copy the jar containing the custom `ShuffleManager` to a specific location on each node and then add it to `spark.driver.extraClassPath` and `spark.executor.extraClassPath`, but we would like to avoid requiring these extra configurations.
Example:
$SPARK_HOME/bin/spark-shell \
--master spark://127.0.0.1:7077 \
--conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
--jars user-code.jar
This yields `java.lang.ClassNotFoundException` in the executors.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1915)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:436)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:425)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.examples.TestShuffleManager
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:467)
    at org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:41)
    at org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:36)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:95)
    at org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2574)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:366)
    at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:255)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:487)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    ... 4 more
We can change our command to use `extraClassPath`:
$SPARK_HOME/bin/spark-shell \
--master spark://127.0.0.1:7077 \
--conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
--conf spark.driver.extraClassPath=user-code.jar \
--conf spark.executor.extraClassPath=user-code.jar
Success after adding the jar to `extraClassPath`:
23/10/26 12:58:26 INFO TransportClientFactory: Successfully created connection to localhost/127.0.0.1:33053 after 7 ms (0 ms spent in bootstraps)
23/10/26 12:58:26 WARN TestShuffleManager: Instantiated TestShuffleManager!!
23/10/26 12:58:26 INFO DiskBlockManager: Created local directory at /tmp/spark-cb101b05-c4b7-4ba9-8b3d-5b23baa7cb46/executor-5d5335dd-c116-4211-9691-87d8566017fd/blockmgr-2fcb1ab2-d886-4444-8c7f-9dca2c880c2c
We would like to change the startup order so that the original command succeeds without specifying `extraClassPath`:
$SPARK_HOME/bin/spark-shell \
--master spark://127.0.0.1:7077 \
--conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
--jars user-code.jar
Proposed changes:
Refactor the code so that the `ShuffleManager` is initialized later, after jars have been localized. This is especially necessary in the executor, where the initialization must be deferred until after the `replClassLoader` is updated with the jars passed in `--jars`.
Today, the `ShuffleManager` is instantiated at `SparkEnv` creation. Instantiating it this early does not work in all scenarios, because user jars may not have been localized yet, so loading the `ShuffleManager` class fails. We propose moving the `ShuffleManager` instantiation to `SparkContext` on the driver and to `Executor` on the executors.
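The ordering problem can be illustrated with a small, self-contained Java sketch. This is not Spark code: `Env`, `ToyUserClassLoader`, `localizeJars`, and the nested `TestShuffleManager` are hypothetical stand-ins, and the toy class loader only simulates a user class being unavailable until jars are localized. It shows, under those assumptions, why a reflective lookup fails when it runs before the jars are visible and succeeds when the same lookup is deferred.

```java
public class DeferredShuffleManagerSketch {

    /** Hypothetical stand-in for a user-provided ShuffleManager shipped via --jars. */
    public static class TestShuffleManager {
        public TestShuffleManager() {}
    }

    /**
     * Toy class loader (assumption, not Spark's): it hides one class name
     * until localizeJars() is called, simulating jars that arrive late.
     */
    public static class ToyUserClassLoader extends ClassLoader {
        private final String userClassName;
        private volatile boolean jarsLocalized = false;

        public ToyUserClassLoader(ClassLoader parent, String userClassName) {
            super(parent);
            this.userClassName = userClassName;
        }

        public void localizeJars() {
            jarsLocalized = true;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (name.equals(userClassName) && !jarsLocalized) {
                throw new ClassNotFoundException(name);
            }
            return super.loadClass(name, resolve);
        }
    }

    /** Toy environment: the shuffle manager is NOT created in the constructor. */
    public static class Env {
        private Object shuffleManager; // stays null until initializeShuffleManager succeeds

        public void initializeShuffleManager(String className, ClassLoader loader)
                throws ReflectiveOperationException {
            Class<?> cls = Class.forName(className, true, loader);
            shuffleManager = cls.getDeclaredConstructor().newInstance();
        }

        public Object shuffleManager() {
            return shuffleManager;
        }
    }

    public static void main(String[] args) throws Exception {
        String name = TestShuffleManager.class.getName();
        ToyUserClassLoader loader = new ToyUserClassLoader(
                DeferredShuffleManagerSketch.class.getClassLoader(), name);
        Env env = new Env();

        // Early initialization: the class is not visible yet, so the lookup fails.
        try {
            env.initializeShuffleManager(name, loader);
        } catch (ClassNotFoundException e) {
            System.out.println("early init failed: " + e.getMessage());
        }

        // Deferred initialization: the same lookup succeeds once jars are visible.
        loader.localizeJars();
        env.initializeShuffleManager(name, loader);
        System.out.println("late init ok: "
                + env.shuffleManager().getClass().getSimpleName());
    }
}
```

One trade-off visible in the sketch is that `shuffleManager()` returns null until initialization has run, so callers that may execute before that point need to tolerate an unset manager.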
Issue Links
- blocks: SPARK-45792 SPIP: ShuffleManager short name registration via SparkPlugin (Open)
- causes: SPARK-49502 Avoid NPE in SparkEnv.get.shuffleManager.unregisterShuffle (Resolved)
- links to