Spark / SPARK-45762

Shuffle managers defined in user jars are not available for some launch modes



    Description

  Starting a Spark job in standalone mode with a custom `ShuffleManager` provided in a jar via `--jars` does not work. The same failure occurs in local-cluster mode.

  The approach that works consistently is to copy the jar containing the custom `ShuffleManager` to a fixed location on each node and then add that path to `spark.driver.extraClassPath` and `spark.executor.extraClassPath`, but we would like to move away from setting extra configuration unnecessarily.

      Example:

      $SPARK_HOME/bin/spark-shell \
        --master spark://127.0.0.1:7077 \
        --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
        --jars user-code.jar
      

      This yields `java.lang.ClassNotFoundException` in the executors.

      Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1915)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:436)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:425)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
      Caused by: java.lang.ClassNotFoundException: org.apache.spark.examples.TestShuffleManager
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
        at java.base/java.lang.Class.forName0(Native Method)
        at java.base/java.lang.Class.forName(Class.java:467)
        at org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:41)
        at org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:36)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:95)
        at org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2574)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:366)
        at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:255)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:487)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        ... 4 more
      

      We can change our command to use `extraClassPath`:

      $SPARK_HOME/bin/spark-shell \
        --master spark://127.0.0.1:7077 \
        --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
        --conf spark.driver.extraClassPath=user-code.jar \
  --conf spark.executor.extraClassPath=user-code.jar
      

      Success after adding the jar to `extraClassPath`:

      23/10/26 12:58:26 INFO TransportClientFactory: Successfully created connection to localhost/127.0.0.1:33053 after 7 ms (0 ms spent in bootstraps)
      23/10/26 12:58:26 WARN TestShuffleManager: Instantiated TestShuffleManager!!
      23/10/26 12:58:26 INFO DiskBlockManager: Created local directory at /tmp/spark-cb101b05-c4b7-4ba9-8b3d-5b23baa7cb46/executor-5d5335dd-c116-4211-9691-87d8566017fd/blockmgr-2fcb1ab2-d886-4444-8c7f-9dca2c880c2c
      

  We would like to change the startup order so that the original command succeeds without specifying `extraClassPath`:

      $SPARK_HOME/bin/spark-shell \
        --master spark://127.0.0.1:7077 \
        --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
        --jars user-code.jar
      

      Proposed changes:

  Refactor the code so that the `ShuffleManager` is initialized later, after user jars have been localized. This matters especially in the executor, where the initialization must be deferred until after the `replClassLoader` has been updated with the jars passed via `--jars`.

  Today, the `ShuffleManager` is instantiated during `SparkEnv` creation. Instantiating it this early fails because, in some launch modes, user jars have not yet been localized, so the `ShuffleManager` class cannot be loaded. We propose moving the `ShuffleManager` instantiation into `SparkContext` on the driver and into `Executor` on the executors.
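  For context, a custom manager like the one used in the repro above can be as small as a subclass of the built-in sort-based implementation. The actual `TestShuffleManager` source is not shown in this report; the following is only a hedged sketch of what such a class might look like, matching the log line seen after the fix ("Instantiated TestShuffleManager!!"):

```scala
// Hypothetical stand-in for the TestShuffleManager referenced in this report.
// It delegates all shuffle work to the built-in sort-based manager and only
// logs on construction, which is enough to observe where (driver or executor)
// the class gets loaded. Note that SortShuffleManager is private[spark], so
// this sketch must live under the org.apache.spark package tree to compile.
package org.apache.spark.examples

import org.apache.spark.SparkConf
import org.apache.spark.shuffle.sort.SortShuffleManager

class TestShuffleManager(conf: SparkConf) extends SortShuffleManager(conf) {
  // logWarning is inherited via the Logging trait mixed into SortShuffleManager
  logWarning("Instantiated TestShuffleManager!!")
}
```

  Spark looks up the class named in `spark.shuffle.manager` reflectively (via `Utils.classForName`, as the stack trace above shows), trying a `(SparkConf)` constructor among others, so packaging this class into `user-code.jar` is all that is needed on the user side.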


            People

              Assignee: abellina Alessandro Bellina
              Reporter: abellina Alessandro Bellina
              Votes: 0
              Watchers: 3
