Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.4.5
-
None
Description
Problem:
When running the TPC-DS benchmark (https://github.com/databricks/spark-sql-perf), we occasionally see a class-not-found error:
2020-02-04 20:00:26,673 ERROR yarn.ApplicationMaster: User class threw exception: scala.ScalaReflectionException: class com.databricks.spark.sql.perf.ExperimentRun in JavaMirror with
sun.misc.Launcher$AppClassLoader@28ba21f3 of type class sun.misc.Launcher$AppClassLoader with classpath [...]
and parent being sun.misc.Launcher$ExtClassLoader@3ff5d147 of type class sun.misc.Launcher$ExtClassLoader with classpath [...]
and parent being primordial classloader with boot classpath [...] not found.
Root cause:
The Spark driver starts ApplicationMaster in the main thread, which starts a user thread and sets a MutableURLClassLoader as that thread's ContextClassLoader:
userClassThread = startUserApplication()
The main thread then sets up the YarnSchedulerBackend RPC endpoints, which handle the following calls using Scala Futures on the default global ExecutionContext:
- doRequestTotalExecutors
- doKillExecutors
If the main thread starts a future to handle doKillExecutors() before the user thread starts any future, the thread created in the default thread pool gets the main thread's ContextClassLoader, which is the default one (AppClassLoader).
If the user thread starts a future first, the thread pool thread gets the MutableURLClassLoader instead.
So if user code runs a future that references a user-provided class (which only the MutableURLClassLoader can load), and an executor is lost before that future runs, the result is a class-not-found error like the one above.
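The ordering dependence can be reproduced outside Spark. The following is a minimal sketch, assuming a plain java.util.concurrent pool in place of Scala's global ExecutionContext. The names ContextLoaderInheritance and workerLoaderAfterFirstSubmit are hypothetical and do not come from Spark. It relies only on the JVM rule that a new thread inherits the ContextClassLoader of the thread that creates it, and on the pool creating its worker inside the first caller's submit().

```scala
import java.net.{URL, URLClassLoader}
import java.util.concurrent.{Callable, Executors}

// Hypothetical helper for illustration; not part of Spark.
object ContextLoaderInheritance {

  // Returns the ContextClassLoader seen by the pool's only worker thread when
  // the first task is submitted from a thread whose loader is `firstCaller`.
  def workerLoaderAfterFirstSubmit(firstCaller: ClassLoader): ClassLoader = {
    val pool = Executors.newFixedThreadPool(1)
    try {
      // Stands in for whichever thread (main or user) reaches the pool first.
      val submitter = new Thread(new Runnable {
        def run(): Unit = {
          Thread.currentThread().setContextClassLoader(firstCaller)
          // The worker thread is created here, inside the submitter thread,
          // so it inherits the submitter's ContextClassLoader.
          pool.submit(new Runnable { def run(): Unit = () }).get()
        }
      })
      submitter.start()
      submitter.join()

      // A later task from a different thread reuses the same worker and
      // therefore sees the loader chosen by the first caller.
      pool.submit(new Callable[ClassLoader] {
        def call(): ClassLoader = Thread.currentThread().getContextClassLoader
      }).get()
    } finally {
      pool.shutdown()
    }
  }
}
```

Calling the helper with a fresh URLClassLoader models the case where the user thread gets there first. Calling it with ClassLoader.getSystemClassLoader models the case where the main thread wins the race, after which tasks needing user classes can no longer resolve them through the ContextClassLoader.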
Proposed Solution:
We can potentially solve this problem in one of two ways:
1) Set the same class loader (userClassLoader) on both the main thread and the user thread in ApplicationMaster.scala
2) Do not use "ExecutionContext.Implicits.global" in YarnSchedulerBackend
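As a rough illustration of option 2, the sketch below builds a dedicated ExecutionContext whose threads are given an explicit ContextClassLoader, so the result no longer depends on which thread touches the pool first. The names DedicatedContext and withLoader are hypothetical, and this is not necessarily how the issue was fixed in Spark. It only shows the general technique using standard scala.concurrent and java.util.concurrent calls.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService}

// Hypothetical helper for illustration; not part of Spark.
object DedicatedContext {

  // Builds an ExecutionContext backed by its own cached thread pool. Every
  // thread it creates uses `loader` as its ContextClassLoader, regardless of
  // which thread triggers the thread's creation.
  def withLoader(loader: ClassLoader, threadName: String): ExecutionContextExecutorService = {
    val factory = new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, threadName)
        t.setDaemon(true)
        t.setContextClassLoader(loader) // explicit, not inherited
        t
      }
    }
    ExecutionContext.fromExecutorService(Executors.newCachedThreadPool(factory))
  }
}
```

A caller would pass the returned context explicitly, for example Future { ... }(ec), instead of importing ExecutionContext.Implicits.global, and shut it down when the backend stops. Keeping the scheduler's futures off the global pool also means user futures on the global pool are unaffected by the scheduler's threads.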