Details
- Type: New Feature
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- None
Description
Sometimes a user may want to include their own version of a jar that Spark itself uses, for example because their code requires a newer version of that jar than the one Spark ships with. It would be good to have an option that gives the user's dependencies precedence over Spark's. This option should be disabled by default, since it could lead to odd behavior (e.g. parts of Spark not working), but I think we should have it.
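As a sketch of how such an option might be exposed (the configuration key below is an assumption for illustration; it is not defined by this issue):
{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object UserJarsFirstExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical flag name; disabled by default, as proposed above.
    val conf = new SparkConf()
      .setAppName("user-jars-first")
      .set("spark.executor.userClassPathFirst", "true")
    val sc = new SparkContext(conf)
    // ... application code whose jars now shadow Spark's own versions ...
    sc.stop()
  }
}
{code}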
From an implementation perspective, this would require modifying the way we do class loading inside an Executor. The default behavior of URLClassLoader is to delegate to its parent first and, only if that fails, to find the class locally. We want the opposite behavior. This is sometimes referred to as "parent-last" (as opposed to "parent-first") class loading precedence. There is an example of how to do this here:
We should write a similar class that encapsulates a URL classloader and reverses the delegation order, or, if possible, find a more elegant way to do this; a rough sketch follows below. See the relevant discussion on the user list here:
https://groups.google.com/forum/#!topic/spark-users/b278DW3e38g
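For concreteness, here is a minimal sketch of what a parent-last classloader could look like in Scala. The class name and structure are illustrative only, not Spark's actual implementation; it wraps the standard URLClassLoader machinery and tries the user-supplied URLs before delegating to the parent:
{code:scala}
import java.net.{URL, URLClassLoader}

// Illustrative "parent-last" classloader: try the user-supplied URLs
// first, and only fall back to the parent if the class is not found.
class ParentLastURLClassLoader(urls: Array[URL], parent: ClassLoader)
    extends URLClassLoader(urls, parent) {

  override def loadClass(name: String, resolve: Boolean): Class[_] = {
    // Reuse a class this loader has already defined, if any.
    var cls = findLoadedClass(name)
    if (cls == null) {
      cls =
        try {
          // Look in the user-supplied URLs before asking the parent.
          findClass(name)
        } catch {
          case _: ClassNotFoundException =>
            // Not in the user's jars: fall back to the normal
            // parent-first delegation (this also covers java.* classes,
            // which only the bootstrap loader may define).
            super.loadClass(name, resolve)
        }
    }
    if (resolve) {
      resolveClass(cls)
    }
    cls
  }
}
{code}
A real implementation would likely also force certain namespaces (e.g. org.apache.spark.*) to always load parent-first, so that executor and user code agree on core Spark types.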
Also see the corresponding option in Hadoop:
https://issues.apache.org/jira/browse/MAPREDUCE-4521
Some other relevant Hadoop JIRAs:
https://issues.apache.org/jira/browse/MAPREDUCE-1700
https://issues.apache.org/jira/browse/MAPREDUCE-1938