The --packages option to spark-submit uses Ivy to map Maven coordinates to package jars. Currently, the IvySettings are hard-coded with Maven Central as the last repository in the chain of resolvers.
At IBM, we have heard from several enterprise clients who are frustrated with the lack of control over their local Spark installations. These clients want to ensure that certain artifacts can be excluded or patched because of security or license issues. For example, a package may use a vulnerable SSL protocol, or it may link against an AGPL library written by a litigious competitor.
While additional repositories and exclusions can be added on the spark-submit command line, this falls short of what is needed. With Maven Central always present as a fallback repository, it is difficult to ensure that only approved artifacts are used, and the exclusions that cause problems are often the ones site admins are not aware of. Known exclusions are also better handled through a centrally managed repository than through command-line arguments.
To resolve these issues, we propose the following change: allow the user to pass an Ivy Settings XML file as an optional argument to spark-submit (or to specify it in a config file). The file would define the repositories used to resolve artifacts, replacing the hard-coded defaults. The intended use case is to define a managed repository (such as Nexus) in the settings file so that all requests for artifacts go through that one location only.
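
As an illustration of the kind of settings file this proposal has in mind, the sketch below generates a minimal Ivy settings document containing a single resolver. The repository URL and the resolver name are hypothetical placeholders, not values from any real installation. The <ibiblio> resolver with m2compatible="true" is Ivy's standard way to point at a Maven-layout repository. Whether a particular site needs further settings (credentials, caches, a resolver chain) is not covered here.

```python
import xml.etree.ElementTree as ET

# Hypothetical managed-repository URL; substitute the site's own Nexus instance.
NEXUS_URL = "https://nexus.example.com/repository/maven-public/"


def build_ivy_settings(repo_url: str, resolver_name: str = "managed-repo") -> str:
    """Return Ivy settings XML whose only resolver is the given Maven-layout repository."""
    root = ET.Element("ivysettings")
    # Make the single resolver the default, so this file defines no fallback
    # such as Maven Central.
    ET.SubElement(root, "settings", {"defaultResolver": resolver_name})
    resolvers = ET.SubElement(root, "resolvers")
    # <ibiblio> with m2compatible="true" resolves artifacts from a Maven 2 layout.
    ET.SubElement(
        resolvers,
        "ibiblio",
        {"name": resolver_name, "m2compatible": "true", "root": repo_url},
    )
    return ET.tostring(root, encoding="unicode")


settings_xml = build_ivy_settings(NEXUS_URL)
print(settings_xml)
```

A site admin would save the printed XML to a file and hand that file to spark-submit through whatever argument or config property this proposal ends up introducing. Because the file lists exactly one resolver, every artifact request would go through the managed repository, where exclusions and patched artifacts can be maintained centrally.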