It looks like Spark 3.2.0's POMs are no longer "dependency reduced". As a result, applications may pull in additional unnecessary dependencies when depending on Spark.
Spark uses the Maven Shade plugin to create effective POMs and to bundle shaded versions of certain libraries with Spark (namely, Jetty, Guava, and JPMML). By default, the Maven Shade plugin generates simplified POMs which remove dependencies on artifacts that have been shaded.
SPARK-33212 / b6f46ca29742029efea2790af7fdefbc2fcf52de changed the configuration of the Maven Shade plugin, setting createDependencyReducedPom to false.
As a result, the generated POMs now include compile-scope dependencies on the shaded libraries. For example, compare the org.eclipse.jetty dependencies in:
- Spark 3.1.2: https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.12/3.1.2/spark-core_2.12-3.1.2.pom
- Spark 3.2.0 RC2: https://repository.apache.org/content/repositories/orgapachespark-1390/org/apache/spark/spark-core_2.12/3.2.0/spark-core_2.12-3.2.0.pom
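One way to make this comparison less manual is to list the direct dependencies each POM declares for a given groupId. The following is a minimal sketch, not part of Spark's tooling: the function name `dependencies_for_group` and the `SAMPLE_POM` text are made up for illustration, and it assumes the POM uses the standard Maven 4.0.0 namespace. To compare the two releases, the contents of the two POMs linked above would be passed in place of the sample.

```python
import xml.etree.ElementTree as ET

# Assumption: the POM declares the standard Maven 4.0.0 namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical, heavily trimmed POM used only to exercise the function.
# It is NOT a copy of any real Spark POM.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-server</artifactId>
    </dependency>
    <dependency>
      <groupId>org.example</groupId>
      <artifactId>example-lib</artifactId>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>"""


def dependencies_for_group(pom_xml, group_id):
    """Return (artifactId, scope) pairs for direct dependencies in group_id."""
    root = ET.fromstring(pom_xml)
    found = []
    # Only the project's own <dependencies> are inspected,
    # not <dependencyManagement> or plugin dependencies.
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS)
        if group != group_id:
            continue
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS)
        # Maven treats a missing <scope> element as "compile".
        scope = dep.findtext("m:scope", default="compile", namespaces=POM_NS)
        found.append((artifact, scope))
    return found


if __name__ == "__main__":
    for artifact, scope in dependencies_for_group(SAMPLE_POM, "org.eclipse.jetty"):
        print(artifact, scope)
```

Running the same function over both published POMs and diffing the results should show which org.eclipse.jetty entries appear only in the 3.2.0 POM.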
I think we should revert to generating "dependency reduced" POMs to ensure that Spark declares a proper set of dependencies and to avoid "unknown unknown" consequences of changing our generated POM format.
SPARK-36873 Add provided Guava dependency for network-yarn module
- is caused by
SPARK-33212 Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile
- links to