Description
The goal is to support adding Maven packages to a pip-installable package, including adding JARs and resolvers, so that when Spark starts it can automatically look for the Maven packages declared on the Python path and install the corresponding dependencies.
This idea comes up because currently, for a Python package that depends on JARs (for example, one that, like PySpark itself, internally uses reflection against Spark's JVM classes), making it work takes two steps: 1. pip install the Python package. 2. Add the JAR to the Spark configuration when starting the Spark session, for example through spark.jars.packages. A sketch of this current workflow is shown below.
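For reference, the current two-step workflow looks roughly like this. The package name and Maven coordinate are placeholders, but spark.jars.packages and spark.jars.repositories are existing Spark configuration keys:

{code:python}
# Step 1: install the Python package (shell):
#   pip install my-python-package        # placeholder package name

# Step 2: declare the Maven coordinates when starting the Spark session.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("example")
    # Placeholder Maven coordinate; the real group:artifact:version
    # depends on the package being installed.
    .config("spark.jars.packages", "com.example:my-java-dep:1.0.0")
    # Optional extra resolver, only needed if the artifact is not on Maven Central.
    .config("spark.jars.repositories", "https://repo.example.com/maven")
    .getOrCreate()
)
{code}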
If we can support the proposed functionality, we could ideally just declare the package name and resolver when we pip install the package, and when the Spark session starts, it can look for those declarations on the Python path and install them if they are not already present. This would simplify the workflow for all Python developers whose packages internally depend on Maven packages and make PySpark more user-friendly. A rough sketch of what the discovery step could look like is given below.
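One possible shape for the discovery step, purely as a sketch and not an agreed design: the entry-point group name pyspark.jars.packages, the function name, and the overall mechanism below are assumptions. A pip package would advertise its Maven coordinates in its own metadata, and PySpark would collect them at session startup and merge them into spark.jars.packages.

{code:python}
# Illustrative sketch only (assumes the Python 3.10+ importlib.metadata API).
from importlib.metadata import entry_points


def collect_maven_coordinates():
    """Collect Maven coordinates advertised by installed pip packages."""
    coords = []
    # Hypothetical entry-point group that packages would register under.
    for ep in entry_points(group="pyspark.jars.packages"):
        # Each entry-point value would be a Maven coordinate string,
        # e.g. "com.example:my-java-dep:1.0.0".
        coords.append(ep.value)
    return coords


# At session startup, PySpark could join these coordinates and merge them
# into spark.jars.packages so missing artifacts are resolved automatically.
{code}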