Spark / SPARK-39779

Support adding maven packages while pip installing


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: PySpark

    Description

      The goal is to support declaring Maven packages, including jars and resolvers, inside a pip-installable package, so that when Spark boots up it can automatically look for those Maven packages on the Python classpath and install the corresponding dependencies.


      This idea comes up because, currently, for a Python package that internally depends on jars (for example, one that, like PySpark, uses reflection on the Spark source code), making it work takes two steps: 1. pip install the Python package. 2. Add the jar to the Spark configuration when starting the Spark session, for example through spark.jars.packages.

      If we can support the proposed functionality, we could instead declare the package names and resolvers when we pip install the package, and when the Spark session starts it would look for those declarations on the Python classpath and install any dependencies that are not already present. This would simplify the workflow for all Python developers whose packages internally depend on Maven packages, and make PySpark more user-friendly.
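      The lookup step the proposal describes could be sketched roughly as below. This is only an illustration of the idea, not an existing Spark or pip API: the metadata file name maven_packages.txt and the helper find_maven_packages are hypothetical, as are the package names and coordinates.

```python
import os
import tempfile

# Hypothetical convention: a pip package ships a metadata file listing
# the Maven coordinates it needs (one groupId:artifactId:version per line).
MAVEN_PACKAGES_FILE = "maven_packages.txt"


def find_maven_packages(search_paths):
    """Scan installed packages under the given paths for declared Maven
    coordinates and return a comma-separated string suitable for the
    spark.jars.packages configuration."""
    coords = []
    for root in search_paths:
        if not os.path.isdir(root):
            continue
        for pkg in sorted(os.listdir(root)):
            meta = os.path.join(root, pkg, MAVEN_PACKAGES_FILE)
            if os.path.isfile(meta):
                with open(meta) as f:
                    coords.extend(line.strip() for line in f if line.strip())
    # De-duplicate while preserving declaration order.
    seen = set()
    unique = [c for c in coords if not (c in seen or seen.add(c))]
    return ",".join(unique)


# Demo: a fake installed package declaring one coordinate.
with tempfile.TemporaryDirectory() as site:
    pkg_dir = os.path.join(site, "my_helpers")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, MAVEN_PACKAGES_FILE), "w") as f:
        f.write("com.example:my-helpers_2.12:1.0.0\n")
    print(find_maven_packages([site]))
    # prints: com.example:my-helpers_2.12:1.0.0
```

      At session startup, Spark could then merge such a discovered string into spark.jars.packages, which already triggers Maven resolution of any coordinates that are not yet on the classpath.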

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Serena Ruan (serenaruan)
            Mark Hamilton
            Votes: 1
            Watchers: 2

            Dates

              Created:
              Updated: