Spark / SPARK-25309

Scikit-learn-like Auto Pipeline Parallelization in Spark

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: ML, PySpark

    Description

      SPARK-19357 and SPARK-21911 have helped parallelize Pipelines in Spark. However, the user still has to pick an integer value for the parallelism parameter of CrossValidator. It would be better to support something like n_jobs=-1 (as in scikit-learn), so that the Pipeline DAG is automatically parallelized and scheduled based on the resources allocated to the Spark session.
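
      To make the request concrete, here is a minimal sketch of how an n_jobs-style value could be translated into the existing parallelism parameter. The helper names resolve_parallelism and build_cross_validator are hypothetical and not part of Spark. The negative-value rule follows the joblib convention (cores + 1 + n_jobs), and using sparkContext.defaultParallelism as a stand-in for "resources allocated to the Spark session" is an assumption: parallelism controls how many models are fitted concurrently, and each fit is itself a distributed job, so the right mapping may differ per workload.

```python
def resolve_parallelism(n_jobs, available_cores, num_candidates):
    """Map an n_jobs-style value to a positive thread count.

    Hypothetical helper, not a Spark API. Negative values follow the
    joblib convention: -1 means all cores, -2 means all but one, etc.
    """
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if available_cores < 1 or num_candidates < 1:
        raise ValueError("available_cores and num_candidates must be >= 1")

    if n_jobs < 0:
        workers = max(available_cores + 1 + n_jobs, 1)
    else:
        workers = n_jobs

    # More threads than candidate models would simply sit idle.
    return max(1, min(workers, num_candidates))


def build_cross_validator(spark, estimator, param_maps, evaluator, n_jobs=-1):
    """Build a CrossValidator whose parallelism is derived from n_jobs.

    Assumption: defaultParallelism is a rough proxy for the cores
    available to this Spark session.
    """
    # Imported lazily so resolve_parallelism can be used without pyspark.
    from pyspark.ml.tuning import CrossValidator

    cores = spark.sparkContext.defaultParallelism
    return CrossValidator(
        estimator=estimator,
        estimatorParamMaps=param_maps,
        evaluator=evaluator,
        parallelism=resolve_parallelism(n_jobs, cores, len(param_maps)),
    )
```

      With 8 available cores and a grid of 20 parameter maps, n_jobs=-1 would resolve to a parallelism of 8 under this heuristic. A built-in version would presumably also need to account for executor memory and for how many tasks each individual fit occupies.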

          People

            Assignee: Unassigned
            Reporter: Ravi (RaviShanbhag)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: