Spark: SPARK-4591

Algorithm/model parity for spark.ml (Scala)



    • Type: Umbrella
    • Status: Resolved
    • Priority: Critical
    • Resolution: Done
    • Component/s: ML


      This is an umbrella JIRA for porting spark.mllib implementations to the DataFrame-based API defined under spark.ml. The goal is to reach parity on critical features for the next release.

      Instructions for 3 subtask types

      Review tasks: detailed review of a subpackage to identify feature gaps between spark.mllib and spark.ml.

      • Should be listed as a subtask of this umbrella.
      • Review subtasks cover major algorithm groups. To pick up a review subtask, please:
        • Comment that you are working on it.
        • Compare the public APIs of spark.ml vs. spark.mllib.
        • Comment on all missing items within spark.ml: algorithms, models, methods, features, etc.
        • Check for existing JIRAs covering those items. If there is no existing JIRA, create one, and link it to your comment.
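      The API comparison step can be partly automated. The following is a minimal Scala sketch that relies only on Java reflection: it collects the public method names of two classes and reports the names present on the reference class but missing from the candidate. The names `OldStyleModel`, `NewStyleModel`, `ApiParity`, `publicMethodNames`, and `missingMethods` are hypothetical and used for illustration only. They are not part of Spark.

```scala
// Hypothetical stand-in for an RDD-based spark.mllib model (not a Spark class).
class OldStyleModel {
  def predict(x: Array[Double]): Double = 0.0
  def save(path: String): Unit = ()
  def toPMML(): String = ""
}

// Hypothetical stand-in for its DataFrame-based spark.ml port (not a Spark class).
class NewStyleModel {
  def predict(x: Array[Double]): Double = 0.0
  def save(path: String): Unit = ()
  def transform(xs: Seq[Array[Double]]): Seq[Double] = Seq.empty
}

// Hypothetical helper for the review step; compares public APIs by method name.
object ApiParity {
  /** Names of all public methods declared on or inherited by `cls`. */
  def publicMethodNames(cls: Class[_]): Set[String] =
    cls.getMethods.iterator
      .filterNot(m => m.isSynthetic || m.isBridge) // skip compiler-generated methods
      .map(_.getName)
      .toSet

  /** Method names that are public on `reference` but absent from `candidate`. */
  def missingMethods(reference: Class[_], candidate: Class[_]): Set[String] =
    publicMethodNames(reference) -- publicMethodNames(candidate)
}
```

      With Spark on the classpath, the stand-ins could be replaced by real pairs such as `org.apache.spark.mllib.classification.LogisticRegressionModel` and `org.apache.spark.ml.classification.LogisticRegressionModel`. The comparison is by name only: it ignores overloads and signatures, and spark.ml often exposes equivalent functionality through Params or `transform` rather than an identically named method, so each reported gap still needs manual review before it is filed as a JIRA.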

      Critical tasks: higher priority missing features which are required for this umbrella JIRA.

      • Should be linked as "requires" links.

      Other tasks: lower priority missing features which can be completed after the critical tasks.

      • Should be linked as "contains" links.

      Excluded items

      This does not include:

      • Python: We can compare Scala vs. Python in spark.ml itself.
      • Moving linalg to spark.ml: SPARK-13944
      • Streaming ML: Requires stabilizing some internal APIs of structured streaming first

      TODO list

      Critical issues

      Lower priority issues

      • Missing methods within algorithms (see Issue Links below)
      • evaluation submodule
      • stat submodule (should probably be covered in DataFrames)
      • Developer-facing submodules:
        • optimization (including SPARK-17136)
        • random, rdd
        • util

      To be prioritized


        Issue Links



              Assignee: Unassigned
              Reporter: Xiangrui Meng (mengxr)
              Votes: 4
              Watchers: 18