Spark / SPARK-2478

Add Python APIs for decision tree

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: MLlib, PySpark
    • Labels: None
    • Target Version/s:

      Description

      In v1.0, decision trees are supported only in Scala and Java. It would be nice to add Python support. This may require refactoring the current decision tree API so that a decision tree algorithm is easier to construct from Python.

      1. Simplify the decision tree constructors so that they take only simple types.
         a. Hide the implementation of Impurity from users.
         b. Replace enums with strings.
      2. Make separate public decision tree classes for regression and classification (with shared internals). Eliminate the algo parameter.
      3. Implement Python wrappers for DecisionTree.
      4. Implement Python wrappers for DecisionTreeModel.
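
As a rough illustration of items 1 and 2, the sketch below shows what a string-based, simple-typed parameter layer with separate classification and regression entry points might look like. It is not the Spark implementation: `TreeParams`, `classifier_params`, `regressor_params`, and the accepted impurity names are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical sketch: none of these names are taken from the Spark code base.
# Impurity measures are named by plain strings instead of enum or class values.
CLASSIFICATION_IMPURITIES: Tuple[str, ...] = ("gini", "entropy")
REGRESSION_IMPURITIES: Tuple[str, ...] = ("variance",)


@dataclass(frozen=True)
class TreeParams:
    """Settings made only of simple types, easy to pass across a language boundary."""
    task: str
    impurity: str
    max_depth: int
    num_classes: int = 0
    categorical_features: Dict[int, int] = field(default_factory=dict)


def _validate(impurity: str, allowed: Tuple[str, ...], max_depth: int) -> None:
    # Reject unknown impurity names early, since strings are not type-checked.
    if impurity not in allowed:
        raise ValueError(
            "impurity must be one of %s, got %r" % (", ".join(allowed), impurity)
        )
    if max_depth < 0:
        raise ValueError("max_depth must be non-negative, got %d" % max_depth)


def classifier_params(
    num_classes: int,
    impurity: str = "gini",
    max_depth: int = 5,
    categorical_features: Optional[Dict[int, int]] = None,
) -> TreeParams:
    """Classification entry point; no separate 'algo' argument is needed."""
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2, got %d" % num_classes)
    _validate(impurity, CLASSIFICATION_IMPURITIES, max_depth)
    return TreeParams(
        "classification", impurity, max_depth, num_classes,
        dict(categorical_features or {}),
    )


def regressor_params(
    impurity: str = "variance",
    max_depth: int = 5,
    categorical_features: Optional[Dict[int, int]] = None,
) -> TreeParams:
    """Regression entry point sharing the same validation internals."""
    _validate(impurity, REGRESSION_IMPURITIES, max_depth)
    return TreeParams(
        "regression", impurity, max_depth, 0, dict(categorical_features or {})
    )
```

With a layer like this, a caller would write something such as `classifier_params(num_classes=2, impurity="entropy")` and never touch an Impurity object or an algo enum. A Python wrapper could then forward only strings, integers, and dictionaries to the JVM side.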

              People

              • Assignee: Joseph K. Bradley (josephkb)
              • Reporter: Xiangrui Meng (mengxr)