Spark / SPARK-12326

Move GBT implementation from spark.mllib to spark.ml


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Done
    • Component/s: ML, MLlib

    Description

      Several improvements to gradient boosted trees (e.g. a rawPrediction column, feature importance) are not possible without first moving the GBT implementation to spark.ml. This Jira covers moving the current GBT implementation to spark.ml, in roughly the following steps:

      1. Copy the implementation to spark.ml and change the spark.ml classes to use that copy. The existing tests will ensure that both implementations learn exactly the same models.
      2. Move the decision tree helper classes over to spark.ml (e.g. Impurity, InformationGainStats, ImpurityStats, DTStatsAggregator). Since all tree implementations will eventually reside in spark.ml, the helper classes should as well.
      3. Remove the spark.mllib implementation, and make the spark.mllib APIs wrappers around the spark.ml implementation. The spark.ml tests will again ensure that behavior does not change.
      4. Move the unit tests to spark.ml, and change the spark.mllib unit tests to verify model equivalence.
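
The wrapper pattern in steps 3 and 4 can be illustrated with a minimal, Spark-free sketch. Everything in it is hypothetical: `NewGBT` stands in for the spark.ml implementation, `LegacyGBT` and `LegacyGBTModel` for the spark.mllib API, and the toy one-feature boosting with squared loss is not Spark's algorithm. It only shows the old entry point delegating to the new code, plus an equivalence check of the kind the migrated tests might perform.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (feature, label); toy stand-in for a labeled point


@dataclass(frozen=True)
class Stump:
    """Depth-1 regression tree on a single feature (hypothetical helper)."""
    threshold: float
    left: float
    right: float

    def predict(self, x: float) -> float:
        return self.left if x <= self.threshold else self.right


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Choose the threshold minimizing squared error; assumes non-empty input."""
    mean = sum(residuals) / len(residuals)
    best_err, best = float("inf"), Stump(xs[0], mean, mean)  # fallback: no valid split
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if err < best_err:
            best_err, best = err, Stump(t, lm, rm)
    return best


class NewGBT:
    """Stand-in for the new implementation: the only place training logic lives."""

    @staticmethod
    def train(data: Sequence[Point], num_iter: int, step: float) -> List[Stump]:
        xs = [x for x, _ in data]
        preds = [0.0] * len(data)
        trees: List[Stump] = []
        for _ in range(num_iter):
            # Each stump is fit to the residuals of the current ensemble.
            residuals = [y - p for (_, y), p in zip(data, preds)]
            stump = fit_stump(xs, residuals)
            trees.append(stump)
            preds = [p + step * stump.predict(x) for p, x in zip(preds, xs)]
        return trees


class LegacyGBTModel:
    """Stand-in for the old model type; it only stores and applies the trees."""

    def __init__(self, trees: List[Stump], step: float) -> None:
        self.trees = trees
        self.step = step

    def predict(self, x: float) -> float:
        return sum(self.step * t.predict(x) for t in self.trees)


class LegacyGBT:
    """Old public entry point, reduced to a thin wrapper around NewGBT."""

    @staticmethod
    def train(data: Sequence[Point], num_iter: int = 10, step: float = 0.1) -> LegacyGBTModel:
        return LegacyGBTModel(NewGBT.train(data, num_iter, step), step)


def models_equivalent(legacy: LegacyGBTModel, trees: List[Stump]) -> bool:
    """Equivalence check: the wrapper must yield exactly the same trees."""
    return legacy.trees == trees
```

Because the wrapper holds no training logic of its own, a legacy-side test reduces to training through both entry points on the same data and comparing the resulting trees.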


          People

            Assignee: sethah Seth Hendrickson
            Reporter: sethah Seth Hendrickson
            Votes: 3
            Watchers: 4
