[SPARK-16728] migrate internal API for MLlib trees from spark.mllib to spark.ml - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: MLlib
Labels:
- bulk-closed

Description

Currently, spark.ml trees rely on spark.mllib implementations. There are a few issues with this:

1. spark.ml's GBT TreeBoost algorithm requires storing additional information (for instance, the previous ensemble's prediction) inside the TreePoints. This is necessary to support loss-based splits for complex loss functions.
2. The old impurity API only lets you use summary statistics up to the 2nd order. These are useless for several impurity measures and inadequate for others (e.g., absolute loss or Huber loss), so the API needs some renovation.
3. We should probably coalesce ImpurityAggregator, ImpurityCalculator, and Impurity into a single class (and use virtual calls rather than case statements when switching on impurity types).
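
A minimal sketch of points 2 and 3, written in Python for brevity rather than in Spark's Scala. None of the names below (`Impurity`, `Variance`, `AbsoluteLoss`, `node_impurity`) are Spark APIs; they are hypothetical and only illustrate the idea. Each impurity is a single class that both aggregates labels and computes its value, and the caller dispatches through a virtual method instead of switching on the impurity type. The two label sets at the end share the same count, sum, and sum of squares, so any impurity restricted to 2nd-order statistics cannot tell them apart, while an absolute-loss impurity can.

```python
from abc import ABC, abstractmethod
from statistics import median


class Impurity(ABC):
    """Hypothetical unified impurity: aggregates labels and computes the value."""

    @abstractmethod
    def add(self, label):
        """Fold one label into the running state."""

    @abstractmethod
    def calculate(self):
        """Return the impurity of everything added so far."""


class Variance(Impurity):
    """Needs only count, sum, and sum of squares (2nd-order statistics)."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, label):
        self.count += 1
        self.total += label
        self.total_sq += label * label

    def calculate(self):
        if self.count == 0:
            return 0.0
        mean = self.total / self.count
        return self.total_sq / self.count - mean * mean


class AbsoluteLoss(Impurity):
    """Mean absolute deviation from the median.

    Moments are not enough here, so this naive sketch keeps every label.
    """

    def __init__(self):
        self.labels = []

    def add(self, label):
        self.labels.append(label)

    def calculate(self):
        if not self.labels:
            return 0.0
        center = median(self.labels)
        return sum(abs(y - center) for y in self.labels) / len(self.labels)


def node_impurity(impurity, labels):
    """Virtual dispatch: no case statement over impurity types."""
    for y in labels:
        impurity.add(y)
    return impurity.calculate()


# Same count (4), sum (0), and sum of squares (50), but different spread
# around the median.
labels_a = [-5, 0, 0, 5]
labels_b = [-4, -3, 3, 4]
```

Storing every label, as `AbsoluteLoss` does here, would not scale to a distributed setting; the sketch only shows why the aggregation state has to be something richer than a fixed set of moments.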

Issue Links

is duplicated by
- SPARK-12381 Copy public decision tree helper classes from spark.mllib to spark.ml and make private (Resolved)
- SPARK-12383 Move unit tests for GBT from spark.mllib to spark.ml (Resolved)

People

Assignee: Unassigned
Reporter: Vladimir Feinberg
Votes: 0
Watchers: 2

Dates

Created: 26/Jul/16 00:38
Updated: 21/May/19 04:34
Resolved: 21/May/19 04:34