[SPARK-4240] Refine Tree Predictions in Gradient Boosting to Improve Prediction Accuracy. - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 1.3.0
Fix Version/s: None
Component/s: MLlib
Labels:
- bulk-closed

Description

Gradient boosting as currently implemented estimates the loss gradient in each iteration with a regression tree. At every iteration, the tree is split to minimize the variance of the predicted gradient. The terminal node predictions are computed the same way, to minimize prediction variance, which makes each one the mean of the gradient values that fall into that node.
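As a minimal sketch of the behavior described above (not the MLlib code), the following Python computes variance-minimizing terminal node values for one boosting iteration. The data, the fixed leaf assignment `leaf_ids`, the choice of absolute error as the loss, and both function names are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import fmean


def negative_gradient_abs(y, f):
    """Negative gradient of the absolute error |y - f| with respect to f, i.e. sign(y - f)."""
    return [(yi > fi) - (yi < fi) for yi, fi in zip(y, f)]


def variance_leaf_values(leaf_ids, targets):
    """Per-leaf mean of the targets: the constant that minimizes squared error in each leaf."""
    groups = defaultdict(list)
    for leaf, t in zip(leaf_ids, targets):
        groups[leaf].append(t)
    return {leaf: fmean(ts) for leaf, ts in groups.items()}


# Hypothetical toy data: labels, current ensemble predictions, and the leaf
# each sample falls into after the tree has been split.
y = [1.0, 2.0, 10.0, 3.0, 4.0, 5.0]
f = [2.5, 2.5, 2.5, 4.5, 4.5, 4.5]
leaf_ids = [0, 0, 0, 1, 1, 1]

grads = negative_gradient_abs(y, f)
leaf_values = variance_leaf_values(leaf_ids, grads)
print(grads)
print(leaf_values)
```

Each leaf value here is an average of gradient signs, so it describes the gradient well in the least-squares sense but says nothing about how far the prediction should actually move to reduce the absolute error.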

However, such predictions are optimal only for the mean-squared error; for other loss functions they are not. The TreeBoost refinement mitigates this by replacing each terminal node prediction with the value that directly minimizes the actual loss function over the samples in that node. The tree splits are still chosen by variance reduction, but the refined node values should give better gradient estimates and therefore better predictive performance.
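The refinement can be illustrated with a small Python sketch, assuming absolute error as the loss, a learning rate of 1, and a tree whose splits are already fixed. For absolute error, the constant that minimizes the loss within a node is the median of the residuals in that node. The toy data and the helper names are hypothetical and are not taken from MLlib.

```python
from collections import defaultdict
from statistics import fmean, median


def abs_loss(y, f):
    """Total absolute error between labels y and predictions f."""
    return sum(abs(yi - fi) for yi, fi in zip(y, f))


def group_by_leaf(leaf_ids, values):
    """Collect the values that fall into each terminal node."""
    groups = defaultdict(list)
    for leaf, v in zip(leaf_ids, values):
        groups[leaf].append(v)
    return groups


# Hypothetical toy data: labels, current ensemble predictions, fixed leaf assignment.
y = [1.0, 2.0, 10.0, 3.0, 4.0, 5.0]
f = [3.0] * 6
leaf_ids = [0, 0, 0, 1, 1, 1]

residuals = [yi - fi for yi, fi in zip(y, f)]
# Negative gradient of |y - f| with respect to f is sign(y - f).
grads = [(r > 0) - (r < 0) for r in residuals]

# Variance-minimizing node value: mean of the gradient values in the node.
mean_values = {leaf: fmean(g) for leaf, g in group_by_leaf(leaf_ids, grads).items()}
# Refined node value: minimizes absolute error in the node (median of residuals).
refined = {leaf: median(r) for leaf, r in group_by_leaf(leaf_ids, residuals).items()}

f_variance = [fi + mean_values[leaf] for fi, leaf in zip(f, leaf_ids)]
f_refined = [fi + refined[leaf] for fi, leaf in zip(f, leaf_ids)]

loss_variance = abs_loss(y, f_variance)
loss_refined = abs_loss(y, f_refined)
print(loss_variance, loss_refined)
```

On this toy data the refined node values give a lower absolute error than the gradient means, even though both use exactly the same splits. Other losses would need a different per-node minimizer, which may not have a closed form.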

The details of this can be found in the R vignette. This paper also shows how to refine the terminal node predictions.

http://www.saedsayad.com/docs/gbm2.pdf

Issue Links

- Is contained by: SPARK-14047 GBT improvement umbrella (Resolved)
- Is related to: SPARK-8547 xgboost exploration (Resolved)
- Relates to: SPARK-3727 Trees and ensembles: More prediction functionality (Resolved)

Sub-Tasks

1.	Add Newton-Raphson Step per Tree to GBDT Implementation	Closed	Unassigned
2.	gbm-style treeboost	Resolved	Vladimir Feinberg
3.	migrate internal API for MLlib trees from spark.mllib to spark.ml	Resolved	Unassigned

People

Assignee: Unassigned

Reporter: Sung Chung

Votes: 2

Watchers: 15

Dates

Created: 05/Nov/14 07:29

Updated: 21/May/19 04:15

Resolved: 21/May/19 04:15