[SPARK-3159] Check for reducible DecisionTree - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.4.0
Component/s: MLlib
Labels: None

Description

Improvement: test-time computation

Currently, two leaf nodes with the same parent can output the same prediction. This happens because the splitting criterion (e.g., Gini impurity) is not the same as prediction accuracy or MSE: a split can improve the criterion even when both children would still output the same prediction (e.g., the same majority label in classification). Such a split adds work at prediction time without changing any prediction.
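A small worked example may make this concrete. The sketch below uses made-up label counts (80 "A" and 20 "B" at the parent) and hypothetical helper names `gini` and `majority`; it is an illustration of the effect, not code from MLlib or scikit-learn.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class frequencies."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Majority label, i.e. what a classification leaf would predict."""
    return Counter(labels).most_common(1)[0][0]


# Illustrative data only: the parent holds 80 "A" and 20 "B".
parent = ["A"] * 80 + ["B"] * 20
left = ["A"] * 50                  # pure child
right = ["A"] * 30 + ["B"] * 20    # mixed child, still majority "A"

# Weighted impurity of the children, as used when scoring a split.
weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
gain = gini(parent) - weighted

print(f"parent impurity:   {gini(parent):.2f}")
print(f"children impurity: {weighted:.2f}")
print(f"impurity decrease: {gain:.2f}")
print("predictions:", majority(left), majority(right))
```

The impurity drops from about 0.32 to about 0.24, so the split looks worthwhile to the criterion, yet both children predict "A".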

After training, we could check the tree for such splits and reduce it where possible.

Note: This happens with scikit-learn as well.
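A post-training reduction pass of the kind proposed above could be sketched as follows. This is a minimal illustration under stated assumptions, not the change that was merged into Spark: `Node`, `prune`, and `count_nodes` are hypothetical names, every internal node is assumed to have exactly two children, and predictions are compared with plain equality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node; not Spark's internal node class."""
    prediction: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def prune(node: Node) -> Node:
    """Bottom-up: collapse any split whose children are leaves with equal predictions."""
    if node.is_leaf:
        return node
    # Reduce the subtrees first so that merges can cascade upward.
    left = prune(node.left)
    right = prune(node.right)
    if left.is_leaf and right.is_leaf and left.prediction == right.prediction:
        # Both branches give the same answer, so the split is redundant.
        return Node(prediction=left.prediction)
    return Node(prediction=node.prediction, left=left, right=right)


def count_nodes(node: Node) -> int:
    """Total number of nodes in the tree rooted at `node`."""
    if node.is_leaf:
        return 1
    return 1 + count_nodes(node.left) + count_nodes(node.right)


# Root splits into a leaf predicting 0.0 and a redundant split (1.0 / 1.0).
tree = Node(0.0, Node(0.0), Node(1.0, Node(1.0), Node(1.0)))
reduced = prune(tree)
print(count_nodes(tree), "->", count_nodes(reduced))
```

Processing children before their parent matters: once a redundant split collapses into a leaf, its parent may in turn become redundant and can be merged in the same pass.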

Attachments

image-2020-05-24-23-00-38-419.png
24/May/20 15:00
42 kB
xujiajin

Issue Links

causes: SPARK-34591 Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to diagnose and impossible to correct (In Progress)

is contained by: SPARK-14045 DecisionTree improvement umbrella (Resolved)

is duplicated by: SPARK-23409 RandomForest/DecisionTree (syntactic) pruning of redundant subtrees (Resolved)

links to:
[Github] Pull Request #17503 (facaiy)
[Github] Pull Request #20632 (asolimando)

People

Assignee: Alessandro Solimando

Reporter: Joseph K. Bradley

Votes: 1

Watchers: 8

Dates

Created: 20/Aug/14 22:21

Updated: 08/Jun/21 06:14

Resolved: 03/Mar/18 00:41