[SPARK-14610] Remove superfluous split from random forest findSplitsForContinuousFeature - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.1.0
Component/s: ML
Labels: None

Description

Currently, the method findSplitsForContinuousFeature in random forest produces an unnecessary split. For example, if a continuous feature has the unique values (1, 2, 3), then the possible splits generated by this method are:

{1|2,3}
{1,2|3}
{1,2,3|}

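The last split, {1,2,3|}, places every value on one side and so cannot partition the data. The following unit test is therefore incorrect, since it expects three splits from a feature with only three unique values: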

rf.scala

val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
assert(splits.length === 3)
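To make the expected count concrete, here is a minimal sketch of split enumeration for the same samples. It is not Spark's implementation: `SplitSketch` and `candidateThresholds` are hypothetical names, the sketch ignores any cap on the number of bins, and it assumes a sample goes left when its value is less than or equal to the threshold.

```scala
// Hypothetical helper, not Spark's implementation: enumerate the useful
// split thresholds for a continuous feature, assuming a sample goes left
// when value <= threshold and ignoring any limit on the number of bins.
object SplitSketch {
  def candidateThresholds(samples: Array[Double]): Array[Double] = {
    val distinctSorted = samples.distinct.sorted
    // A threshold at the largest value would send every sample left,
    // so it is dropped: n unique values yield n - 1 useful splits.
    distinctSorted.dropRight(1)
  }

  def main(args: Array[String]): Unit = {
    val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
    val thresholds = candidateThresholds(featureSamples)
    println(thresholds.mkString(", ")) // prints: 1.0, 2.0
  }
}
```

Under these assumptions the three unique values (1, 2, 3) give two thresholds, corresponding to {1|2,3} and {1,2|3}.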

Issue Links

links to

[Github] Pull Request #12374 (sethah)

People

Assignee: Seth Hendrickson

Reporter: Seth Hendrickson

Votes: 0

Watchers: 3

Dates

Created: 13/Apr/16 22:00

Updated: 11/Oct/16 00:04

Resolved: 11/Oct/16 00:04