Description
Currently, the method findSplitsForContinuousFeature in random forest produces an unnecessary split. For example, if a continuous feature has the unique values (1, 2, 3), the possible splits generated by this method are:
- {1|2,3}
- {1,2|3}
- {1,2,3|}
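The last split, {1,2,3|}, puts every value on the same side and so separates nothing. The following unit test is therefore incorrect: the samples contain only three unique values, yet it expects three splits instead of two: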
rf.scala
val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
val splits = RandomForest.findSplitsForContinuousFeature(featureSamples, fakeMetadata, 0)
assert(splits.length === 3)
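For illustration only, here is a minimal sketch of how the redundant split could be avoided. It is not Spark's implementation: SplitSketch and candidateSplits are hypothetical names, and the sketch assumes a threshold t sends samples with value <= t to the left side. Under that assumption, the largest unique value never makes a useful threshold, so n unique values yield at most n - 1 splits.

```scala
// Hypothetical sketch, not part of Spark's RandomForest.
object SplitSketch {
  // Assumption: a threshold t sends samples with value <= t to the left side.
  def candidateSplits(featureSamples: Array[Double]): Array[Double] = {
    val distinctSorted = featureSamples.distinct.sorted
    // Drop the largest distinct value: using it as a threshold would put
    // every sample on the left and leave the right side empty.
    distinctSorted.dropRight(1)
  }

  def main(args: Array[String]): Unit = {
    val featureSamples = Array(1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3).map(_.toDouble)
    val splits = candidateSplits(featureSamples)
    // Three unique values give two splits: {1|2,3} and {1,2|3}.
    assert(splits.length == 2)
  }
}
```

The sketch only enumerates thresholds from the unique values. It ignores any binning or sample-count logic the real method may apply.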