SPARK-10524

Decision tree binary classification with ordered categorical features: incorrect centroid

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0, 1.6.0
    • Fix Version/s: 1.6.1, 2.0.0
    • Component/s: ML, MLlib
    • Labels:
      None

      Description

      In DecisionTree and RandomForest binary classification with ordered categorical features, we order the categories' bins by the hard prediction (the predicted class label), but we should order them by the soft prediction (the predicted probability of class 1).
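
To illustrate why the choice of centroid matters, here is a minimal standalone sketch in plain Python. It is not Spark's code: the per-category label counts are made up, and the names `hard_prediction`, `soft_prediction` and `order_categories` are hypothetical. It only shows that the hard prediction collapses every category to 0 or 1, so the order within each tied group falls back to the category index, while the soft prediction gives a full ordering.

```python
# Per-category label counts: (count of label 0, count of label 1).
# These numbers are invented for illustration only.
counts = {0: (2, 8), 1: (9, 1), 2: (4, 6), 3: (7, 3)}


def hard_prediction(c0, c1):
    """Predicted class label: the majority class, 0.0 or 1.0."""
    return 1.0 if c1 > c0 else 0.0


def soft_prediction(c0, c1):
    """Estimated probability of class 1."""
    return c1 / (c0 + c1)


def order_categories(counts, centroid):
    """Sort categories by centroid. Python's sort is stable, so tied
    categories keep their original (index) order."""
    return sorted(counts, key=lambda cat: centroid(*counts[cat]))


hard_order = order_categories(counts, hard_prediction)
soft_order = order_categories(counts, soft_prediction)

print(hard_order)  # [1, 3, 0, 2]  (ties broken by category index)
print(soft_order)  # [1, 3, 2, 0]  (ordered by probability of class 1)
```

Because candidate splits for an ordered categorical feature are taken along this ordering, a different ordering means a different set of candidate splits.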

      Here are the 2 places in mllib and ml:

      The PR that fixes this should include a unit test that isolates the issue, ideally by calling binsToBestSplit directly.
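
The property such a test might check can be sketched without Spark. The following Python is an assumption-laden illustration, not the requested test: it does not call `binsToBestSplit`, the counts are invented, and `gini` and `best_prefix_split` are hypothetical helpers. It picks a case where every category has the same hard prediction, so only the soft ordering exposes the best split.

```python
def gini(c0, c1):
    """Gini impurity of a node holding c0 label-0 and c1 label-1 examples."""
    n = c0 + c1
    if n == 0:
        return 0.0
    p1 = c1 / n
    return 2.0 * p1 * (1.0 - p1)


def best_prefix_split(counts, centroid):
    """Order categories by centroid, then try every prefix of that order
    as the left child. Returns (weighted impurity, left categories)."""
    order = sorted(counts, key=lambda cat: centroid(*counts[cat]))
    total = sum(c0 + c1 for c0, c1 in counts.values())
    best = None
    for k in range(1, len(order)):
        l0 = sum(counts[c][0] for c in order[:k])
        l1 = sum(counts[c][1] for c in order[:k])
        r0 = sum(counts[c][0] for c in order[k:])
        r1 = sum(counts[c][1] for c in order[k:])
        impurity = ((l0 + l1) * gini(l0, l1) + (r0 + r1) * gini(r0, r1)) / total
        if best is None or impurity < best[0]:
            best = (impurity, set(order[:k]))
    return best


# Invented counts: every category has majority label 0, so the hard
# prediction is 0 for all three and carries no ordering information.
counts = {0: (6, 4), 1: (10, 0), 2: (6, 4)}

hard_impurity, hard_left = best_prefix_split(
    counts, lambda c0, c1: 1.0 if c1 > c0 else 0.0)
soft_impurity, soft_left = best_prefix_split(
    counts, lambda c0, c1: c1 / (c0 + c1))

print(soft_left)                      # {1}: the pure category is isolated
print(soft_impurity < hard_impurity)  # True
```

With the hard ordering, category 1 sits in the middle of the sequence, so no prefix split separates it from both of the others.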

            People

            • Assignee: L. C. Hsieh (viirya)
            • Reporter: Joseph K. Bradley (josephkb)
            • Shepherd: Joseph K. Bradley
            • Votes: 0
            • Watchers: 2