
Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Incomplete
    • None
    • None
    • MLlib

    Description

      Currently, spark.ml trees rely on spark.mllib implementations. There are three issues with this:

      1. The GBT TreeBoost algorithm in spark.ml requires storing additional information (for instance, the previous ensemble's prediction) inside the TreePoints. This is necessary to support loss-based splits for complex loss functions.
      2. The old impurity API only exposes summary statistics up to the 2nd order. These are unneeded for several impurity measures and insufficient for others (e.g., absolute loss or Huber loss), so the API needs to be reworked.
      3. We should probably merge ImpurityAggregator, ImpurityCalculator, and Impurity into a single class, and use virtual calls rather than case statements to dispatch on the impurity type.
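
To make points 2 and 3 concrete, here is a minimal sketch of what a single impurity class with virtual dispatch might look like. It is written in Python for brevity and is not Spark's API: the names `Impurity`, `Variance`, `AbsoluteLoss`, and `impurity_of` are hypothetical, and the real spark.mllib classes are Scala. The sketch assumes unweighted labels. It shows that variance can be computed from sums up to the 2nd order, while absolute loss needs the median, which those sums cannot recover.

```python
from abc import ABC, abstractmethod
from statistics import median


class Impurity(ABC):
    """Hypothetical unified class: aggregates labels and computes the impurity."""

    @abstractmethod
    def add(self, label: float) -> None:
        """Fold one label into the running statistics."""

    @abstractmethod
    def merge(self, other: "Impurity") -> "Impurity":
        """Combine statistics from two partitions into a new aggregate."""

    @abstractmethod
    def value(self) -> float:
        """Return the impurity of everything aggregated so far."""


class Variance(Impurity):
    """Needs only count, sum, and sum of squares (statistics up to 2nd order)."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, label: float) -> None:
        self.count += 1
        self.total += label
        self.total_sq += label * label

    def merge(self, other: "Variance") -> "Variance":
        merged = Variance()
        merged.count = self.count + other.count
        merged.total = self.total + other.total
        merged.total_sq = self.total_sq + other.total_sq
        return merged

    def value(self) -> float:
        if self.count == 0:
            return 0.0
        mean = self.total / self.count
        return self.total_sq / self.count - mean * mean


class AbsoluteLoss(Impurity):
    """Mean absolute deviation from the median.

    Assumption for this sketch: keep the raw labels, because the median
    cannot be derived from count, sum, and sum of squares.
    """

    def __init__(self) -> None:
        self.labels = []

    def add(self, label: float) -> None:
        self.labels.append(label)

    def merge(self, other: "AbsoluteLoss") -> "AbsoluteLoss":
        merged = AbsoluteLoss()
        merged.labels = self.labels + other.labels
        return merged

    def value(self) -> float:
        if not self.labels:
            return 0.0
        center = median(self.labels)
        return sum(abs(y - center) for y in self.labels) / len(self.labels)


def impurity_of(impurity_cls, labels) -> float:
    """Caller code never switches on the impurity type; it only uses the interface."""
    agg = impurity_cls()
    for y in labels:
        agg.add(y)
    return agg.value()


labels = [1.0, 2.0, 3.0, 10.0]
print(impurity_of(Variance, labels))      # 12.5
print(impurity_of(AbsoluteLoss, labels))  # 2.5
```

Storing every label is only a stand-in for whatever richer statistic a real implementation would carry. The point of the sketch is that each impurity type owns its own state and computation, so adding a new one does not require touching a central case statement.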

            People

              Assignee: Unassigned
              Reporter: Vladimir Feinberg (vlad.feinberg)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: