  1. Spark
  2. SPARK-3723

DecisionTree, RandomForest: Add more instrumentation

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Component/s: ML

    Description

      Some simple instrumentation would help advanced users understand performance and check whether parameters (such as maxMemoryInMB) need to be tuned.

      Most important instrumentation (simple):

      • min, avg, max nodes per group
      • number of groups (passes over data)
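
The two statistics above could be collected roughly as follows. This is a minimal sketch in plain Python, not Spark code: `GroupStats`, `summarize_groups`, and the sample counts are hypothetical names and data introduced only for illustration. It assumes the training loop records how many nodes it processed in each group.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroupStats:
    num_groups: int   # one group corresponds to one pass over the data
    min_nodes: int
    avg_nodes: float
    max_nodes: int


def summarize_groups(nodes_per_group: List[int]) -> GroupStats:
    """Summarize how many tree nodes were trained in each group."""
    if not nodes_per_group:
        return GroupStats(0, 0, 0.0, 0)
    return GroupStats(
        num_groups=len(nodes_per_group),
        min_nodes=min(nodes_per_group),
        avg_nodes=sum(nodes_per_group) / len(nodes_per_group),
        max_nodes=max(nodes_per_group),
    )


# Hypothetical counts recorded by a training loop, one entry per group.
stats = summarize_groups([1, 2, 4, 8, 5])
print(stats)
```

A large number of groups with few nodes each may suggest that the memory budget is limiting how many nodes fit in one pass, which is the kind of signal that would prompt a user to revisit maxMemoryInMB.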

      More advanced instrumentation:

      • For each tree (or averaged over trees), training set accuracy after training each level. This would be useful for visualizing learning behavior (to convince oneself that model selection was being done correctly).
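
One way to approximate the per-level accuracy is to evaluate a finished tree while ignoring all splits below a given depth. The sketch below is plain Python, not Spark code: the `Node` class, `predict`, `accuracy_per_level`, and the toy data are hypothetical. It assumes every node, internal or leaf, stores the majority label of the training rows that reach it.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    prediction: int                  # majority label at this node
    feature: Optional[int] = None    # split feature index; None for a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict(node: Node, x: Sequence[float], max_depth: int) -> int:
    """Predict using only the top max_depth levels of splits."""
    depth = 0
    while node.feature is not None and depth < max_depth:
        node = node.left if x[node.feature] <= node.threshold else node.right
        depth += 1
    return node.prediction


def accuracy_per_level(root: Node, xs, ys, max_depth: int) -> List[float]:
    """Training accuracy of the tree truncated at depth 0, 1, ..., max_depth."""
    return [
        sum(predict(root, x, d) == y for x, y in zip(xs, ys)) / len(ys)
        for d in range(max_depth + 1)
    ]


# Toy tree: split on feature 0, then split the right child on feature 1.
tree = Node(0, feature=0, threshold=0.5,
            left=Node(0),
            right=Node(1, feature=1, threshold=0.5,
                       left=Node(1), right=Node(0)))
xs = [(0, 0), (0, 1), (1, 0), (1, 0), (1, 1)]
ys = [0, 0, 1, 1, 0]
levels = accuracy_per_level(tree, xs, ys, max_depth=2)
print(levels)  # [0.6, 0.8, 1.0]
```

Plotting these values against depth, per tree or averaged over trees, gives the learning curve described above.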

            People

              Assignee: Unassigned
              Reporter: Joseph K. Bradley (josephkb)
              Votes: 0
              Watchers: 7
