SPARK-7674: R-like stats for ML models

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      This is an umbrella JIRA for supporting ML model summaries and statistics, following the example of R's summary() and plot() functions.

      Design doc

      From the design doc:

      R and its well-established packages provide extensive functionality for inspecting a model and its results. This inspection is critical to interpreting, debugging and improving models.

      R is arguably a gold standard for a statistics/ML library, so this doc largely attempts to imitate it. The challenge we face is supporting similar functionality, but on big (distributed) data. Data size makes both efficient computation and meaningful displays/summaries difficult.

      R model and result summaries generally take 2 forms:

      • summary(model): Display text with information about the model and results on data
      • plot(model): Display plots about the model and results

      We aim to provide both of these types of information. Visualization for the plottable results will not be supported in MLlib itself, but we can provide results in a form which can be plotted easily with other tools.
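
      To make the two forms concrete, here is a minimal sketch in plain Python. It is not Spark or MLlib code: LinearFitSummary and fit_and_summarize are hypothetical names invented for this illustration, and the model is a single-feature least-squares fit on in-memory lists. It only shows the split described above: a printable text report in the spirit of summary(model), plus residuals exposed as plain data that an external tool could plot.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LinearFitSummary:
    """Hypothetical summary object for illustration; not a Spark API."""
    slope: float
    intercept: float
    r2: float
    residuals: List[Tuple[float, float]]  # (fitted value, residual) pairs

    def __str__(self) -> str:
        # Text report, in the spirit of R's summary(model).
        return (
            f"Coefficients: intercept={self.intercept:.4f}, slope={self.slope:.4f}\n"
            f"R-squared: {self.r2:.4f}\n"
            f"Observations: {len(self.residuals)}"
        )


def fit_and_summarize(xs: List[float], ys: List[float]) -> LinearFitSummary:
    """Fit y = intercept + slope * x by ordinary least squares.

    Assumes xs and ys have equal length and are not constant
    (otherwise the divisions below fail).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    fitted = [intercept + slope * x for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot

    # Plottable result: plain pairs, no plotting library required here.
    residuals = [(f, y - f) for y, f in zip(ys, fitted)]
    return LinearFitSummary(slope, intercept, r2, residuals)


summary = fit_and_summarize([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
print(summary)               # the summary(model)-style text report
points = summary.residuals   # data for a residuals-vs-fitted plot elsewhere
```

      On distributed data the residuals would not fit in a local list, so a real design would presumably return them as a distributed dataset or a sample; that part is outside this sketch.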

            People

              Joseph K. Bradley (josephkb)
              Votes: 3
              Watchers: 17
