Details
- Type: New Feature
- Status: Resolved
- Priority: Critical
- Resolution: Done
Description
This is an umbrella JIRA for supporting ML model summaries and statistics, following the example of R's summary() and plot() functions.
From the design doc:
R and its well-established packages provide extensive functionality for inspecting a model and its results. This inspection is critical to interpreting, debugging and improving models.
R is arguably a gold standard for a statistics/ML library, so this doc largely attempts to imitate it. The challenge we face is supporting similar functionality, but on big (distributed) data. Data size makes both efficient computation and meaningful displays/summaries difficult.
R model and result summaries generally take 2 forms:
- summary(model): Display text with information about the model and results on data
- plot(model): Display plots about the model and results
We aim to provide both of these types of information. Visualization for the plottable results will not be supported in MLlib itself, but we can provide results in a form which can be plotted easily with other tools.
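The split described above, a text summary plus results that other tools can plot, can be sketched in a few lines. This is a minimal illustration in plain Python, not MLlib's API: the `RegressionSummary` class, the `summarize` function, and the sample numbers are all hypothetical, and on distributed data the sums below would be computed as aggregations rather than over in-memory lists.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class RegressionSummary:
    """Hypothetical summary object (an assumption, not an MLlib class)."""
    residuals: List[float]  # label - prediction per row; ready to hand to a plotting tool
    rmse: float             # root mean squared error
    r2: float               # coefficient of determination

    def __str__(self) -> str:
        # Text form, loosely analogous to what R's summary(model) prints.
        return f"n={len(self.residuals)}  RMSE={self.rmse:.4f}  R2={self.r2:.4f}"


def summarize(labels: List[float], predictions: List[float]) -> RegressionSummary:
    """Compute basic regression statistics from labels and predictions."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equal length")
    residuals = [y - p for y, p in zip(labels, predictions)]
    n = len(labels)
    mean_y = sum(labels) / n
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in labels)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return RegressionSummary(residuals, sqrt(ss_res / n), r2)


# Made-up data for illustration only.
summary = summarize([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(summary)            # text summary
print(summary.residuals)  # plain data that an external tool could plot
```

Keeping the plottable values as plain data rather than rendered figures mirrors the stated goal of leaving visualization to other tools.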
Issue Links
- contains
  - SPARK-9112 Implement LogisticRegressionSummary similar to LinearRegressionSummary (Resolved)
- is related to
  - SPARK-6160 ChiSqSelector should keep test statistic info (Resolved)
- relates to
  - SPARK-9837 Provide R-like summary statistics for GLMs via iteratively reweighted least squares (Resolved)
  - SPARK-5133 Feature Importance for Random Forests (Resolved)
  - SPARK-11730 Feature Importance for GBT (Resolved)
  - SPARK-6001 K-Means clusterer should return the assignments of input points to clusters (Resolved)
- requires
  - SPARK-8538 LinearRegressionResults class for storing LR results on data (Resolved)
  - SPARK-8539 LinearRegressionSummary class for storing LR training stats (Resolved)