[SPARK-14604] Modify design of ML model summaries - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: ML
Labels:
- bulk-closed

Description

Several spark.ml models now have summaries containing evaluation metrics and training info:

- LinearRegressionModel
- LogisticRegressionModel
- GeneralizedLinearRegressionModel

These summaries have unfortunately been added in an inconsistent way. I propose reorganizing them as follows:

- Each model has one summary (without training info) and one training summary (with info from training). The non-training summary can be produced for a new dataset via evaluate.
- A summary should not store the model itself as a public field.
- A summary should provide a transient reference to the dataset used to produce the summary.
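
The proposed layout can be sketched in plain Scala without any Spark dependency. Everything below is a hypothetical illustration rather than the actual spark.ml code: `Dataset` stands in for a DataFrame, and `RegressionSummary`, `RegressionTrainingSummary`, `LinearModel`, and `setSummary` are names made up for this sketch. It assumes a one-feature linear model and uses invented objective values.

```scala
// Hypothetical stand-in for a Spark DataFrame: each row is (feature, label).
final case class Dataset(rows: Seq[(Double, Double)])

// Non-training summary: metrics for any dataset. It holds no model field,
// and the dataset reference is transient so it is skipped on serialization.
class RegressionSummary(@transient val dataset: Dataset, predictions: Seq[Double])
    extends Serializable {
  require(predictions.length == dataset.rows.length, "one prediction per row")

  // Computed eagerly so the metric survives even if the dataset is dropped.
  val rootMeanSquaredError: Double = {
    val squaredErrors = dataset.rows.zip(predictions).map {
      case ((_, label), prediction) => (label - prediction) * (label - prediction)
    }
    math.sqrt(squaredErrors.sum / squaredErrors.size)
  }
}

// Training summary: the same metrics plus information only known from training.
class RegressionTrainingSummary(
    trainingData: Dataset,
    trainingPredictions: Seq[Double],
    val objectiveHistory: Seq[Double])
  extends RegressionSummary(trainingData, trainingPredictions) {
  def totalIterations: Int = objectiveHistory.length
}

// The model owns its training summary; summaries never point back at the model.
class LinearModel(val slope: Double, val intercept: Double) {
  private var trainingSummary: Option[RegressionTrainingSummary] = None

  def predict(feature: Double): Double = slope * feature + intercept

  def setSummary(summary: RegressionTrainingSummary): this.type = {
    trainingSummary = Some(summary)
    this
  }

  def hasSummary: Boolean = trainingSummary.isDefined

  def summary: RegressionTrainingSummary = trainingSummary.getOrElse(
    throw new NoSuchElementException("No training summary available"))

  // Produces a non-training summary for a new dataset.
  def evaluate(dataset: Dataset): RegressionSummary =
    new RegressionSummary(dataset, dataset.rows.map { case (feature, _) => predict(feature) })
}

object SummaryDemo {
  def main(args: Array[String]): Unit = {
    val train = Dataset(Seq((0.0, 1.0), (1.0, 3.0)))
    val model = new LinearModel(slope = 2.0, intercept = 1.0)
    // The objective history values are invented for illustration.
    model.setSummary(new RegressionTrainingSummary(
      train, train.rows.map(row => model.predict(row._1)), objectiveHistory = Seq(4.0, 0.5, 0.0)))
    val heldOut = model.evaluate(Dataset(Seq((2.0, 5.0), (3.0, 8.0))))
    println(model.summary.totalIterations)
    println(heldOut.rootMeanSquaredError)
  }
}
```

In this arrangement the training summary is a subtype of the plain summary, so code that only needs metrics can accept either one, while training-only fields such as the objective history stay out of summaries produced by evaluate.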

This task will involve reorganizing the GLM summary (which lacks a training/non-training distinction) and deprecating the model method in the LinearRegressionSummary.

People

Assignee: Unassigned

Reporter: Joseph K. Bradley

Votes: 0

Watchers: 4

Dates

Created: 13/Apr/16 18:13

Updated: 08/Oct/19 05:41

Resolved: 08/Oct/19 05:41