[SPARK-14604] Modify design of ML model summaries - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: ML
Labels:
- bulk-closed

Description

Several spark.ml models now have summaries containing evaluation metrics and training info:

- LinearRegressionModel
- LogisticRegressionModel
- GeneralizedLinearRegressionModel

Unfortunately, these summaries were added in an inconsistent way. I propose reorganizing them so that:

- Each model has one summary (without training info) and one training summary (with info from training). The non-training summary can be produced for a new dataset via the model's evaluate method.
- A summary does not store the model itself as a public field.
- A summary provides a transient reference to the dataset used to produce it.

This task will involve reorganizing the GLM summary (which lacks a training/non-training distinction) and deprecating the model method in the LinearRegressionSummary.
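The proposed layout can be illustrated with a minimal sketch in plain Python. This is not the spark.ml API: every name here (ToySummary, ToyTrainingSummary, ToyLinearModel, objective_history, mean_squared_error) is hypothetical, and the "transient" dataset reference is only approximated by dropping it from the serialized state.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[float, float]  # hypothetical row type: (feature, label)


class ToySummary:
    """Evaluation metrics for one dataset; carries no training info and no model."""

    def __init__(self, dataset: Sequence[Row], predictions: List[float]):
        # Reference to the dataset that produced this summary (treated as transient).
        self.dataset: Optional[Sequence[Row]] = dataset
        self.predictions = predictions
        squared_errors = [(p - y) ** 2 for p, (_, y) in zip(predictions, dataset)]
        self.mean_squared_error = sum(squared_errors) / len(squared_errors)

    def __getstate__(self):
        # Approximates a transient field: the dataset is left out of serialized state.
        state = self.__dict__.copy()
        state["dataset"] = None
        return state


class ToyTrainingSummary(ToySummary):
    """A summary of the training data plus information recorded during training."""

    def __init__(self, dataset, predictions, objective_history: List[float]):
        super().__init__(dataset, predictions)
        self.objective_history = objective_history


class ToyLinearModel:
    def __init__(self, slope: float, intercept: float):
        self.slope = slope
        self.intercept = intercept
        self.training_summary: Optional[ToyTrainingSummary] = None

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept

    def evaluate(self, dataset: Sequence[Row]) -> ToySummary:
        # Produces a non-training summary for any new dataset.
        return ToySummary(dataset, [self.predict(x) for x, _ in dataset])


model = ToyLinearModel(slope=2.0, intercept=1.0)
train = [(0.0, 1.0), (1.0, 3.0)]
model.training_summary = ToyTrainingSummary(
    train, [model.predict(x) for x, _ in train], objective_history=[0.5, 0.1]
)
test_summary = model.evaluate([(2.0, 5.0), (3.0, 8.0)])
print(test_summary.mean_squared_error)
```

In this sketch the model points to its training summary, but neither summary class points back to the model, which mirrors the second item of the proposal.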

Sub-Tasks

1.	Update GeneralizedLinearRegressionSummary API	Resolved	Joseph K. Bradley
2.	Update LinearRegression, LogisticRegression summary APIs	Resolved	Joseph K. Bradley
3.	Update LinearRegression, LogisticRegression summary internals to handle model copy	Resolved	Unassigned

People

Assignee: Unassigned

Reporter: Joseph K. Bradley

Votes: 0

Watchers: 4

Dates

Created: 13/Apr/16 18:13

Updated: 08/Oct/19 05:41

Resolved: 08/Oct/19 05:41