[SPARK-7126] For spark.ml Classifiers, automatically index labels if they are not yet indexed - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 1.4.0
Fix Version/s: None
Component/s: ML
Labels:
- bulk-closed

Description

Now that we have StringIndexer, spark.ml.classification.Classifier (the abstract base class) could automatically index labels that are not yet indexed.

This would require some design work on the following questions:

- Should predict() output the original labels or the indices?
- How should we notify users that the labels are being automatically indexed?
- How should we provide that index to users?
- If multiple parts of a Pipeline automatically index labels, how do we ensure that they all use the same index?
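As a rough illustration of one possible answer to the first and third questions, here is a minimal sketch in plain Python rather than against the Spark API, so that it runs standalone. `fit_label_index`, `NearestCentroid`, and `AutoIndexingClassifier` are hypothetical names, not spark.ml classes. The label ordering mimics StringIndexer's default (the most frequent label gets index 0); breaking ties by string value is an added assumption for determinism. In this sketch, predict() returns original labels, a separate method returns indices, and the fitted index is exposed as `labels`.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def fit_label_index(labels: Sequence[Hashable]) -> List[Hashable]:
    """Order distinct labels by descending frequency so index 0 is the most
    frequent label. Ties are broken by str value (an assumption of this sketch)."""
    counts = Counter(labels)
    return sorted(counts, key=lambda lab: (-counts[lab], str(lab)))


class NearestCentroid:
    """Hypothetical inner classifier that only ever sees indexed labels 0..k-1."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "NearestCentroid":
        sums, counts = {}, Counter(y)
        for row, idx in zip(X, y):
            acc = sums.setdefault(idx, [0.0] * len(row))
            for j, v in enumerate(row):
                acc[j] += v
        # Mean feature vector per label index.
        self.centroids = {idx: [s / counts[idx] for s in acc] for idx, acc in sums.items()}
        return self

    def predict(self, row: Sequence[float]) -> int:
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(row, c))
        return min(self.centroids, key=lambda idx: sq_dist(self.centroids[idx]))


class AutoIndexingClassifier:
    """Hypothetical wrapper: indexes raw labels, trains the inner model on the
    indices, and maps predictions back to the original labels."""

    def __init__(self, inner):
        self.inner = inner

    def fit(self, X, raw_labels):
        # The fitted index is kept public so users can inspect label -> index.
        self.labels = fit_label_index(raw_labels)
        to_index = {lab: i for i, lab in enumerate(self.labels)}
        self.inner.fit(X, [to_index[lab] for lab in raw_labels])
        return self

    def predict_index(self, row):
        return self.inner.predict(row)

    def predict(self, row):
        # Map the predicted index back to its original label (like IndexToString).
        return self.labels[self.predict_index(row)]


if __name__ == "__main__":
    X = [[0.0], [0.2], [0.1], [5.0], [5.2]]
    y = ["ham", "ham", "ham", "spam", "spam"]
    clf = AutoIndexingClassifier(NearestCentroid()).fit(X, y)
    print(clf.labels)               # ['ham', 'spam']
    print(clf.predict([4.9]))       # spam
    print(clf.predict_index([4.9])) # 1
```

For the last question, one option this design suggests is to fit the index once and hand the same `labels` list to every stage that needs it, instead of letting each stage refit. A stage refitting on a different subset of the data could see different label frequencies and therefore assign different indices.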

Issue Links

is blocked by
- SPARK-6113 Stabilize DecisionTree and ensembles APIs (Resolved)
- SPARK-6965 StringIndexer should convert input to Strings (Resolved)

relates to
- SPARK-11106 Should ML Models contains single models or Pipelines? (Resolved)
- SPARK-14862 Tree and ensemble classification: do not require label metadata (Resolved)

supersedes
- SPARK-2206 Automatically infer the number of classification classes in multiclass classification (Resolved)

People

Assignee: Unassigned

Reporter: Joseph K. Bradley

Votes: 2

Watchers: 6

Dates

Created: 24/Apr/15 18:14

Updated: 21/May/19 04:33

Resolved: 21/May/19 04:33