Spark / SPARK-7126

For spark.ml Classifiers, automatically index labels if they are not yet indexed

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: ML

    Description

      Now that we have StringIndexer, we could have spark.ml.classification.Classifier (the abstraction) automatically handle label indexing if the labels are not yet indexed.

      This would require a bit of design:

      • Should predict() output the original labels or the indices?
      • How should we notify users that the labels are being automatically indexed?
      • How should we provide that index to the users?
      • If multiple parts of a Pipeline automatically index labels, what do we need to do to make sure they are consistent?
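
The design questions above can be made concrete with a small, Spark-free sketch. Everything in it is hypothetical and for illustration only: `LabelIndex`, `is_already_indexed`, and `build_label_index` are invented names, not part of spark.ml. The ordering (most frequent label gets index 0) mirrors StringIndexer's default behavior; the alphabetical tie-break and the value-based "already indexed" check are assumptions, and a real implementation would more plausibly inspect column metadata than the label values themselves.

```python
from collections import Counter


def is_already_indexed(labels):
    """Heuristic (assumption): labels count as indexed if they are
    integer-valued numbers that exactly cover 0..k-1."""
    distinct = set(labels)
    for x in distinct:
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            return False
        if not float(x).is_integer():
            return False
    return {int(x) for x in distinct} == set(range(len(distinct)))


def build_label_index(labels):
    """Return the distinct labels ordered by descending frequency.
    Ties are broken alphabetically (assumption)."""
    counts = Counter(labels)
    return sorted(counts, key=lambda lab: (-counts[lab], str(lab)))


class LabelIndex:
    """Hypothetical helper a classifier could keep alongside its model."""

    def __init__(self, raw_labels):
        if is_already_indexed(raw_labels):
            self.labels = None  # nothing to do; labels pass through
        else:
            self.labels = build_label_index(raw_labels)
            self._to_index = {lab: i for i, lab in enumerate(self.labels)}

    @property
    def auto_indexed(self):
        # Lets a caller find out that indexing happened implicitly.
        return self.labels is not None

    def encode(self, raw_labels):
        """Original labels -> indices used for training."""
        if self.labels is None:
            return [int(x) for x in raw_labels]
        return [self._to_index[x] for x in raw_labels]

    def decode(self, indices):
        """Predicted indices -> original labels."""
        if self.labels is None:
            return list(indices)
        return [self.labels[i] for i in indices]


raw = ["spam", "ham", "ham", "eggs", "ham", "spam"]
index = LabelIndex(raw)
encoded = index.encode(raw)      # [1, 0, 0, 2, 0, 1]
decoded = index.decode([0, 2])   # ["ham", "eggs"]
```

Exposing `labels` and `auto_indexed` is one possible answer to the second and third questions. Sharing a single such object between pipeline stages would be one way to keep independently indexing stages consistent.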

            People

              Assignee: Unassigned
              Reporter: Joseph K. Bradley (josephkb)
              Votes: 2
              Watchers: 6
