SPARK-32218

spark-ml must support one hot encoded output labels for classification

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      In any classification algorithm, when the target labels have no ordinal relationship, it is commonly advised to one-hot encode them. See:

      https://stackoverflow.com/questions/51384911/one-hot-encoding-of-output-labels/53291690#53291690

      https://www.linkedin.com/pulse/why-using-one-hot-encoding-classifier-training-adwin-jahn/

      spark-ml does not support one-hot-encoded target labels. When I try to use them, I get the following error:

      IllegalArgumentException: u'requirement failed: Column label_ohe must be of type numeric but was actually of type struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.'

      So it would be nice if OHE were supported for target labels.
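      The error shows that the label column must be numeric, while `label_ohe` holds a vector (the struct with `size`, `indices`, and `values`). The following is a minimal sketch of one possible workaround, not something proposed in this issue: collapse the one-hot vector back into a numeric class index before training. The helper name `one_hot_to_label` is hypothetical, and the handling of all-zero vectors assumes the encoder dropped the last category.

```python
from typing import Optional, Sequence


def one_hot_to_label(size: Optional[int],
                     indices: Optional[Sequence[int]],
                     values: Sequence[float]) -> float:
    """Map a one-hot vector to a numeric class index (hypothetical helper).

    Sparse form: `size` and `indices` are set; `values` holds the stored entries.
    Dense form: `size` and `indices` are None; `values` holds every entry.
    """
    if indices is None:
        # Dense form: positions of the non-zero entries.
        active = [i for i, v in enumerate(values) if v != 0.0]
        length = len(values)
    else:
        # Sparse form: keep only indices whose stored value is non-zero.
        active = [i for i, v in zip(indices, values) if v != 0.0]
        length = size

    if len(active) > 1:
        raise ValueError("not a one-hot vector: more than one active entry")
    if not active:
        # Assumption: an all-zero vector stands for the dropped last
        # category, whose index equals the vector length.
        return float(length)
    return float(active[0])


print(one_hot_to_label(3, [1], [1.0]))
print(one_hot_to_label(None, None, [0.0, 0.0, 1.0]))
```

      In PySpark, a function like this could be wrapped in a UDF to derive a numeric label column from `label_ohe`. If the original categorical column is still available, indexing it directly with `StringIndexer` is likely simpler, since it already produces a numeric label column.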

          People

            Assignee: Unassigned
            Reporter: Raghuvarran V H (raghuvarranvh)
            Votes: 0
            Watchers: 1
