  1. Spark
  2. SPARK-20619

StringIndexer supports multiple ways of label ordering


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.3.0
    • Component/s: ML
    • Labels: None

    Description

      StringIndexer maps labels to numeric indices in descending order of label frequency. Other orderings (e.g., alphabetical) may be needed in feature ETL, because the ordering affects the output of one-hot encoding and RFormula. This issue proposes adding a parameter, stringOrderType, that supports the following four options:

      • 'freq_desc': descending order by label frequency (most frequent label assigned 0)
      • 'freq_asc': ascending order by label frequency (least frequent label assigned 0)
      • 'alphabet_desc': descending alphabetical order
      • 'alphabet_asc': ascending alphabetical order
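
      The four orderings above can be illustrated with a small plain-Python sketch. It is not Spark's implementation: the function name index_labels is hypothetical, the option strings are copied from this description (a released Spark version may spell the accepted values differently), and breaking frequency ties alphabetically is an assumption made here so that the result is deterministic.

      ```python
      from collections import Counter


      def index_labels(labels, order="freq_desc"):
          """Return {label: index} under one of the four proposed orderings.

          Hypothetical helper for illustration only; not part of Spark.
          """
          counts = Counter(labels)
          if order == "freq_desc":
              # Most frequent label gets index 0; ties broken alphabetically (assumption).
              ordered = sorted(counts, key=lambda s: (-counts[s], s))
          elif order == "freq_asc":
              # Least frequent label gets index 0; ties broken alphabetically (assumption).
              ordered = sorted(counts, key=lambda s: (counts[s], s))
          elif order == "alphabet_desc":
              ordered = sorted(counts, reverse=True)
          elif order == "alphabet_asc":
              ordered = sorted(counts)
          else:
              raise ValueError(f"unsupported order: {order!r}")
          return {label: i for i, label in enumerate(ordered)}


      labels = ["b", "a", "c", "a", "a", "c"]  # counts: a=3, c=2, b=1
      for order in ("freq_desc", "freq_asc", "alphabet_desc", "alphabet_asc"):
          print(order, index_labels(labels, order))
      ```

      With these sample labels, "a" receives index 0 under 'freq_desc' and 'alphabet_asc' but index 2 under 'freq_asc' and 'alphabet_desc', which is why the choice of ordering changes which category a downstream one-hot encoder places in each column.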

            People

              Assignee: Wayne Zhang (actuaryzhang)
              Reporter: Wayne Zhang (actuaryzhang)
              Votes: 0
              Watchers: 2