Spark / SPARK-20619

StringIndexer supports multiple ways of label ordering

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.3.0
    • Component/s: ML
    • Labels:
      None

      Description

      StringIndexer maps labels to numeric indices in descending order of label frequency. Other orderings (e.g., alphabetical) may be needed in feature ETL; for example, the ordering affects the results of one-hot encoding and RFormula. This issue proposes supporting other ordering methods by adding a parameter, stringOrderType, that accepts the following four options:

      • 'freq_desc': descending order by label frequency (most frequent label assigned 0)
      • 'freq_asc': ascending order by label frequency (least frequent label assigned 0)
      • 'alphabet_desc': descending alphabetical order
      • 'alphabet_asc': ascending alphabetical order
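
      The four orderings can be illustrated with a small, Spark-free sketch. It is not the StringIndexer implementation; the function name order_labels is hypothetical, the option strings are the ones proposed above, and breaking frequency ties alphabetically is an assumption made only to keep the output deterministic.

```python
from collections import Counter


def order_labels(values, string_order_type="freq_desc"):
    """Map each distinct label to an index under the chosen ordering.

    Hypothetical helper for illustration only; not part of Spark.
    """
    counts = Counter(values)
    if string_order_type == "freq_desc":
        # Most frequent label gets index 0 (ties broken alphabetically: an assumption).
        labels = sorted(counts, key=lambda label: (-counts[label], label))
    elif string_order_type == "freq_asc":
        # Least frequent label gets index 0 (ties broken alphabetically: an assumption).
        labels = sorted(counts, key=lambda label: (counts[label], label))
    elif string_order_type == "alphabet_desc":
        labels = sorted(counts, reverse=True)
    elif string_order_type == "alphabet_asc":
        labels = sorted(counts)
    else:
        raise ValueError(f"unsupported string_order_type: {string_order_type!r}")
    return {label: index for index, label in enumerate(labels)}


if __name__ == "__main__":
    column = ["b", "a", "b", "c", "b", "a"]
    for option in ("freq_desc", "freq_asc", "alphabet_desc", "alphabet_asc"):
        print(option, order_labels(column, option))
```

      Because the index assigned to each label changes with the ordering, any downstream step that depends on index position (such as which category a one-hot encoder treats as the last one) can change as well.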

              People

              • Assignee: Wayne Zhang (actuaryzhang)
              • Reporter: Wayne Zhang (actuaryzhang)
