SPARK-7921 (project: Spark)

Change includeFirst to dropLast in OneHotEncoder


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.0
    • Component/s: ML
    • Labels: None

    Description

      Rename includeFirst to dropLast, with the default set to true. This has a few benefits:

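      a. It is consistent with other tutorials on one-hot encoding (also called dummy coding), e.g., http://www.ats.ucla.edu/stat/mult_pkg/faq/general/dummy.htm
      b. It keeps the category indices unchanged in the output vector. If we dropped the first category instead, every remaining index would shift down by 1.
      c. If users apply StringIndexer, which assigns indices in order of descending label frequency, the last category is the least frequent one, so dropLast drops the least frequent category.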
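The index-shift argument in (b) and the frequency argument in (c) can be illustrated with a small pure-Python sketch. It does not use Spark: `frequency_index`, `one_hot`, and `one_hot_drop_first` are hypothetical helpers written only for this illustration. `frequency_index` assumes StringIndexer-style ordering by descending frequency, with ties broken alphabetically (an assumption made for determinism), and the vectors are dense lists rather than Spark's sparse vectors.

```python
from collections import Counter
from typing import Dict, List, Sequence


def frequency_index(labels: Sequence[str]) -> Dict[str, int]:
    """Hypothetical helper: index labels by descending frequency (most frequent -> 0).

    Ties are broken alphabetically; this is an assumption for the sketch.
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))
    return {lab: i for i, lab in enumerate(ordered)}


def one_hot(index: int, num_categories: int, drop_last: bool = True) -> List[float]:
    """Hypothetical helper: dense one-hot vector.

    With drop_last=True the vector has num_categories - 1 slots, every kept
    category stays at its original position, and the last category maps to
    the all-zero vector.
    """
    if not 0 <= index < num_categories:
        raise ValueError(f"index {index} out of range for {num_categories} categories")
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec


def one_hot_drop_first(index: int, num_categories: int) -> List[float]:
    """Hypothetical drop-first variant, for comparison only.

    Category 0 maps to the all-zero vector and every other category moves
    down by one position.
    """
    if not 0 <= index < num_categories:
        raise ValueError(f"index {index} out of range for {num_categories} categories")
    vec = [0.0] * (num_categories - 1)
    if index > 0:
        vec[index - 1] = 1.0
    return vec


labels = ["a", "b", "a", "c", "a", "b"]
idx = frequency_index(labels)                   # {'a': 0, 'b': 1, 'c': 2}
print(one_hot(idx["b"], len(idx)))              # [0.0, 1.0]
print(one_hot(idx["c"], len(idx)))              # [0.0, 0.0]
print(one_hot_drop_first(idx["b"], len(idx)))   # [1.0, 0.0]
```

With dropLast, "b" keeps position 1 in the output vector and the least frequent label "c" becomes the all-zero vector; in the drop-first variant "b" moves to position 0, which is the shift described in (b).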

    People

      Assignee: Xiangrui Meng (mengxr)
      Reporter: Xiangrui Meng (mengxr)
      Votes: 0
      Watchers: 3