Spark / SPARK-7921

Change includeFirst to dropLast in OneHotEncoder

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.0
    • Component/s: ML
    • Labels:
      None
    • Target Version/s:

      Description

      Change includeFirst to dropLast and keep the default as true. There are a couple of benefits:

      a. It is consistent with other tutorials on one-hot encoding (or dummy coding) (e.g., http://www.ats.ucla.edu/stat/mult_pkg/faq/general/dummy.htm).
      b. It keeps the indices unmodified in the output vector. If we dropped the first category instead, all indices would be shifted by 1.
      c. If users use StringIndexer, the last element is the least frequent one.
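
      Points (b) and (c) can be illustrated with a minimal pure-Python sketch. This is not Spark's implementation: the helper names index_by_frequency, one_hot_drop_last, and one_hot_drop_first are hypothetical and exist only for this example. It assumes labels are indexed by descending frequency, as point (c) describes.

```python
from collections import Counter
from typing import Dict, List


def index_by_frequency(labels: List[str]) -> Dict[str, int]:
    """Hypothetical helper: assign index 0 to the most frequent label,
    so the last index belongs to the least frequent one."""
    ordered = [label for label, _ in Counter(labels).most_common()]
    return {label: i for i, label in enumerate(ordered)}


def one_hot_drop_last(index: int, num_categories: int) -> List[float]:
    """Encode an index into num_categories - 1 slots, dropping the LAST category.
    The last category becomes the all-zeros vector."""
    if not 0 <= index < num_categories:
        raise ValueError("index out of range")
    vec = [0.0] * (num_categories - 1)
    if index < num_categories - 1:
        vec[index] = 1.0  # output position equals the original index
    return vec


def one_hot_drop_first(index: int, num_categories: int) -> List[float]:
    """Encode an index into num_categories - 1 slots, dropping the FIRST category.
    The first category becomes the all-zeros vector."""
    if not 0 <= index < num_categories:
        raise ValueError("index out of range")
    vec = [0.0] * (num_categories - 1)
    if index > 0:
        vec[index - 1] = 1.0  # every output position is shifted by 1
    return vec


labels = ["a", "b", "a", "c", "a", "b"]
mapping = index_by_frequency(labels)  # a: 0, b: 1, c: 2
n = len(mapping)

for label in ["a", "b", "c"]:
    i = mapping[label]
    print(label, i, one_hot_drop_last(i, n), one_hot_drop_first(i, n))
```

      With dropLast, label "b" (index 1) sets position 1 of the output vector, and the dropped all-zeros category is "c", the least frequent label. With the drop-first variant, the same label sets position 0, and the most frequent label "a" is the one that loses its slot.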

            People

            • Assignee:
              mengxr Xiangrui Meng
              Reporter:
              mengxr Xiangrui Meng
