  1. Spark
  2. SPARK-14659

OneHotEncoder should support dropping the first category alphabetically in the encoded vector


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: ML
    • Labels: None

    Description

      An R formula drops the first category alphabetically when it encodes a string/category feature. Spark's RFormula uses OneHotEncoder to encode string/category features into vectors, but OneHotEncoder only supports "dropLast", where the categories are ordered by frequency. As a result, SparkR produces different models from native R.
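
      The difference between the two conventions can be illustrated with a small plain-Python sketch. It does not use Spark or R. The helper names and the sample data are hypothetical, and the alphabetical tie-breaking rule for equal frequencies is an assumption made only for this illustration.

```python
from collections import Counter


def one_hot_drop_last_by_frequency(values):
    """Order categories by descending frequency and drop the last one."""
    counts = Counter(values)
    # Most frequent first; ties broken alphabetically (assumption for this sketch).
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    kept = ordered[:-1]
    encoded = [[1.0 if v == c else 0.0 for c in kept] for v in values]
    return ordered[-1], encoded


def one_hot_drop_first_alphabetically(values):
    """Order categories alphabetically and drop the first one."""
    ordered = sorted(set(values))
    kept = ordered[1:]
    encoded = [[1.0 if v == c else 0.0 for c in kept] for v in values]
    return ordered[0], encoded


# Hypothetical sample column: "b" occurs 3 times, "a" twice, "c" once.
data = ["b", "a", "b", "c", "b", "a"]

ref_freq, enc_freq = one_hot_drop_last_by_frequency(data)
ref_alpha, enc_alpha = one_hot_drop_first_alphabetically(data)

print("dropped by frequency (last):", ref_freq)
print("dropped alphabetically (first):", ref_alpha)
```

      With this sample, the frequency-based scheme drops "c" (the least frequent category), while the alphabetical scheme drops "a". The two encodings therefore use different reference categories, which is why the coefficients of a fitted model would differ between them.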


          People

            Assignee: Wayne Zhang (actuaryzhang)
            Reporter: Yanbo Liang (yanboliang)
            Shepherd: Yanbo Liang
            Votes: 0
            Watchers: 5
